BC3203

Independent Analysis Project

In this assignment you will use the skills you learned throughout the subject to analyse and interpret a next generation sequencing dataset. One of the key challenges of this assignment is that you will need to adapt what you learned previously to suit the provided data. Unlike earlier assignments, no step-by-step instructions are provided, so you will need to decide which analyses to perform and work out the right commands to make them work. This guide provides information on where to get data and how to structure your report and presentation.

Note that many of the projects included below rely on skills for analysing RNASeq data. Although you are not required to submit a report for the RNASeq assignment (that assignment is for BC3203), you should complete the guide to RNASeq. Completing this guide will help you prepare for your final exam and will give you the skills you need to complete RNASeq related projects from the list below.

Setup

First ensure that you obtain your own copy of the student version of the assignment by accepting the invite link https://classroom.github.com/a/Jf2sainw

Choose ONE dataset from the following list.

| Dataset Title | Analysis Type | Dataset Link and Instructions |
|---|---|---|
| Freshwater turtle microbiome | 16S Metabarcoding | Turtle Data |
| Mouse diets | 16S Metabarcoding | Mouse Data |
| Abalone heat stress | RNASeq | Abalone Data |
| Response to SARS-CoV-2 infection | RNASeq | COVID Data |

Note that each of these datasets is associated with published work. Although you should certainly read the published work to obtain background information on the dataset, you are not expected to reproduce the published results (nor is it always possible). The published results often contain statistical analyses that are beyond the scope of this subject and would take far more time than you have available. Your analysis should focus on a few key techniques that you feel capable of performing and interpreting to a high standard.

You will find that there are many analyses that you could perform with this data. Each dataset comes with a set of suggested analyses specific to that data type. You don’t need to adhere strictly to the suggestions but if you deviate significantly you should discuss this with teaching staff before investing too much time.

Assessment Tasks

Assessment for this assignment is in three parts:

  1. A written report detailing the main findings
  2. Documentation of your code. This is called the “Reproducible analysis” section
  3. An oral presentation pitching a new experiment that would build on the findings of the data you have analysed.

Written Report

Maximum length is 3 pages, including figures and text.

The assignment template comes with a report file (report.Rmd). You should replace the template text in this report with your own work. Your report should adhere to the following structure.

All parts of this report (Introduction, Analysis steps, Discussion) should be clearly written, use appropriate terminology and demonstrate that you have a good understanding of the methods used.

This part of your assignment should be knitted to pdf and submitted via LearnJCU.

Reproducible Analysis

This part of your report will be quite different from those you have built with RMarkdown previously. In particular, you are encouraged to describe the steps in your analysis (and display the corresponding code). This part of your assessment should be written in the file analysis.Rmd. You will need to delete the template text in that file, but you can always retrieve it [here](template text. This report represents a complete, reproducible record of all the steps you took to complete the analysis. It should include code, descriptive text (which reads like the methods section of a paper) and outputs (i.e. plots and/or tables). Consider the following:

Ensure that you have a clear structure and logical order. Each displayed code block should have a clear purpose (described by text) and if you have highly repetitive code sections you may choose to hide some code from display or restructure your code to make it less repetitive.

If you are including bash commands in this section you should make sure that these do not run when you compile your report. This can be done by using eval=FALSE on all your bash code blocks.

Although this section should describe your analysis in detail you might find that there are some very minor steps in the analysis that can be skipped. Make sure that you include;

As you develop your oral presentation you might try some analyses or plots and later decide that they are not useful. Do NOT include these. Keep your presentation and reproducible analysis focused on what was successful.

Use RMarkdown code blocks to show the code (bash or R) used to perform steps (with echo=TRUE) but be very careful to avoid showing extraneous error messages or unwanted outputs in your final report.
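One way to manage these display settings is to set chunk defaults once, in the first chunk of analysis.Rmd. The block below is a minimal sketch using `knitr::opts_chunk$set`. Suppressing warnings and messages globally is an assumption about what you want, not a requirement of this assignment, so check them interactively before hiding them.

```r
# Typically placed in the first ("setup") chunk of analysis.Rmd.
library(knitr)

opts_chunk$set(
  echo = TRUE,      # display the code for every chunk by default
  warning = FALSE,  # keep warnings out of the knitted report (assumed preference)
  message = FALSE   # keep package start-up messages out of the knitted report
)

# Individual chunks can still override these defaults in their header,
# for example {bash, eval=FALSE} for a bash chunk that is shown but not run,
# or {r, echo=FALSE} for an R chunk whose code is hidden.
```

Setting defaults in one place keeps chunk headers short, and any chunk that needs different behaviour can state it explicitly.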

Include text between code sections to describe what the code is doing. In this text you should typically focus on the “why” and briefly summarise the “how”. For example, the following sentence would be a good way to introduce a code section with qiime export commands followed by some tidyverse code to prepare a tidy dataset for plotting beta diversity:

To visualise beta diversity statistics in R we first prepared a tibble containing the weighted UniFrac PCoA results joined with sample metadata.
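The tidyverse code that such a sentence introduces might look like the sketch below. All names and values here (`pcoa_coords`, `sample_metadata`, `sample_id`, `treatment`) are hypothetical toy data for illustration. In a real analysis both tables would be read from files exported from qiime and from your sample sheet.

```r
library(dplyr)
library(tibble)

# Hypothetical PCoA coordinates (toy values, not real results)
pcoa_coords <- tibble(
  sample_id = c("S1", "S2", "S3"),
  pc1 = c(-0.21, 0.05, 0.16),
  pc2 = c(0.10, -0.12, 0.02)
)

# Hypothetical sample metadata keyed by the same sample identifier
sample_metadata <- tibble(
  sample_id = c("S1", "S2", "S3"),
  treatment = c("control", "control", "treated")
)

# Join the coordinates with metadata to give one tidy row per sample
pcoa_tidy <- pcoa_coords %>%
  left_join(sample_metadata, by = "sample_id")
```

The joined tibble has one row per sample with both coordinates and grouping variables, which is the shape ggplot2 expects when colouring points by treatment.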

Since this reproducible analysis file contains most of your analysis code you might want to export some data for use in your main report. You can easily save data objects using write_rds and read them in using read_rds. This mechanism allows you to exchange data between the two files that you will be editing for this assignment.
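A minimal sketch of this exchange is shown below, using `write_rds` and `read_rds` from the readr package. The object name, its toy contents and the file location are assumptions for illustration. A temporary directory is used here so the example runs anywhere, whereas in your project you would choose a path inside your repository.

```r
library(readr)
library(tibble)

# In analysis.Rmd: a hypothetical result object worth reusing in the report
pcoa_tidy <- tibble(
  sample_id = c("S1", "S2", "S3"),
  pc1 = c(-0.21, 0.05, 0.16),
  treatment = c("control", "control", "treated")
)

# Hypothetical location; replace with a path inside your project
rds_path <- file.path(tempdir(), "pcoa_tidy.rds")
write_rds(pcoa_tidy, rds_path)

# In report.Rmd: read the saved object back in, ready for plotting
pcoa_for_report <- read_rds(rds_path)
```

Because the RDS format stores the R object itself, column types and classes are preserved, which is not guaranteed when round-tripping through a text format such as CSV.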

Oral Presentation

After you have completed your analyses you should have a clear idea of the main conclusions that can be drawn from the data. In addition, you should try to develop ideas for a new research project that would build on these research findings. Your new research project idea need not use the same approach as the original (e.g. Metagenomics, RNASeq etc) but it should involve techniques that we have covered within this subject.

You should prepare a presentation pitching your new idea. Imagine that your audience is a funding agency or philanthropic foundation that could fund the work and your aim is to convince them to do so. This is effectively a grant proposal in oral presentation format. Remember that funders always assess project proposals on the following basis:

  1. Alignment with the funder’s goals
  2. Likelihood that the project will succeed

To make this exercise realistic you should pick a real funding agency or philanthropic foundation and name them in your proposal title slide. For example, if your project is related to medical research you might want to pitch it to the NHMRC. If it is for pure or basic research you might want to pitch it to the ARC Discovery scheme. If it is for environmental or conservation research you might want to pitch it to the Holsworth Foundation. These are just examples and you are completely free to choose any funder.

To convince the funder that your research is likely to succeed you should include mention of preliminary results obtained through analysis of the assignment data. This is important to include but it should be done to support your case for the new project, not to explain all the details of your data analysis.

Presentation duration is 8 minutes. A penalty of 10% per minute will be applied for going over time.

You should structure your oral presentation as follows

Submitting your assignment

See LearnJCU for the due date for this assignment. Your code, oral presentation slides and/or report (as applicable) must be submitted by the due date. You will need to submit report.Rmd and the files resulting from knitting it to GitHub. Oral presentation slides should be submitted in PowerPoint or another presentation format. Due to class numbers the actual presentations will need to be split over several sessions. However, to avoid unfairly advantaging some students over others, all slides must be submitted prior to the presentations (by the due date stated on LearnJCU) regardless of when your talk is scheduled.

You should submit your assignment as follows;

Marking Criteria (BC5203)

This assignment is assessed according to its presentation and content. Marks will be allocated according to standards documented in the subject outline. A breakdown of marks for the oral presentation and reproducible analysis is shown below.

Oral Presentation (70%) will be assessed based on the quality of your slides as well as your presentation style. Pay attention to the following:

| Criterion | Guidelines | Marks |
|---|---|---|
| Content | Clarity and depth of insights obtained from the data. Quality of plots and/or tables presented and relevance to the conclusions being drawn from the data. | 30% |
| Organisation | Background concepts and contextual information should set the scene, results should be presented in a logical order (telling a story with data) and conclusions should summarise outcomes clearly and succinctly. | 10% |
| Delivery | Clear voice placing emphasis where appropriate. Pace should allow the audience to follow along. Your presentation should be engaging and hold the audience’s interest. | 10% |
| Time Management | Keeping within the time limit. | 10% |
| Response to Questions | Did you listen to questions carefully and were your answers appropriate and succinct? | 10% |

Reproducible Analysis (30%) will be assessed on the quality of presentation (language, spelling, logical flow, figures etc), clarity of insight (e.g. technical ability to use appropriate commands and write code, understanding of the methods used and their consequences for the final results) and use of appropriate terminology.

| Criterion | Guidelines | Marks |
|---|---|---|
| Presentation | Appropriate use of RMarkdown to generate a structured report without extraneous errors or warning messages. | 5% |
| Reproducibility | All steps captured and/or clearly documented in the report. | 10% |
| Code | Code is well organised and includes all steps required without any extraneous steps. Good variable names and tidyverse style used throughout. | 5% |
| Plots and/or Tables | All plots used in the oral presentation are shown and are well labelled with captions. | 10% |

Marking Criteria (BC3203)

This assignment is assessed according to its presentation and content. Marks will be allocated according to standards documented in the subject outline. Report sections will be assessed on the quality of presentation (language, spelling, logical flow and structure of text and figures etc), clarity of insight (e.g. technical ability to use appropriate commands and write code, understanding of the methods used and their consequences for the final results) and use of appropriate terminology.

| Criterion | Guidelines | Marks |
|---|---|---|
| Git commit log | Meaningful messages, commits at sensible intervals (not all in one commit at the end). | 5% |
| Overall presentation | Appropriate use of RMarkdown to generate a structured report without extraneous errors or warning messages. | 5% |
| Reproducibility | All steps captured and/or clearly documented in the report. Code is well organised and includes all steps required without any extraneous steps. Good variable names used throughout. | 10% |
| Introduction | Relevant background is presented clearly and provides a motivation for the results/discussion sections that follow. | 15% |
| Results | Plots and/or statistical tests clearly support conclusions in the discussion. Plots are clearly labelled, captioned and otherwise follow best practices for statistical plotting. Text highlights main results of figures and tables without exhaustively repeating them. | 25% |
| Discussion | Picks one or two major outcomes of the study (as presented in the results) and provides additional context so that their broader implications can be appreciated. | 10% |
| Reproducible Analysis | Understanding of overall workflow. Appropriate choice of analytical steps, plots and/or statistical tests. | 30% |