In this assignment you will use the skills you learned throughout the subject to analyse and interpret a next-generation sequencing dataset. One of the key challenges of this assignment is that you will need to adapt what you learned previously to suit the provided data. Unlike earlier assignments, no step-by-step instructions are provided, so you will need to make many decisions about what analysis to perform and work out the right commands to make it work. This guide explains where to get the data and how to structure your report and presentation.
Note that many of the projects listed below rely on skills for analysing RNASeq data. Although you are not required to submit a report for the RNASeq assignment (that assignment is for BC3203), you should still work through the guide to RNASeq. Completing this guide will help you prepare for your final exam and will give you the skills you need to complete the RNASeq-related projects listed below.
First, ensure that you obtain your own copy of the student version of the assignment by accepting the invite link https://classroom.github.com/a/Jf2sainw
Choose ONE dataset from the following list.
Dataset Title | Analysis Type | Dataset Link and Instructions |
---|---|---|
Freshwater turtle microbiome | 16S Metabarcoding | Turtle Data |
Mouse diets | 16S Metabarcoding | Mouse Data |
Abalone heat stress | RNASeq | Abalone Data |
Response to SARS-CoV-2 infection | RNASeq | COVID Data |
Note that each of these datasets is associated with published work. Although you should certainly read the published work to obtain background information on the dataset, you are not expected (nor is it always possible) to reproduce the published results. The published results often contain statistical analyses that are beyond the scope of this subject and would take far more time than you have available. Your analysis should focus on a few key techniques that you feel capable of performing and interpreting to a high standard.
You will find that there are many analyses you could perform with this data. Each dataset comes with a set of suggested analyses specific to that data type. You don’t need to adhere strictly to the suggestions, but if you deviate significantly you should discuss this with teaching staff before investing too much time.
Assessment for this assignment is in three parts: a written report, a reproducible analysis and an oral presentation.
The report has a maximum length of 3 pages, including figures and text.
The assignment template comes with a report file (`report.Rmd`). You should replace the template text in this report with your own work. Your report should adhere to the following structure:
Introduction: (1-2 paragraphs; 1/2 page max) What experimental or observational conditions were analysed in the study? What question does the study hope to answer, and why is this important?
Results: (1-2 pages; 2 pages max) A mix of text, plots and/or tables as appropriate. All plots and tables should have proper labels and captions and must be referred to in the text. Use the text to highlight important features in plots/tables. Pay special attention to the plots shown in this section. They should be made with ggplot rather than pasted in as screenshots, and they should be carefully chosen to support the main conclusions of the study.
Discussion: (1/2-1 page; 1 page max) What are the key results from the study and how does this relate to previous work? What conclusions can be drawn from the study? How can the results of this study be used to guide future work?
All parts of this report (Introduction, Results, Discussion) should be clearly written, use appropriate terminology and demonstrate that you have a good understanding of the methods used.
This part of your assignment should be knitted to PDF and submitted via LearnJCU.
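For the Results section, a plot counts as properly labelled when its axes and legend carry readable titles rather than raw column names. The sketch below builds such a ggplot2 plot; the tibble `alpha_diversity` and its columns (`sample_id`, `treatment`, `shannon`) are hypothetical placeholders, not values from any of the datasets. In an R Markdown chunk the caption itself is usually supplied through the `fig.cap` chunk option.

```r
library(tibble)
library(ggplot2)

# Hypothetical placeholder data: one Shannon diversity value per sample
alpha_diversity <- tibble(
  sample_id = c("S1", "S2", "S3", "S4", "S5", "S6"),
  treatment = c("Diet A", "Diet A", "Diet A", "Diet B", "Diet B", "Diet B"),
  shannon   = c(3.1, 3.4, 2.9, 2.2, 2.5, 2.0)
)

# Boxplot with the individual samples overlaid; labs() replaces the
# raw column names with human-readable axis and legend titles
alpha_plot <- ggplot(alpha_diversity,
                     aes(x = treatment, y = shannon, colour = treatment)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  labs(x = "Treatment", y = "Shannon diversity index", colour = "Treatment") +
  theme_bw()
```

Printing `alpha_plot` inside a chunk renders the figure; the same object can be reused in the report and the slides so the two stay consistent.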
This part of your assessment will be quite different from the reports you have built with RMarkdown previously. In particular, you are encouraged to describe the steps in your analysis (and display the corresponding code). This part of your assessment should be written in the file `analysis.Rmd`. Since you will need to delete the template text in that file, note that you can always retrieve it later from the original template. This file should represent a complete, reproducible record of all the steps you took to complete the analysis. It should include code, descriptive text (which reads like the methods section of a paper) and outputs (i.e. plots and/or tables). Pay attention to the following:
Ensure that you have a clear structure and logical order. Each displayed code block should have a clear purpose (described by text). If you have highly repetitive code sections, you may choose to hide some code from display or restructure your code to make it less repetitive.
If you are including `bash` commands in this section, make sure that they do not run when you compile your report. You can do this by setting `eval=FALSE` on all your `bash` code blocks.
Although this section should describe your analysis in detail, you might find that some very minor steps in the analysis can be skipped. Make sure that you include:
As you develop your oral presentation, you might try some analyses or plots and later decide that they are not useful. Do NOT include these. Keep your presentation and reproducible analysis focused on what was successful.
Use RMarkdown code blocks to show the code (bash or R) used to perform each step (with `echo=TRUE`), but be very careful to avoid showing extraneous error messages or unwanted outputs in your final report.
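One way to apply these display settings consistently is a setup chunk near the top of `analysis.Rmd` that sets global chunk defaults with `knitr::opts_chunk$set()`. This is a sketch, assuming you want code shown but messages and warnings hidden everywhere; individual chunks can still override these defaults in their own headers.

```r
library(knitr)

# Global defaults for every chunk that follows in the document:
# show the code (echo), but hide package start-up messages and
# warnings so they do not clutter the rendered report.
opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)

# Bash chunks still need to be switched off individually in their
# chunk header, e.g. {bash, eval=FALSE}, so they are displayed but not run.
```

This code usually lives in the first chunk of the document, which is itself commonly given `include=FALSE` so that the setup step does not appear in the output.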
Include text between code sections to describe what the code is doing. In this text you should typically focus on the “why” and briefly summarise the “how”. For example, the following sentence would be a good way to introduce a code section with `qiime export` commands followed by some tidyverse code to prepare a tidy dataset for plotting beta diversity:
To visualise beta diversity statistics in R, we first prepared a tibble containing the weighted UniFrac PCoA results joined with sample metadata.
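A minimal sketch of the tidyverse step that sentence describes, assuming the exported PCoA coordinates and the sample metadata have already been read into R as tibbles sharing a `sample_id` column. All object names, column names and values here are hypothetical placeholders; the real QIIME export files will need their own parsing code before this point.

```r
library(tibble)
library(dplyr)

# Hypothetical placeholder: first two PCoA axes for four samples
pcoa_coords <- tibble(
  sample_id = c("S1", "S2", "S3", "S4"),
  PC1 = c(-0.21, -0.18, 0.15, 0.24),
  PC2 = c(0.05, -0.07, 0.11, -0.09)
)

# Hypothetical placeholder: sample metadata
sample_metadata <- tibble(
  sample_id = c("S1", "S2", "S3", "S4"),
  diet = c("Diet A", "Diet A", "Diet B", "Diet B")
)

# Join coordinates to metadata so each sample's point can be
# coloured or shaped by its experimental group when plotted
pcoa_tidy <- pcoa_coords %>%
  left_join(sample_metadata, by = "sample_id")
```

The resulting `pcoa_tidy` can be passed directly to `ggplot(pcoa_tidy, aes(x = PC1, y = PC2, colour = diet))`.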
Since this reproducible analysis file contains most of your analysis code, you might want to export some data for use in your main report. You can easily save data objects using `write_rds` and read them in using `read_rds`. This mechanism allows you to exchange data between the two files that you will be editing for this assignment.
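A sketch of that exchange, using a hypothetical object `pcoa_tidy` and a hypothetical file name. In your project you would write to a path inside the repository (for example a `data/` folder) from `analysis.Rmd` and read the same path back in `report.Rmd`; here a temporary file stands in so the example runs anywhere.

```r
library(readr)
library(tibble)

# In analysis.Rmd: save a prepared data object (hypothetical example object)
pcoa_tidy <- tibble(
  sample_id = c("S1", "S2"),
  PC1 = c(-0.21, 0.24),
  diet = c("Diet A", "Diet B")
)
# Temporary stand-in for a project path such as "data/pcoa_tidy.rds"
rds_path <- file.path(tempdir(), "pcoa_tidy.rds")
write_rds(pcoa_tidy, rds_path)

# In report.Rmd: read the same object back without re-running the analysis
pcoa_from_disk <- read_rds(rds_path)
```

Because an `.rds` file stores the R object itself, the report receives the same tibble (including column types) that the analysis produced.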
After you have completed your analyses, you should have a clear idea of the main conclusions that can be drawn from the data. In addition, you should try to develop ideas for a new research project that would build on these research findings. Your new research project idea need not use the same approach as the original (e.g. metagenomics, RNASeq), but it should involve techniques that we have covered within this subject.
You should prepare a presentation pitching your new idea. Imagine that your audience is a funding agency or philanthropic foundation that could fund the work, and your aim is to convince them to do so. This is effectively a grant proposal in oral presentation format. Remember that funders always assess project proposals on the following basis:
To make this exercise realistic you should pick a real funding agency or philanthropic foundation and name them in your proposal title slide. For example, if your project is related to medical research you might want to pitch it to the NHMRC. If it is for pure or basic research you might want to pitch it to the ARC Discovery scheme. If it is for environmental or conservation research you might want to pitch it to the Holsworth Foundation. These are just examples and you are completely free to choose any funder.
To convince the funder that your research is likely to succeed, you should mention preliminary results obtained through your analysis of the assignment data. This is important to include, but it should be done to support your case for the new project, not to explain all the details of your data analysis.
The presentation duration is 8 minutes. A penalty of 10% per minute will be applied for going over time.
You should structure your oral presentation as follows:
See LearnJCU for the due date for this assignment. Your code, oral presentation slides and/or report (as applicable) must be submitted by the due date. You will need to push `report.Rmd`, and the files resulting from knitting it, to GitHub. Oral presentation slides should be submitted in PowerPoint or another presentation format. Due to class numbers, the presentations will need to be split over several sessions; however, to avoid unfairly advantaging some students over others, all slides must be submitted before the presentations (by the due date stated on LearnJCU), regardless of when your talk is scheduled.
You should submit your assignment as follows:
Your `report.Rmd` is in `github_document` format. You should keep it this way because it will render nicely on GitHub (which is where I will read it for marking purposes). Be sure to render your document regularly with the `Knit` button (don’t leave this to the end or it will be difficult to debug).
Commit changes and push to GitHub. Be sure to add the files produced by `knitr` to git when you do so. This includes `report.md` and may also include folders containing rendered images that will appear when you run the `Knit` command. You may also wish to include `bash` scripts you wrote to perform analysis tasks (e.g. Slurm scripts for running long jobs). If you are unsure about these, ask your instructor. If you are successful, you will be able to click `report.md` on GitHub and see a nicely rendered version of your reproducible analysis. Make sure you have pushed your latest changes to GitHub by the due date.
Any other computer-generated files or large data files that you downloaded should NOT be added to git. You can use the `.gitignore` file included with the project template to make sure these large files are ignored.
This assignment is assessed according to its presentation and content. Marks will be allocated according to standards documented in the subject outline. Breakdown of marks for the oral presentation and reproducible analysis are shown below.
Oral Presentation (50%) will be assessed based on the quality of your slides as well as your presentation style. Pay attention to the following:
Criterion | Guidelines | Marks |
---|---|---|
Content | Clarity and depth of insights obtained from the data. Quality of plots and/or tables presented and relevance to the conclusions being drawn from the data. | 15% |
Organisation | Background concepts and contextual information should set the scene, results should be presented in a logical order (telling a story with data) and conclusions should summarise outcomes clearly and succinctly | 15% |
Delivery | Clear voice placing emphasis where appropriate. Pace should allow the audience to follow along. Your presentation should be engaging and hold the audience’s interest | 10% |
Time Management | Keeping within the time limit | 5% |
Response to Questions | Did you listen to questions carefully and were your answers appropriate and succinct | 5% |
Reproducible Analysis (20%) will be assessed on the quality of presentation (language, spelling, logical flow / figures etc), clarity of insight (eg technical ability to use appropriate commands and write code, understanding of the methods used and their consequences for the final results), use of appropriate terminology.
Criterion | Guidelines | Marks |
---|---|---|
Presentation | Appropriate use of RMarkdown to generate a structured report without extraneous errors or warning messages. | 5% |
Reproducibility | All steps captured and/or clearly documented in the report. | 5% |
Code | Code is well organised and includes all steps required without any extraneous steps. Good variable names and tidyverse style used throughout. | 5% |
Plots and/or Tables | All plots used in the oral presentation are shown and are well labelled with captions | 5% |
Report (30%)
Report sections will be assessed on the quality of presentation (language, spelling, logical flow and structure of text / figures etc), clarity of insight (eg technical ability to use appropriate commands and write code, understanding of the methods used and their consequences for the final results), use of appropriate terminology.
Criterion | Guidelines | Marks |
---|---|---|
Overall presentation | Appropriate use of RMarkdown to generate a structured report without extraneous errors or warning messages. | 5% |
Introduction | Relevant background is presented clearly and provides a motivation for the results/discussion sections that follow | 5% |
Results | Plots and/or statistical tests clearly support conclusions in the discussion. Plots are clearly labelled, captioned and otherwise follow best practices for statistical plotting. Text highlights main results of figures and tables without exhaustively repeating them. | 10% |
Discussion | Picks one or two major outcomes of the study (as presented in the results) and provides additional context so that their broader implications can be appreciated | 10% |