BC3203

This dataset comes from the following paper by JCU researchers:

Microbiome diversity and composition varies across body areas in a freshwater turtle

The full paper can be found at this link: McKnight et al. 2020

To avoid the computational burden of analyzing raw reads, you are provided with the following files, which contain partially processed data as qiime2 qza artifacts together with the sample metadata.

File Name	Purpose
`rep-seqs.qza`	ASV Sequences produced by `qiime dada2 denoise-paired`. This is the output from `--o-representative-sequences`
`table.qza`	Table of ASV abundances produced by `qiime dada2 denoise-paired`. This is the output from `--o-table`
`sample-metadata.tsv`	Sample Metadata
`classifier.qza`	Naive Bayes classifier trained on the Greengenes database

To download and unpack these files use the following commands. You should run these commands from the top level of your RStudio project directory for the independent project assignment.

wget 'http://data.qld.edu.au/public/Q5999/JCUBioinformatics/independent-project/data_turtle.tgz' -O data.tgz
tar -zxvf data.tgz
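
A quick check that the archive unpacked as expected can be sketched as follows. The function name `check_turtle_files` is hypothetical, and the default directory assumes the files land in raw_data/Turtles/ relative to the project root.

```shell
# Hypothetical helper: report any expected file missing from a data directory.
# Returns 0 if all four files are present, 1 otherwise.
check_turtle_files() {
  dir="${1:-raw_data/Turtles}"
  missing=0
  for f in rep-seqs.qza table.qza sample-metadata.tsv classifier.qza; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_turtle_files && echo "all files present"` from the top level of the project prints the confirmation only when nothing is missing.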

After downloading and unpacking the data, you should find all the required files under raw_data/Turtles/. The sample metadata file contains three columns:

ID: Column 1, this is simply a unique identifier for each sample
Location: Column 2, location on the body of the turtle (NA values correspond to blank control samples)
Turtle: Column 3, identifies which of the six turtles the sample was taken from
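
To get a feel for the sampling design, the number of samples per body location can be tallied directly from the metadata file. This is a minimal sketch that assumes the file is tab-separated with a single header line and that Location is the second column; the function name `count_by_location` is hypothetical.

```shell
# Hypothetical helper: count samples per body location in a metadata file.
# Skips the header line and any comment/directive lines starting with '#'.
count_by_location() {
  awk -F '\t' 'NR > 1 && $0 !~ /^#/ { n[$2]++ }
               END { for (k in n) print k "\t" n[k] }' "$1" | LC_ALL=C sort
}
```

For example, `count_by_location raw_data/Turtles/sample-metadata.tsv` prints one line per location with its sample count. Rows whose location is NA are the blank controls.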

To help you understand where these data files came from, the analysis steps used to produce them are described below.

Paired-end sequencing data in FASTQ format, contained in the directory reads, were imported into a qiime2 artifact as follows:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path reads \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

The data at this stage were too large (500 GB) to distribute to all students, so you will not have access to the raw reads or to demux-paired-end.qza.

To generate smaller files suitable for distribution, the imported reads were denoised with DADA2 to produce rep-seqs.qza and table.qza:

qiime dada2 denoise-paired \
  --p-n-threads 40 \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 20 \
  --p-trim-left-r 21 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 230 \
  --p-max-ee-f 6 \
  --p-max-ee-r 6 \
  --p-trunc-q 5 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
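
As a rough guide to what the trimming and truncation settings imply, assuming standard DADA2 filtering behaviour: each read is truncated at the `--p-trunc-len` position and `--p-trim-left` bases are removed from its start, so the retained length is the difference between the two. The arithmetic for the values above:

```shell
# Retained read length = truncation length - bases trimmed from the start.
fwd_len=$((270 - 20))   # forward reads
rev_len=$((230 - 21))   # reverse reads
echo "forward: ${fwd_len} bases, reverse: ${rev_len} bases"
```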

A Naive Bayes classifier (classifier.qza) was trained on version 13_5 of the Greengenes database. The classifier was trained specifically for the primers used to amplify the V4-V5 region in McKnight et al. and is suitable for use with the --i-classifier argument required by qiime feature-classifier classify-sklearn. You will need this file to assign taxonomic labels to ASVs.
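
A minimal sketch of the classification step is shown here. The wrapper name `classify_asvs` and the output file names are illustrative assumptions. Option names can vary between qiime2 releases, so check `qiime feature-classifier classify-sklearn --help` for your installation.

```shell
# Hypothetical wrapper: assign taxonomy to the ASVs with the provided classifier.
# $1 is the data directory (defaults to raw_data/Turtles).
classify_asvs() {
  data_dir="${1:-raw_data/Turtles}"
  qiime feature-classifier classify-sklearn \
    --i-classifier "$data_dir/classifier.qza" \
    --i-reads "$data_dir/rep-seqs.qza" \
    --o-classification taxonomy.qza
}

# Hypothetical wrapper: turn the taxonomy artifact into a viewable table.
tabulate_taxonomy() {
  qiime metadata tabulate \
    --m-input-file taxonomy.qza \
    --o-visualization taxonomy.qzv
}
```

Calling `classify_asvs` followed by `tabulate_taxonomy` from the project root would write taxonomy.qza and taxonomy.qzv to the current directory.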

Suggested Analyses

As this is a 16S metabarcoding dataset, you should complete the following basic analyses in qiime2:

Summarise the feature table with qiime feature-table summarize to guide the choice of rarefaction threshold
Build a phylogenetic tree from ASVs
Calculate core alpha and beta diversity metrics
Taxonomic classification
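
The first three steps could be sketched as below. Taxonomic classification uses classifier.qza with qiime feature-classifier classify-sklearn, as noted earlier. The function names, the `DATA_DIR` variable and all output file names are illustrative assumptions, and the sampling depth is something you must choose yourself after inspecting the feature table summary.

```shell
# Hypothetical helpers for the first three analyses; output names are illustrative.
DATA_DIR="${DATA_DIR:-raw_data/Turtles}"

# Summarise per-sample read counts to help choose a rarefaction depth.
summarize_table() {
  qiime feature-table summarize \
    --i-table "$DATA_DIR/table.qza" \
    --m-sample-metadata-file "$DATA_DIR/sample-metadata.tsv" \
    --o-visualization table.qzv
}

# Align the ASV sequences with MAFFT and build a tree with FastTree.
build_tree() {
  qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "$DATA_DIR/rep-seqs.qza" \
    --o-alignment aligned-rep-seqs.qza \
    --o-masked-alignment masked-aligned-rep-seqs.qza \
    --o-tree unrooted-tree.qza \
    --o-rooted-tree rooted-tree.qza
}

# $1 is the rarefaction (sampling) depth chosen after inspecting table.qzv.
core_metrics() {
  qiime diversity core-metrics-phylogenetic \
    --i-phylogeny rooted-tree.qza \
    --i-table "$DATA_DIR/table.qza" \
    --p-sampling-depth "$1" \
    --m-metadata-file "$DATA_DIR/sample-metadata.tsv" \
    --output-dir core-metrics-results
}
```

A typical order would be `summarize_table`, then `build_tree`, then `core_metrics <depth>` once a depth has been chosen. Samples with fewer reads than the chosen depth are dropped at the core metrics step, so the choice trades sequencing depth against the number of samples retained.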

Visualisation and statistical tests should focus on the question(s):

Does microbiome diversity and/or composition vary with body location?
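
One possible way to test this question in qiime2 is sketched below. It assumes the diversity results were written to a directory named core-metrics-results by qiime diversity core-metrics-phylogenetic, and that the metadata column header is literally `Location`. The function names and output file names are hypothetical. Faith's PD and unweighted UniFrac are used only as examples; other metrics from the same directory can be substituted.

```shell
# Hypothetical helpers; input paths assume the default file names written by
# `qiime diversity core-metrics-phylogenetic --output-dir core-metrics-results`.
# $1 is the path to the sample metadata file.

# Compare alpha diversity between metadata groups (Kruskal-Wallis tests).
alpha_by_group() {
  qiime diversity alpha-group-significance \
    --i-alpha-diversity core-metrics-results/faith_pd_vector.qza \
    --m-metadata-file "$1" \
    --o-visualization faith-pd-group-significance.qzv
}

# Compare community composition between body locations (PERMANOVA by default).
beta_by_location() {
  qiime diversity beta-group-significance \
    --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
    --m-metadata-file "$1" \
    --m-metadata-column Location \
    --p-pairwise \
    --o-visualization unweighted-unifrac-location-significance.qzv
}
```

For example, `beta_by_location raw_data/Turtles/sample-metadata.tsv`. Consider whether any blank control samples should be excluded before running these tests, since they do not correspond to a body location.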