This dataset comes from the following paper by JCU researchers:
Microbiome diversity and composition varies across body areas in a freshwater turtle
The full paper can be found at this link: McKnight et al. 2020
To avoid the computational burden of analyzing raw reads, you are provided with the following qiime2 qza files containing partially processed data.
File Name | Purpose |
---|---|
rep-seqs.qza | ASV sequences produced by qiime dada2 denoise-paired. This is the output from --o-representative-sequences |
table.qza | Table of ASV abundances produced by qiime dada2 denoise-paired. This is the output from --o-table |
sample-metadata.tsv | Sample metadata |
classifier.qza | Naive Bayesian classifier trained using Greengenes |
To download and unpack these files use the following commands. You should run these commands from the top level of your RStudio project directory for the independent project assignment.
wget 'http://data.qld.edu.au/public/Q5999/JCUBioinformatics/independent-project/data_turtle.tgz' -O data.tgz
tar -zxvf data.tgz
After downloading and unpacking the data you should find all the required files at the path raw_data/Turtles/. The sample metadata file contains three columns.
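As a quick sanity check after unpacking, the sketch below confirms that the four files listed in the table are present and prints the column names from the header line of the metadata file. The function names check_turtle_files and list_columns are hypothetical helpers written for this example, and the code assumes the metadata file is tab-separated with its column names on the first line.

```shell
# Hypothetical helper: report any expected file that is missing from a directory.
# Returns non-zero if at least one file is absent.
check_turtle_files() {
  dir="$1"
  missing=0
  for f in rep-seqs.qza table.qza sample-metadata.tsv classifier.qza; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: print one column name per line from the header of a TSV file.
# Assumes the header is the first line of the file.
list_columns() {
  head -n 1 "$1" | tr '\t' '\n'
}

# Example use from the top level of the project directory:
#   check_turtle_files raw_data/Turtles
#   list_columns raw_data/Turtles/sample-metadata.tsv
```

If everything unpacked correctly, the first call prints nothing and the second prints one line per metadata column.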
To help you understand the origin of these data files, the analysis steps used to produce them are described below.
Paired-end sequencing data in fastq format, contained in the directory reads, were imported into a qiime2 artifact as follows:
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path reads \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path demux-paired-end.qza
The data involved at this stage were too large (500Gb) to provide to all students, so you will not have access to the raw reads or demux-paired-end.qza.
To generate smaller files suitable for distribution, the imported reads were denoised with dada2 to produce the files rep-seqs.qza and table.qza:
qiime dada2 denoise-paired \
--p-n-threads 40 \
--i-demultiplexed-seqs demux-paired-end.qza \
--p-trim-left-f 20 \
--p-trim-left-r 21 \
--p-trunc-len-f 270 \
--p-trunc-len-r 230 \
--p-max-ee-f 6 \
--p-max-ee-r 6 \
--p-trunc-q 5 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
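The trimming and truncation settings in the command above determine how many bases of each read reach the denoiser. Assuming truncation positions are counted from the start of the original read, as in DADA2's filtering step where filtered reads have length truncLen minus trimLeft, the retained lengths can be worked out as below. The helper name retained_length is hypothetical.

```shell
# Hypothetical helper: bases kept per read =
#   truncation position - bases trimmed from the 5' end.
retained_length() {
  trunc_len="$1"
  trim_left="$2"
  echo $(( trunc_len - trim_left ))
}

# Values taken from the denoise-paired command above.
echo "forward reads keep $(retained_length 270 20) bases"   # 270 - 20 = 250
echo "reverse reads keep $(retained_length 230 21) bases"   # 230 - 21 = 209
```

As described in the plugin documentation, reads that --p-trunc-q 5 cuts shorter than the truncation position are discarded rather than kept at a shorter length.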
A Naive Bayesian classifier (classifier.qza) was trained using version 13_5 of the Greengenes database. This classifier was trained specifically for the primers used to amplify the V4-V5 region in McKnight et al. and is appropriate for use with the --i-classifier argument required by qiime feature-classifier classify-sklearn. You will need this file to assign taxonomic labels to ASVs.
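A minimal sketch of the taxonomic assignment step is shown below, assuming the files were unpacked to raw_data/Turtles/ as described earlier. The output names taxonomy.qza and taxonomy.qzv, the wrapper function names, and the QIIME variable (which lets the commands be printed with QIIME=echo instead of run) are choices made for this example rather than requirements of the assignment.

```shell
DATA_DIR="raw_data/Turtles"
# Assumption for this sketch: set QIIME=echo to print each command instead of running it.
QIIME="${QIIME:-qiime}"

# Assign taxonomy to the ASV sequences using the pre-trained classifier.
# $1: path for the output taxonomy artifact (a name chosen by you).
assign_taxonomy() {
  "$QIIME" feature-classifier classify-sklearn \
    --i-classifier "$DATA_DIR/classifier.qza" \
    --i-reads "$DATA_DIR/rep-seqs.qza" \
    --o-classification "$1"
}

# Convert a taxonomy artifact into a viewable table.
# $1: taxonomy artifact, $2: path for the output visualization.
tabulate_taxonomy() {
  "$QIIME" metadata tabulate \
    --m-input-file "$1" \
    --o-visualization "$2"
}

# Example use from the top level of the project directory:
#   assign_taxonomy taxonomy.qza
#   tabulate_taxonomy taxonomy.qza taxonomy.qzv
```

Wrapping each command in a function keeps the input paths in one place, so they only need changing once if the data directory moves.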
As this is a 16S metabarcoding dataset you should complete the following basic analyses in qiime2:
qiime feature-table summarize to guide a decision on the rarefaction threshold
Visualisation and statistical tests should focus on the question(s):