BC3203

This dataset comes from the following paper by JCU researchers:

Microbiome diversity and composition varies across body areas in a freshwater turtle

The full paper can be found at this link: McKnight et al. 2020

To avoid the computational burden of analyzing raw reads, you are provided with the following partially processed data: three qiime2 qza files and a sample metadata table.

File Name            Purpose
rep-seqs.qza         ASV sequences produced by qiime dada2 denoise-paired. This is the output from --o-representative-sequences
table.qza            Table of ASV abundances produced by qiime dada2 denoise-paired. This is the output from --o-table
sample-metadata.tsv  Sample metadata
classifier.qza       Naive Bayes classifier trained using Greengenes

To download and unpack these files use the following commands. You should run these commands from the top level of your RStudio project directory for the independent project assignment.

wget 'http://data.qld.edu.au/public/Q5999/JCUBioinformatics/independent-project/data_turtle.tgz' -O data.tgz
tar -zxvf data.tgz

After downloading and unpacking the data, you should find all the required files under raw_data/Turtles/. The sample metadata file contains three columns.
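A quick way to confirm that the download and unpacking worked is sketched below. The helper name check_turtle_files is hypothetical (it is not part of the assignment materials); the file names and the raw_data/Turtles path are the ones given above. The last step assumes the metadata header is the first line of the file.

```shell
# Hypothetical helper: report which of the four expected files are present
# in a directory (defaults to the path given in the text).
check_turtle_files() {
  ctf_dir="${1:-raw_data/Turtles}"
  ctf_missing=0
  for ctf_file in rep-seqs.qza table.qza sample-metadata.tsv classifier.qza; do
    if [ -f "$ctf_dir/$ctf_file" ]; then
      echo "found: $ctf_file"
    else
      echo "missing: $ctf_file"
      ctf_missing=$((ctf_missing + 1))
    fi
  done
  # Exit status is the number of missing files (0 means everything is there)
  return "$ctf_missing"
}

# Run from the top level of the project directory
check_turtle_files raw_data/Turtles || echo "Some files are missing; re-run the download."

# List the metadata column names, one per line
# (assumes the tab-separated header is the first line of the file)
if [ -f raw_data/Turtles/sample-metadata.tsv ]; then
  head -n 1 raw_data/Turtles/sample-metadata.tsv | tr '\t' '\n'
fi
```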

To help you understand the origin of these data files, the analysis steps used to obtain them are described below.

Paired-end sequencing data in fastq format, contained in the directory reads, was imported into a qiime artifact as follows:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path reads \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

The data at this stage was too large (500 GB) to provide to all students, so you will not have access to the raw reads or demux-paired-end.qza.

To generate smaller files suitable for distribution, the imported reads were denoised with dada2 to produce rep-seqs.qza and table.qza:

qiime dada2 denoise-paired \
  --p-n-threads 40 \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 20 \
  --p-trim-left-r 21 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 230 \
  --p-max-ee-f 6 \
  --p-max-ee-r 6 \
  --p-trunc-q 5 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
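Although the raw reads are not available to you, the two denoising outputs can still be inspected. The following is a minimal sketch, assuming the qiime2 environment is active and the files sit in raw_data/Turtles/. The function name summarise_denoised and the output file names are hypothetical.

```shell
# Hypothetical wrapper around two standard qiime2 summary commands.
# $1 = directory holding table.qza, rep-seqs.qza and sample-metadata.tsv
# $2 = directory for the .qzv visualisations (created if absent)
summarise_denoised() {
  sd_in="${1:-raw_data/Turtles}"
  sd_out="${2:-.}"
  mkdir -p "$sd_out"

  # Per-sample and per-feature read counts from the ASV table
  qiime feature-table summarize \
    --i-table "$sd_in/table.qza" \
    --m-sample-metadata-file "$sd_in/sample-metadata.tsv" \
    --o-visualization "$sd_out/table.qzv"

  # Sequence of each ASV, with length statistics
  qiime feature-table tabulate-seqs \
    --i-data "$sd_in/rep-seqs.qza" \
    --o-visualization "$sd_out/rep-seqs.qzv"
}
```

For example, summarise_denoised raw_data/Turtles results would write table.qzv and rep-seqs.qzv to a results directory (a name chosen here for illustration). The .qzv files can be opened with qiime tools view or at view.qiime2.org.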

A Naive Bayes classifier (classifier.qza) was trained using version 13_5 of the Greengenes database. It was trained specifically for the primers used to amplify the V4-V5 region in McKnight et al. and is suitable for the --i-classifier argument required by qiime feature-classifier classify-sklearn. You will need this file to assign taxonomic labels to ASVs.
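Taxonomic assignment with this classifier might look like the sketch below. It assumes the qiime2 environment is active and the files are in raw_data/Turtles/. The function name assign_taxonomy and the output names taxonomy.qza and taxonomy.qzv are hypothetical choices, not names required by the assignment.

```shell
# Hypothetical wrapper: classify ASVs with the supplied classifier, then
# tabulate the result for viewing.
# $1 = directory holding classifier.qza and rep-seqs.qza
# $2 = output directory (created if absent)
assign_taxonomy() {
  at_in="${1:-raw_data/Turtles}"
  at_out="${2:-.}"
  mkdir -p "$at_out"

  # Assign a taxonomic label to every ASV in rep-seqs.qza
  qiime feature-classifier classify-sklearn \
    --i-classifier "$at_in/classifier.qza" \
    --i-reads "$at_in/rep-seqs.qza" \
    --o-classification "$at_out/taxonomy.qza"

  # Turn the assignments into a browsable table
  qiime metadata tabulate \
    --m-input-file "$at_out/taxonomy.qza" \
    --o-visualization "$at_out/taxonomy.qzv"
}
```

Calling assign_taxonomy raw_data/Turtles results would then leave taxonomy.qza and taxonomy.qzv in a results directory. If classify-sklearn refuses to load the classifier, a likely cause is a mismatch between the scikit-learn version used for training and the one in your qiime2 installation.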

Suggested Analyses

As this is a 16S metabarcoding dataset, you should complete the following basic analyses in qiime2:

Visualisation and statistical tests should focus on the following question(s):