This dataset comes from the following paper by JCU researchers:
Adverse effect of early-life high-fat/high-carbohydrate (“Western”) diet on bacterial community in the distal bowel of mice
The full paper can be found at this link: Infante Villamil et al 2018
To avoid the computational burden of analyzing raw reads, you are provided with the following qiime2 files, which contain partially processed data.
File Name | Purpose |
---|---|
rep-seqs.qza | ASV sequences produced by qiime dada2 denoise-paired. This is the output from --o-representative-sequences |
table.qza | Table of ASV abundances produced by qiime dada2 denoise-paired. This is the output from --o-table |
sample-metadata.tsv | Sample metadata |
classifier.qza | Naive Bayesian classifier trained using Green Genes |
To download and unpack these files use the following commands. You should run these commands from the top level of your RStudio project directory for the independent project assignment.
wget 'http://data.qld.edu.au/public/Q5999/JCUBioinformatics/independent-project/data_mouse.tgz' -O data.tgz
tar -zxvf data.tgz
After downloading and unpacking the data, you should find all the required files under raw_data/Mouse/
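As a quick sanity check after unpacking, something like the following could confirm that the four files listed above are present. This is only a sketch: the function name check_mouse_files is made up for illustration, and the default directory is the path given in the text.

```shell
# Report any of the four expected files that are missing from a directory.
# check_mouse_files is a hypothetical helper name, not part of qiime2.
# The directory defaults to raw_data/Mouse, the path stated in the text.
check_mouse_files() {
  dir="${1:-raw_data/Mouse}"
  missing=0
  for f in rep-seqs.qza table.qza sample-metadata.tsv classifier.qza; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Exit status is 0 when everything is present, 1 otherwise.
  return "$missing"
}
# Usage: check_mouse_files raw_data/Mouse && echo "all files present"
```

The function prints one line per missing file and returns a non-zero status if any file is absent, so it can be chained with && before starting the analysis.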
The sample metadata file contains three columns.
In this experiment there are two diets, Control (normal diet) and HFD (high-fat diet), and two timepoints. The first timepoint (T1) is a faecal sample taken at the time of weaning; up to this time, mice were fed on their mother's milk. The second timepoint (T2) is 11 weeks after weaning and incorporates the effect of either the control diet (CT) or the high-fat diet (HFD).
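To see how many samples fall into each diet and timepoint combination, a small tabulation of the metadata file may help. The sketch below assumes the file is tab-separated with a header row, that the first column is the sample identifier, and that the second and third columns hold the diet and timepoint (in either order). These column positions are assumptions, so check them against the actual file first. The function name count_groups is made up for illustration.

```shell
# Count samples for each combination of the values in columns 2 and 3.
# Assumption: column 1 is the sample ID and columns 2 and 3 hold the diet
# and timepoint; verify this against sample-metadata.tsv before relying on it.
# count_groups is a hypothetical helper name.
count_groups() {
  # Skip the header row (NR > 1) and any comment lines starting with '#'.
  awk -F'\t' 'NR > 1 && $0 !~ /^#/ { n[$2 "\t" $3]++ }
       END { for (k in n) print k "\t" n[k] }' "$1" | sort
}
# Usage: count_groups raw_data/Mouse/sample-metadata.tsv
```

Each output line gives the two group values followed by the number of samples in that group, which is a convenient way to check that the design is as balanced as expected.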
Paired-end sequencing data in fastq format, contained in the directory reads, was imported into a qiime object as follows:
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path reads \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path demux-paired-end.qza
The data involved at this stage was too large (500Gb) to provide to all students, so you will not have access to the raw reads or demux-paired-end.qza.
Denoising was performed with dada2 on these imported reads to produce the files rep-seqs.qza and table.qza:
qiime dada2 denoise-paired \
--p-n-threads 40 \
--i-demultiplexed-seqs demux-paired-end.qza \
--p-trim-left-f 20 \
--p-trim-left-r 21 \
--p-trunc-len-f 270 \
--p-trunc-len-r 270 \
--p-max-ee-f 4 \
--p-max-ee-r 4 \
--p-trunc-q 5 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
A Naive Bayesian classifier (classifier.qza) was trained using version 13_5 of the Green Genes database. This classifier was trained specifically based on the primers used to amplify the V4-V5 region and is appropriate for use with the --i-classifier argument required by qiime feature-classifier classify-sklearn. You will need this file to assign taxonomic labels to ASVs.
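The taxonomic assignment step described above might look roughly like the following. This is a sketch rather than a prescribed command: the wrapper name assign_taxonomy and the output name taxonomy.qza are made up for illustration, while the input file names and the raw_data/Mouse path come from the text.

```shell
# Assign taxonomic labels to the ASVs using the supplied classifier.
# assign_taxonomy is a hypothetical wrapper; $1 is the data directory
# and $2 is the output artifact (any name you choose).
assign_taxonomy() {
  qiime feature-classifier classify-sklearn \
    --i-classifier "$1/classifier.qza" \
    --i-reads "$1/rep-seqs.qza" \
    --o-classification "$2"
}
# Usage: assign_taxonomy raw_data/Mouse taxonomy.qza
```

Wrapping the call in a function keeps the input paths in one place, which is convenient if the command needs to be rerun from the top level of the project directory.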
As this is a 16S metabarcoding dataset, you should complete the following basic analyses in qiime2:
Run qiime feature-table summarize to guide a decision on the rarefaction threshold.
Visualisation and statistical tests should focus on the question(s):