BC3203

This dataset comes from the following paper by JCU researchers:

Adverse effect of early-life high-fat/high-carbohydrate (“Western”) diet on bacterial community in the distal bowel of mice

The full paper can be found at this link: Infante Villamil et al. 2018

To avoid the computational burden of analyzing raw reads, you are provided with the following files: qiime2 qza artifacts containing partially processed data, the sample metadata, and a pre-trained classifier.

| File Name | Purpose |
| --- | --- |
| `rep-seqs.qza` | ASV sequences produced by `qiime dada2 denoise-paired`. This is the output from `--o-representative-sequences` |
| `table.qza` | Table of ASV abundances produced by `qiime dada2 denoise-paired`. This is the output from `--o-table` |
| `sample-metadata.tsv` | Sample metadata |
| `classifier.qza` | Naive Bayesian classifier trained using Green Genes |

To download and unpack these files use the following commands. You should run these commands from the top level of your RStudio project directory for the independent project assignment.

wget 'http://data.qld.edu.au/public/Q5999/JCUBioinformatics/independent-project/data_mouse.tgz' -O data.tgz
tar -zxvf data.tgz
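Once the archive is unpacked, a short check can confirm that nothing is missing. This is a minimal sketch: the function name `check_mouse_files` is hypothetical, and it assumes the four files listed in this document sit directly in `raw_data/Mouse/`.

```shell
# Hypothetical helper: report which of the expected files are present.
# The default directory and the file names are taken from this document.
check_mouse_files() {
  dir="${1:-raw_data/Mouse}"
  missing=0
  for f in rep-seqs.qza table.qza sample-metadata.tsv classifier.qza; do
    if [ -f "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=$((missing + 1))
    fi
  done
  # Return a non-zero status if any file is missing
  [ "$missing" -eq 0 ]
}
```

Run it from the top level of the project as `check_mouse_files raw_data/Mouse`.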

After downloading and unpacking the data you should find all the required files under raw_data/Mouse/. The sample metadata file contains three columns:

ID: Column 1, this is simply a unique identifier for each sample
Diet: Column 2, CT (Control), HFD (High Fat Diet)
Timepoint: Column 3, Sampling time, T1 (at weaning) or T2 (11 weeks after weaning)

This experiment has two diets, Control (CT, normal diet) and HFD (high fat diet), and two timepoints. The first timepoint (T1) is a faecal sample taken at the time of weaning; up to this time mice were fed mother’s milk. The second timepoint (T2) is 11 weeks after weaning and reflects the effect of either the control diet (CT) or the high fat diet (HFD).
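Before any analysis it helps to know how many samples fall into each Diet and Timepoint combination. The sketch below is one way to tabulate this. The function name `count_groups` is hypothetical, and it assumes the metadata file is tab-separated with a single header row and the column order ID, Diet, Timepoint described above.

```shell
# Hypothetical helper: count samples in each Diet x Timepoint group.
# Assumes tab-separated input, one header row, columns ID / Diet / Timepoint.
count_groups() {
  awk -F '\t' '
    NR == 1 || /^#/ || NF < 3 { next }        # skip header, "#" lines, short lines
    { n[$2 "\t" $3]++ }                        # key on Diet and Timepoint
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | sort
}
```

For example, `count_groups raw_data/Mouse/sample-metadata.tsv` prints one line per group with the diet, the timepoint and the number of samples.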

Paired-end sequencing data in fastq format, contained in the directory reads, were imported into a qiime2 artifact as follows:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path reads \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

The data involved at this stage was too large (500Gb) to provide to all students, so you will not have access to the raw reads or to demux-paired-end.qza.

Denoising was performed with dada2 on these imported reads to produce the files rep-seqs.qza and table.qza:

qiime dada2 denoise-paired \
  --p-n-threads 40 \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 20 \
  --p-trim-left-r 21 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 270 \
  --p-max-ee-f 4 \
  --p-max-ee-r 4 \
  --p-trunc-q 5 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
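As a reading aid for the trimming parameters above: DADA2 documents that filtered reads have a length equal to the truncation position minus the number of bases trimmed from the 5' end, and that reads falling short of the truncation position are discarded. The arithmetic can be sketched as follows; `retained_length` is a hypothetical helper, not part of qiime2.

```shell
# Hypothetical helper: bases kept per read = truncation position - 5' trim
retained_length() {
  echo $(( $1 - $2 ))
}

echo "forward: $(retained_length 270 20)"   # --p-trunc-len-f 270, --p-trim-left-f 20
echo "reverse: $(retained_length 270 21)"   # --p-trunc-len-r 270, --p-trim-left-r 21
```

With the values used here, forward reads keep 250 bases and reverse reads keep 249 bases.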

A Naive Bayesian classifier (classifier.qza) was trained using version 13_5 of the Green Genes database. This classifier was trained specifically for the primers used to amplify the V4-V5 region and is suitable for the --i-classifier argument required by qiime feature-classifier classify-sklearn. You will need this file to assign taxonomic labels to ASVs.
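A classification run might look like the sketch below. The wrapper name `classify_asvs`, the default input paths and the output name `taxonomy.qza` are assumptions for illustration; the `QIIME` variable is also hypothetical and exists only so that `QIIME=echo` prints the command instead of running it.

```shell
# Hypothetical wrapper around qiime feature-classifier classify-sklearn.
# Arguments: classifier, ASV sequences, output file (all optional).
classify_asvs() {
  "${QIIME:-qiime}" feature-classifier classify-sklearn \
    --i-classifier "${1:-raw_data/Mouse/classifier.qza}" \
    --i-reads "${2:-raw_data/Mouse/rep-seqs.qza}" \
    --o-classification "${3:-taxonomy.qza}"
}
```

Calling `classify_asvs` with no arguments uses the default paths. Check `qiime feature-classifier classify-sklearn --help` against your installed version before relying on the flag names.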

Suggested Analyses

As this is a 16S metabarcoding dataset you should complete the following basic analyses in qiime2:

Summarise the feature table with qiime feature-table summarize to guide a decision on the rarefaction threshold
Build a phylogenetic tree from ASVs
Calculate core alpha and beta diversity metrics
Taxonomic classification
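The first three analyses above can be sketched as shell functions, shown here as one possible starting point rather than a prescribed workflow. The function names, the `DATA` and `QIIME` variables and all output file names are assumptions. The sampling depth is deliberately left as a required argument, since it should come from your own inspection of the feature table summary. Taxonomic classification uses qiime feature-classifier classify-sklearn with classifier.qza, as described earlier in this document.

```shell
# Sketch only: DATA, QIIME and the output names are assumptions.
# Setting QIIME=echo prints each command instead of running it.
DATA="${DATA:-raw_data/Mouse}"

# 1. Summarise the feature table to help choose a rarefaction threshold
summarize_table() {
  "${QIIME:-qiime}" feature-table summarize \
    --i-table "$DATA/table.qza" \
    --m-sample-metadata-file "$DATA/sample-metadata.tsv" \
    --o-visualization table.qzv
}

# 2. Build a phylogenetic tree from the ASV sequences
build_tree() {
  "${QIIME:-qiime}" phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "$DATA/rep-seqs.qza" \
    --o-alignment aligned-rep-seqs.qza \
    --o-masked-alignment masked-aligned-rep-seqs.qza \
    --o-tree unrooted-tree.qza \
    --o-rooted-tree rooted-tree.qza
}

# 3. Core alpha and beta diversity metrics at a depth you choose
core_metrics() {
  depth="${1:?usage: core_metrics SAMPLING_DEPTH}"
  "${QIIME:-qiime}" diversity core-metrics-phylogenetic \
    --i-phylogeny rooted-tree.qza \
    --i-table "$DATA/table.qza" \
    --p-sampling-depth "$depth" \
    --m-metadata-file "$DATA/sample-metadata.tsv" \
    --output-dir core-metrics-results
}
```

A typical order would be `summarize_table`, then `build_tree`, then `core_metrics` with the depth you settled on. Flag names follow the commonly used qiime2 tutorials, so confirm them with `--help` for your installed version.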

Visualisation and statistical tests should focus on the question(s):

Is microbiome diversity and/or composition affected by diet?
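One way to approach this question is with the group significance tests in qiime2, sketched below. The sketch assumes core diversity metrics were written to a directory named `core-metrics-results` (a hypothetical name), and the function names, the `META` and `QIIME` variables and the output file names are likewise assumptions. Faith's PD and unweighted UniFrac are used only as examples of an alpha and a beta metric.

```shell
# Sketch only: META, QIIME and output names are assumptions.
# Setting QIIME=echo prints each command instead of running it.
META="${META:-raw_data/Mouse/sample-metadata.tsv}"

# Alpha diversity compared across the categorical metadata columns
alpha_by_group() {
  "${QIIME:-qiime}" diversity alpha-group-significance \
    --i-alpha-diversity core-metrics-results/faith_pd_vector.qza \
    --m-metadata-file "$META" \
    --o-visualization faith-pd-group-significance.qzv
}

# Beta diversity compared between diets, with pairwise tests
beta_by_diet() {
  "${QIIME:-qiime}" diversity beta-group-significance \
    --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
    --m-metadata-file "$META" \
    --m-metadata-column Diet \
    --p-pairwise \
    --o-visualization unweighted-unifrac-diet-significance.qzv
}
```

Because T1 samples were taken at weaning, a diet effect would be expected mainly at T2. You may therefore want to restrict the comparison to T2 samples first, for example with qiime feature-table filter-samples.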