As DNA sequencing became cheaper, it became efficient and feasible to profile the microbiomes of many samples by sequencing the highly variable regions of the bacterial 16S rRNA gene. A question that novices to microbiome analysis run into very quickly is how to properly interpret 16S microbiome composition data. The total counts in the OTU table always vary greatly across samples, an artefact of the sequencing technology rather than of biology. Additionally, the counts for a given bacterial group across samples are highly non-normal; at best, their distribution is roughly that of a zero-inflated negative binomial random variable. Further complicating interpretation, the data are high-dimensional: the number of bacterial groupings (OTUs) greatly exceeds the sample size.
To determine whether the microbiota drive disease, regression-based analyses are needed. In searching the literature this summer, I found that there is no consensus on how to perform regression with 16S microbiome data. The difficulties stem from the compositional nature of the data (each sample's counts carry only relative information) together with their high dimensionality. One of the main benefits of regression is the ability to account for possible covariates and to test whether they, rather than the microbiota, are the true drivers of observed differences. This is especially important in human studies, where subjects do not live in environments controlled by the investigator.
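To make the count properties above concrete, here is a minimal, stdlib-only Python sketch. It assumes a zero-inflated negative binomial parameterised as a gamma-Poisson mixture (mean `mean`, size `size`, excess-zero probability `zero_prob`); the function names and parameter values are hypothetical illustrations, not taken from the paper. It also shows the compositional side: converting counts to proportions removes differences in sequencing depth, but forces every sample's taxa to sum to one.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for modest means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def zinb_count(mean, size, zero_prob, rng):
    """One draw from a zero-inflated negative binomial (gamma-Poisson mixture).

    Hypothetical parameterisation: with probability zero_prob the taxon is an
    excess ("structural") zero; otherwise the count is negative binomial with
    the given mean and size (variance = mean + mean**2 / size).
    """
    if rng.random() < zero_prob:
        return 0  # excess zero: taxon absent or undetected in this sample
    rate = rng.gammavariate(size, mean / size)  # gamma rate with mean `mean`
    return poisson(rate, rng)  # Poisson given a gamma rate -> negative binomial


def to_proportions(counts):
    """Close one sample's counts into relative abundances that sum to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


rng = random.Random(42)
one_otu = [zinb_count(20, 2, 0.3, rng) for _ in range(10)]  # one OTU, 10 samples
print(one_otu)

shallow = [10, 30, 60]   # hypothetical counts for three OTUs in one sample
deep = [20, 60, 120]     # same community sequenced twice as deeply
print(to_proportions(shallow) == to_proportions(deep))  # True
```

The closure step is also what makes the data compositional: if one taxon's count rises, every other taxon's proportion falls even when its true abundance has not changed, which is one reason ordinary regression on proportions is hard to interpret.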
At this week’s microbiome journal club at 3:00pm on Friday, August 26th in MUMC 3N10A, we will be discussing aMiSPU, a novel regression-based method for microbiome data presented in “An adaptive association test for microbiome data” by Wu et al. (2016). The paper mainly compares the performance of aMiSPU with that of a similar method, optimal MiRKAT, on both simulated and real data*.
Questions for discussion:
- Is the aMiSPU test a valid statistical test of association for microbiome data, and is it better than general linear modelling?
- Are significant alterations in rare microbes within microbiome studies repeatable and reliable? Should statistical tests of differential abundance be adjusted to detect differences in rare microbiota?
- Can microbiome data be accurately simulated, and if so, how important will methods papers on simulated data be for future developments in the field?
Come join us for food and drinks afterwards at the Phoenix at 4!
*Note that within the explanation of the aMiSPU test on page 4 of 12, it is briefly mentioned that TaMiSPUu, TaMiSPUw, and TaMiSPU are no longer genuine p values and that a permutation method is used to estimate their p values. If you’re looking to wrap your head around the math, the authors explain this in more detail in a previous paper about aSPU, available here, under the section “A new class of tests and a data-adaptive test” at the bottom of page 4 and top of page 5 of the pdf. In brief, the permutation method randomly shuffles the subjects’ outcomes many times, which literally creates the null distribution of no association that the observed statistic is tested against. I will also be explaining it in the presentation, mainly because I find it exciting, but also because aMiSPU involves multiple layers of permutations.
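For anyone who wants to see the idea in code before the presentation, here is a simplified, stdlib-only Python sketch in the spirit of the aSPU construction, not the authors' aMiSPU implementation: it uses no covariates, no phylogenetic weighting, and no weighted/unweighted combination. Assumptions (mine, not the paper's): a marginal score `U_j = sum_i (y_i - mean(y)) * X[i][j]`, sum-of-powered-score statistics `SPU(gamma) = |sum_j U_j**gamma|` (with `gamma = inf` meaning `max_j |U_j|`), and the hypothetical power set `(1, 2, 4, 8, inf)`. It shows the two layers: each permutation of the subjects yields a null value of every statistic, and the same permutations are then reused to calibrate the minimum p value, which is not itself a genuine p value.

```python
import bisect
import math
import random


def score_vector(y, X):
    """Marginal score per taxon: U_j = sum_i (y_i - mean(y)) * X[i][j]."""
    y_bar = sum(y) / len(y)
    n_taxa = len(X[0])
    return [sum((yi - y_bar) * row[j] for yi, row in zip(y, X)) for j in range(n_taxa)]


def spu(u, gamma):
    """Sum-of-powered-score statistic; gamma = inf means max_j |U_j|."""
    if gamma == math.inf:
        return max(abs(v) for v in u)
    return abs(sum(v ** gamma for v in u))


def adaptive_permutation_test(y, X, gammas=(1, 2, 4, 8, math.inf), n_perm=999, seed=0):
    """Return (adaptive p value, per-gamma p values) from one set of permutations."""
    rng = random.Random(seed)
    # Row 0 holds the observed statistics; rows 1..n_perm hold permuted ones.
    rows = [[spu(score_vector(y, X), g) for g in gammas]]
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # rearranging subjects breaks any y-X association
        u = score_vector(y_perm, X)
        rows.append([spu(u, g) for g in gammas])

    n_rows = len(rows)
    # Layer 1: p value of every row's statistic against the pooled distribution.
    p = [[0.0] * len(gammas) for _ in range(n_rows)]
    for k in range(len(gammas)):
        ranked = sorted(row[k] for row in rows)
        for r, row in enumerate(rows):
            n_at_least = n_rows - bisect.bisect_left(ranked, row[k])
            p[r][k] = n_at_least / n_rows

    # The minimum p value across gammas is NOT a genuine p value...
    min_p = [min(pr) for pr in p]
    # Layer 2: ...so compare it with the min-p values of the permuted rows.
    adaptive_p = sum(m <= min_p[0] for m in min_p) / n_rows
    return adaptive_p, p[0]


rng = random.Random(1)
y = [0] * 15 + [1] * 15  # hypothetical case/control labels
X = [[10.0 * yi] + [rng.random() for _ in range(9)] for yi in y]  # taxon 0 tracks y
p_adaptive, p_each = adaptive_permutation_test(y, X)
print(p_adaptive, p_each)
```

Reusing one set of permutations for both layers avoids a nested permutation loop; the observed data are counted as one of the permutations, so no p value can be smaller than `1 / (n_perm + 1)`.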