16S rRNA gene copy number (16S GCN) varies among bacterial species, and this variation introduces potential biases into microbial diversity analyses that use 16S rRNA read counts. To correct these biases, methods have been developed to predict 16S GCN. A recent study suggests that the prediction uncertainty can be so great that copy number correction is not justified in practice. Here we develop RasperGade16S, a novel method and software to better model and capture the inherent uncertainty in 16S GCN prediction. RasperGade16S implements a maximum likelihood framework of a pulsed evolution model and explicitly accounts for intraspecific GCN variation and heterogeneous GCN evolution rates among species. Using cross-validation, we show that our method provides robust confidence estimates for the GCN predictions and outperforms other methods in both precision and recall. We have predicted GCN for 592605 OTUs in the SILVA database and tested 113842 bacterial communities that represent an exhaustive and diverse list of engineered and natural environments. We found that the prediction uncertainty is small enough for 99% of the communities that 16S GCN correction should improve their compositional and functional profiles estimated using 16S rRNA reads. On the other hand, we found that GCN variation has limited impact on beta-diversity analyses such as PCoA, NMDS, PERMANOVA and random-forest tests.

The 16S ribosomal RNA (16S rRNA) gene is the gold standard for studies of bacterial and archaeal diversity and has been commonly used to estimate the composition of bacterial and archaeal communities through amplicon sequencing. Sequence reads are usually matched to reference databases such as SILVA, RDP and GreenGenes to determine the presence of taxa and their relative cell abundances. However, the 16S rRNA gene copy number (16S GCN) can vary from 1 to more than 15. This large variation biases the relative cell abundance estimated from gene read counts (hereafter referred to as relative gene abundance), and consequently it can skew community profiles and diversity measures and lead to qualitatively incorrect interpretations. As a result, it has been argued that 16S GCN variation should be taken into account in 16S rRNA gene-based analyses. The majority of bacterial species have not been cultured or sequenced, and their 16S GCNs are unknown. Studies have shown that 16S GCN exhibits a strong phylogenetic signal, and therefore 16S GCN can be inferred from closely related reference bacteria. Based on this principle, software has been developed to predict 16S GCN in a process often referred to as hidden state prediction.
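To make the bias from copy number variation concrete, here is a minimal sketch of the basic correction idea: divide each taxon's read count by its 16S GCN, then renormalize. The taxon names, read counts and copy numbers are hypothetical, and the function names are made up for illustration. This is not the RasperGade16S implementation; in particular it treats each copy number as known exactly and ignores the prediction uncertainty that RasperGade16S is designed to model.

```python
def relative_abundance(counts):
    """Normalize a dict of non-negative values so they sum to 1."""
    total = sum(counts.values())
    return {taxon: value / total for taxon, value in counts.items()}


def correct_for_gcn(read_counts, copy_numbers):
    """Estimate relative cell abundance from 16S read counts.

    Each taxon's read count is divided by its 16S gene copy number
    to approximate a cell count, and the results are renormalized.
    """
    estimated_cells = {
        taxon: reads / copy_numbers[taxon]
        for taxon, reads in read_counts.items()
    }
    return relative_abundance(estimated_cells)


# Hypothetical two-taxon community (values chosen for illustration only).
read_counts = {"taxon_A": 700, "taxon_B": 300}
copy_numbers = {"taxon_A": 7, "taxon_B": 1}  # assumed 16S GCN per taxon

gene_abundance = relative_abundance(read_counts)
cell_abundance = correct_for_gcn(read_counts, copy_numbers)

print("relative gene abundance:", gene_abundance)
print("relative cell abundance:", cell_abundance)
```

In this toy community, taxon_A accounts for 70% of the reads but only 25% of the estimated cells, because each of its cells contributes seven gene copies. Without correction, the dominant taxon would be identified incorrectly.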