Trait biases in microbial reference genomes
Date
2023
Authors
Albright, Sage
Louca, Stilianos
Publisher
Nature Communications
Abstract
Common culturing techniques and priorities bias our discovery towards specific traits that may not
be representative of microbial diversity in nature. So far, these biases have not been systematically
examined. To address this gap, here we use 116,884 publicly available metagenome-assembled
genomes (MAGs, completeness ≥80%) from 203 surveys worldwide as a culture-independent sample
of bacterial and archaeal diversity, and compare these MAGs to the popular RefSeq genome database,
which heavily relies on cultures. We compare the distribution of 12,454 KEGG gene orthologs (used
as trait proxies) in the MAGs and RefSeq genomes, while controlling for environment type (ocean,
soil, lake, bioreactor, human, and other animals). Using statistical modeling, we then determine the
conditional probabilities that a species is represented in RefSeq depending on its genetic repertoire.
We find that the majority of examined genes are significantly over- or under-represented in RefSeq. Our
systematic estimates of gene prevalences across bacteria and archaea in nature, and of gene-specific biases
in reference genomes, constitute a resource for addressing these biases in the future.
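The abstract does not spell out the statistical model. As a rough illustration only, the sketch below shows one simple way to ask whether carrying a given gene is associated with RefSeq representation while controlling for environment type: per-environment conditional probabilities plus a Mantel-Haenszel odds ratio pooled across environment strata. This is a swapped-in textbook technique, not the authors' method; the `Species` record, the function names, and the toy data are all hypothetical, and "K00001" is used merely as a placeholder ortholog ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical record for one species sampled via MAGs."""
    environment: str    # environment type, e.g. "ocean" or "soil"
    genes: frozenset    # KEGG ortholog IDs detected in the genome
    in_refseq: bool     # True if the species is represented in RefSeq


def _strata(species, gene):
    """Per-environment 2x2 counts [a, b, c, d]:
    a = gene present & in RefSeq, b = gene present & not in RefSeq,
    c = gene absent & in RefSeq,  d = gene absent & not in RefSeq."""
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for s in species:
        row = 0 if gene in s.genes else 2
        col = 0 if s.in_refseq else 1
        tables[s.environment][row + col] += 1
    return tables


def conditional_probabilities(species, gene):
    """Per environment: (P(in RefSeq | gene present), P(in RefSeq | gene absent)).
    A probability is None when its group has no species."""
    result = {}
    for env, (a, b, c, d) in _strata(species, gene).items():
        p_present = a / (a + b) if a + b else None
        p_absent = c / (c + d) if c + d else None
        result[env] = (p_present, p_absent)
    return result


def mantel_haenszel_odds_ratio(species, gene):
    """Environment-stratified Mantel-Haenszel odds ratio of RefSeq
    representation for species with vs without the gene.
    > 1 suggests a bias for the gene, < 1 a bias against it."""
    num = den = 0.0
    for a, b, c, d in _strata(species, gene).values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        return float("inf") if num > 0 else float("nan")
    return num / den


# Toy data (invented for illustration): (environment, has_gene, in_refseq, count)
rows = [
    ("ocean", True, True, 2), ("ocean", True, False, 1),
    ("ocean", False, True, 1), ("ocean", False, False, 2),
    ("soil", True, True, 1), ("soil", True, False, 1),
    ("soil", False, True, 1), ("soil", False, False, 1),
]
data = [
    Species(env, frozenset({"K00001"}) if has else frozenset(), ref)
    for env, has, ref, n in rows
    for _ in range(n)
]

print(conditional_probabilities(data, "K00001"))
print(round(mantel_haenszel_odds_ratio(data, "K00001"), 3))  # 2.2
```

In this toy data, carrying the gene raises the chance of RefSeq representation in the ocean stratum (2/3 vs 1/3) but not in soil (1/2 vs 1/2), and the pooled ratio above 1 would read as a bias for the gene. A real analysis over thousands of orthologs would also need a significance test (e.g. Cochran-Mantel-Haenszel) and a correction for multiple testing.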
Description
17 pages
Keywords
Microbiology techniques, Microbial ecology, Biodiversity
Citation
Albright, S., Louca, S. Trait biases in microbial reference genomes. Sci Data 10, 84 (2023). https://doi.org/10.1038/s41597-023-01994-7