TY - JOUR
T1 - Improved metagenome binning and assembly using deep variational autoencoders
AU - Nissen, Jakob Nybo
AU - Johansen, Joachim
AU - Allesøe, Rosa Lundbye
AU - Sønderby, Casper Kaae
AU - Armenteros, Jose Juan Almagro
AU - Grønbech, Christopher Heje
AU - Jensen, Lars Juhl
AU - Nielsen, Henrik Bjørn
AU - Petersen, Thomas Nordahl
AU - Winther, Ole
AU - Rasmussen, Simon
PY - 2021/5
Y1 - 2021/5
N2 - Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains challenging. Here we develop variational autoencoders for metagenomic binning (VAMB), a program that uses deep variational autoencoders to encode sequence coabundance and k-mer distribution information before clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any previous knowledge of the datasets. VAMB outperforms existing state-of-the-art binners, reconstructing 29-98% and 45% more near-complete (NC) genomes on simulated and real data, respectively. Furthermore, VAMB is able to separate closely related strains up to 99.5% average nucleotide identity (ANI), and reconstructed 255 and 91 NC Bacteroides vulgatus and Bacteroides dorei sample-specific genomes as two distinct clusters from a dataset of 1,000 human gut microbiome samples. We use 2,606 NC bins from this dataset to show that species of the human gut microbiome have different geographical distribution patterns. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb .
KW - Bacteroides/genetics
KW - Genome, Bacterial/genetics
KW - Humans
KW - Metagenome/genetics
KW - Metagenomics
KW - Microbiota/genetics
KW - Molecular Sequence Annotation
KW - Software
UR - http://www.scopus.com/inward/record.url?scp=85098748257&partnerID=8YFLogxK
U2 - 10.1038/s41587-020-00777-4
DO - 10.1038/s41587-020-00777-4
M3 - Letter
C2 - 33398153
SN - 1087-0156
VL - 39
SP - 555
EP - 560
JO - Nature Biotechnology
JF - Nature Biotechnology
IS - 5
ER -