The Capital Region of Denmark - a part of Copenhagen University Hospital

Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data

Research output: Contribution to journal › Journal article › Research › peer-review

BACKGROUND: Unsupervised deep learning methods have proven useful for noisy single cell mRNA-sequencing (scRNA-seq) data, where the models generalize well despite the zero-inflation of the data. One class of neural networks, autoencoders, has been useful for denoising single cell data, imputing missing values and reducing dimensionality.

RESULTS: Here, we present a striking feature with the potential to greatly increase the usability of autoencoders: with specialized training, the autoencoder is able not only to generalize over the data but also to tease apart biologically meaningful modules, which we found encoded in the representation layer of the network. From scRNA-seq data, our model can delineate the biologically meaningful modules that govern a dataset and indicate which modules are active in each single cell. Importantly, most of these modules can be explained by known biological functions, as provided by the Hallmark gene sets.
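To make the idea of a representation layer concrete, the following is a minimal sketch, not the authors' model. It assumes a one-hidden-layer ReLU autoencoder written in plain NumPy and trained on synthetic zero-inflated data; the functions `make_data`, `encode` and `train_autoencoder`, the layer sizes and the learning rate are all hypothetical choices for illustration. The specialized training described above is not detailed in this abstract, so plain training like this gives no guarantee that hidden units align with biological modules.

```python
import numpy as np


def make_data(rng, n_cells=200, n_genes=30, n_modules=2, dropout=0.3):
    """Toy zero-inflated expression matrix: each module drives one gene block."""
    activity = rng.gamma(2.0, 1.0, size=(n_cells, n_modules))
    loadings = np.zeros((n_modules, n_genes))
    block = n_genes // n_modules
    for m in range(n_modules):
        loadings[m, m * block:(m + 1) * block] = 1.0
    x = activity @ loadings + rng.normal(0.0, 0.1, size=(n_cells, n_genes))
    x = np.clip(x, 0.0, None)
    keep = rng.random(x.shape) > dropout  # False entries become dropout zeros
    return x * keep


def encode(x, params):
    """Representation layer: one activation per hidden unit per cell."""
    w1, b1, _, _ = params
    return np.maximum(x @ w1 + b1, 0.0)


def train_autoencoder(x, n_hidden=2, lr=0.02, epochs=1500, seed=0):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_genes = x.shape[1]
    # Non-negative encoder weights keep the ReLU units active at the start,
    # because the input itself is non-negative (an illustrative choice).
    w1 = np.abs(rng.normal(0.0, 0.1, size=(n_genes, n_hidden)))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, size=(n_hidden, n_genes))
    b2 = np.zeros(n_genes)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode to the representation layer, then decode.
        z = x @ w1 + b1
        h = np.maximum(z, 0.0)
        x_hat = h @ w2 + b2
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / err.size
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_z = (g_out @ w2.T) * (z > 0)
        g_w1 = x.T @ g_z
        g_b1 = g_z.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


if __name__ == "__main__":
    data = make_data(np.random.default_rng(0))
    params, losses = train_autoencoder(data)
    codes = encode(data, params)  # rows: cells, columns: hidden units
    print(codes.shape, losses[0], losses[-1])
```

Each row of `codes` holds the hidden-unit activations for one cell, which is the kind of per-cell readout the abstract refers to when it says the model indicates which modules are active in each single cell.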

CONCLUSIONS: We find that tailored training of an autoencoder makes it possible to deconvolute biological modules inherent in the data, without any assumptions. Comparison with gene signatures of canonical pathways shows that the modules are directly interpretable. This discovery has important implications, as it makes it possible to outline the drivers behind a given effect of a cell. Compared with other dimensionality reduction methods, or supervised models for classification, our approach has the benefit of both handling the zero-inflated nature of scRNA-seq well and validating that the model captures relevant information, by establishing a link between input and decoded data. In perspective, our model in combination with clustering methods is able to provide information about which subtype a given single cell belongs to, as well as which biological functions determine that membership.
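One common way to compare a module's genes with a pathway gene signature is an over-representation test. The sketch below is an assumption for illustration only: the abstract does not state which comparison the authors used, and the function names, gene identifiers and set names (`SET_A`, `SET_B`) are hypothetical. It computes a one-sided hypergeometric p-value for the overlap between a module and each gene set, using only the Python standard library (`math.comb`, Python 3.8 or later).

```python
from math import comb


def hypergeom_pvalue(n_universe, n_set, n_module, n_overlap):
    """P(overlap >= n_overlap) when n_module genes are drawn at random
    from n_universe genes, of which n_set belong to the gene set."""
    upper = min(n_set, n_module)
    total = comb(n_universe, n_module)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def module_enrichment(module_genes, gene_sets, universe):
    """Return {gene set name: p-value} for one module's gene list."""
    universe = set(universe)
    module = set(module_genes) & universe  # ignore genes outside the universe
    results = {}
    for name, genes in gene_sets.items():
        genes = set(genes) & universe
        overlap = len(module & genes)
        results[name] = hypergeom_pvalue(
            len(universe), len(genes), len(module), overlap
        )
    return results


if __name__ == "__main__":
    # Hypothetical toy inputs: 20 genes, two made-up gene sets, one module.
    universe = [f"g{i}" for i in range(20)]
    gene_sets = {
        "SET_A": [f"g{i}" for i in range(0, 5)],
        "SET_B": [f"g{i}" for i in range(10, 15)],
    }
    module = ["g0", "g1", "g2", "g3"]
    for name, p in module_enrichment(module, gene_sets, universe).items():
        print(name, p)
```

A small p-value for a gene set suggests that the module overlaps it more than chance would predict; when testing many modules against many gene sets, the p-values would need a multiple-testing correction.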

Original language: English
Journal: BMC Bioinformatics
Volume: 20
Issue number: 1
Pages (from-to): 379
ISSN: 1471-2105
DOIs
Publication status: Published - 8 Jul 2019

    Research areas

  • Cluster Analysis; Gene Expression Profiling/methods; Neural Networks, Computer; RNA, Messenger/chemistry; Sequence Analysis, RNA/methods; Single-Cell Analysis; Unsupervised Machine Learning

ID: 58986242