Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle

Zacher B, Lidschreiber M, Cramer P, Gagneur J, Tresch A (2014)


Publication Type: Journal article

Publication year: 2014

Journal: Molecular Systems Biology

Volume: 10

Article Number: 768

Journal Issue: 12

DOI: 10.15252/msb.20145654

Abstract

DNA replication, transcription and repair involve the recruitment of protein complexes that change their composition as they progress along the genome in a directed or strand-specific manner. Chromatin immunoprecipitation in conjunction with hidden Markov models (HMMs) has been instrumental in understanding these processes, as they segment the genome into discrete states that can be related to DNA-associated protein complexes. However, current HMM-based approaches are not able to assign forward or reverse direction to states or properly integrate strand-specific (e.g., RNA expression) with non-strand-specific (e.g., ChIP) data, which is indispensable to accurately characterize directed processes. To overcome these limitations, we introduce bidirectional HMMs which infer directed genomic states from occupancy profiles de novo. Application to RNA polymerase II-associated factors in yeast and chromatin modifications in human T cells recovers the majority of transcribed loci, reveals gene-specific variations in the yeast transcription cycle and indicates the existence of directed chromatin state patterns at transcribed, but not at repressed, regions in the human genome. In yeast, we identify 32 new transcribed loci, a regulated initiation-elongation transition, the absence of elongation factors Ctk1 and Paf1 from a class of genes, a distinct transcription mechanism for highly expressed genes and novel DNA sequence motifs associated with transcription termination. We anticipate bidirectional HMMs to significantly improve the analyses of genome-associated directed processes.

Synopsis

Bidirectional hidden Markov models improve the annotation of DNA-associated processes from genomics data, reveal variations in the yeast Pol II transcription cycle and identify directed chromatin state patterns at transcribed regions in the human genome.

Genomic feature annotations derived from bidirectional hidden Markov models are up to twice as accurate compared to those from standard hidden Markov models. Variations in the yeast Pol II transcription cycle fall into clusters of co-regulated genes, whose functional categories range from housekeeping and cell cycle to stress response. New insights into transcriptional regulation are obtained, indicating a regulated initiation-elongation transition and a distinct transcription mechanism for highly expressed genes. An implementation of bidirectional hidden Markov models is freely available at the Bioconductor website: http://www.bioconductor.org/packages/devel/bioc/html/STAN.html.

How to cite

APA:

Zacher, B., Lidschreiber, M., Cramer, P., Gagneur, J., & Tresch, A. (2014). Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle. Molecular Systems Biology, 10(12). https://doi.org/10.15252/msb.20145654

MLA:

Zacher, Benedikt, et al. "Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle." Molecular Systems Biology 10.12 (2014).
