- Author
- Halima Saker
- Title
- Segmentation of Heterogeneous Multivariate Genome Annotation Data
- Citable URL:
- https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-759147
- Date of submission
- 10.05.2021
- Date of defense
- 16.07.2021
- Abstract (EN)
- Owing to the impact of next-generation sequencing (NGS), we have seen a rapid increase in genomic information and in annotation information that can be naturally mapped to genomic locations. In cancer research, for example, there are significant efforts to chart DNA methylation at single-nucleotide resolution. The NIH Roadmap Epigenomics Project, on the other hand, has set out to chart a large number of different histone modifications. Over the last few years, a very diverse set of aspects has become the target of large-scale experiments with a genome-wide readout. Identifying the functional units of genomic DNA is therefore considered a significant and essential challenge. This motivated us to implement multi-dimensional segmentation approaches that accommodate gene variety and genome heterogeneity. The focus of our research is the segmentation of multivariate genomic, epigenomic, and transcriptomic data from multiple time points, tissues, and cell types, in order to compare changes in genomic organization and identify common elements. Next-generation sequencing offers rich material for bioinformatics research into molecular functions, disease causes, and related questions. Rapid advances in technology have also led to a proliferation of experiment types. Although these experiments share next-generation sequencing as the readout, they produce signals with entirely different inherent resolutions, ranging from precise transcript structures at single-nucleotide resolution, through pull-down and enrichment-based protocols with resolutions on the order of 100 nt, to chromosome conformation data that are accurate only at kilobase resolution. The main goal of the dissertation project is therefore to design, implement, and test novel segmentation algorithms that work on one- and multi-dimensional data and can accommodate data of different types and resolutions.
The target data in this project are multivariate genetic, epigenetic, transcriptomic, and proteomic data, chosen because these datasets can change under several conditions such as chemical, genetic, and epigenetic modifications. A promising approach towards this end is to identify intervals of the genomic DNA that behave coherently across multiple conditions and tissues; these can be defined as intervals on which all measured quantities are constant within each experiment. A naive approach would take each dataset in isolation and estimate intervals in which the signal at hand is constant. Another approach takes all datasets at once as input, without resorting to one-dimensional segmentation. Once implemented, the algorithm should be applied to heterogeneous genomic, transcriptomic, proteomic, and epigenomic data; the aim is to draw and improve the map of functionally coherent segments of a genome. Current approaches focus either on individual datasets, as in the case of tiling array transcriptomics data, or on the analysis of comparable experiments, such as ChIP-seq data for various histone modifications. The simplest sub-problem in segmentation is to decide whether two adjacent intervals should form two distinct segments or be combined into a single one. In 1-D, how to make this decision is relatively well understood; we have to find out how it should be made in multi-dimensional segmentation. The 1-D decision leads to a segmentation of the genome with respect to the particular dataset. The intersection of the segmentations for different datasets could then identify the DNA elements.
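As a rough illustration of the last step mentioned in the abstract, the intersection of per-dataset segmentations can be read as the common refinement of their breakpoints. The sketch below is not taken from the dissertation or from the consseg package; the function name `intersect_segmentations`, the representation of a segmentation as a list of breakpoint positions, and the toy breakpoints are all assumptions made for illustration.

```python
def intersect_segmentations(segmentations, start, end):
    """Return the common refinement of several segmentations of [start, end).

    Each segmentation is an iterable of interior breakpoint positions
    (a hypothetical representation chosen for this sketch). A position is a
    breakpoint of the result if it is a breakpoint in at least one input,
    so every resulting interval lies inside one segment of every dataset.
    """
    breakpoints = set()
    for seg in segmentations:
        # Keep only breakpoints strictly inside the region of interest.
        breakpoints.update(b for b in seg if start < b < end)
    bounds = [start] + sorted(breakpoints) + [end]
    # Consecutive boundaries delimit half-open intervals [left, right).
    return list(zip(bounds[:-1], bounds[1:]))


# Toy example with made-up breakpoints for two datasets on positions 0..100.
methylation = [20, 60]
histone = [20, 45, 80]
segments = intersect_segmentations([methylation, histone], 0, 100)
print(segments)
```

Each returned interval is one on which neither toy dataset changes segment, which is the sense of "behaving coherently" used above. A weighted consensus, as named in the linked publication, would need more than this plain union of breakpoints.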
- Other edition
- Segmentation of Heterogeneous Multivariate Genome Annotation Data
- Research data reference
- consseg R package
Link: https://github.com/Bierinformatik/consseg
- Weighted Consensus Segmentations
DOI: 10.3390/computation9020017
Link: https://www.mdpi.com/2079-3197/9/2/17
- Multidimensional segmentation of heterogeneous data
DOI: 10.1109/ICABME.2017.8167550
Link: https://ieeexplore.ieee.org/document/8167550
- Free keywords (EN)
- Bioinformatics, segmentation aggregation, consensus segmentation, Time series analysis, dynamic programming
- Classification (DDC)
- 000
- Degree-granting / examining institution
- Universität Leipzig, Leipzig
- Lebanese University, Lebanon
- Version / Review status
- published version / publisher's version
- URN Qucosa
- urn:nbn:de:bsz:15-qucosa2-759147
- Publication date Qucosa
- 08.09.2021
- Document type
- Dissertation
- Language of the document
- English
- License / Rights notice
- CC BY-NC-ND 4.0