Machine learning and statistical methods for preclinical omics data analysis
Authors
More about the book
The generation of omics datasets with modern high-throughput techniques for molecular profiling has become an indispensable component of preclinical studies. The novel insights gained from these datasets include, for instance, the discovery of biomarkers for early diagnosis of disease and toxicological risk assessment. In addition, preclinical omics datasets may assist in the mode-of-action analysis of compounds and the elucidation of disease mechanisms. The continuous technical advances made in the last two decades have increased the volume and complexity of omics datasets and hence enabled the large-scale study of model organisms on various molecular layers. Commonly quantified molecular features include coding and regulatory RNAs, proteins, DNA methylation marks and metabolites. Since the extraction of knowledge from these high-dimensional datasets requires sophisticated algorithms for quality control, pre-processing, statistical analysis, mathematical modeling and visualization, new challenges have arisen in bioinformatics research. In this thesis, we present novel algorithms and software tools that were developed for both the individual and the integrated analysis of heterogeneous omics data types. Building upon established statistics and machine learning methods, we conceived several complex methodologies custom-designed for specific applications in preclinical research. Our research covers a broad range of scientific objectives, including the implementation of bioinformatics tools for pathway-based microarray data analysis and for the automated annotation of transcription factors. Furthermore, a strong focus was placed on toxicogenomics, an emerging field of research that aims to assess the toxicological risk of drug candidates based on characteristic signatures extracted from preclinical omics data.