Computational analysis of human genomic variants and lncRNAs from sequence data
High-throughput sequencing technologies, with next-generation sequencing (NGS) as the representative technique, have become commonly used in scientific research and clinical diagnosis. Their ability to rapidly sequence massive amounts of human deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) has produced large amounts of data that can be analysed for human health research.
Some human genetic traits or diseases can be explained by genomic variants, which may affect the functions of the genome and influence biological processes. Genomic variants can be detected from DNA NGS data with computational variant calling tools. Developments in recent years have enabled these tools to detect large genomic variants, such as large insertions and deletions (indels). However, the optimal tools for detecting indels of various sizes with different sequencing methods have remained unclear. In this thesis, variant calling tools were evaluated with different NGS data for small and large indel detection. The results provided suggestions for tool selection, revealed the current limits of indel detection methods, and gave directions for future development.
Sequence contexts play essential roles in both the biological functions and the data analysis of genomic variants. In this thesis, a computational tool, more versatile than other methods, was developed to comprehensively analyse variant sequence contexts. The tool was applied to several human variant sets, and the results showed that human genomic variants correlated strongly with their sequence contexts.
Besides genomic variants, long non-coding RNAs (lncRNAs) also have important functions, for example in gene regulation. The development of RNA sequencing has helped to find many novel RNAs in recent years. LncRNAs share similar features with messenger RNAs (mRNAs) but usually cannot be translated into proteins. These lncRNAs should be distinguished from mRNAs and studied further. With machine learning methods, lncRNAs can be predicted and labelled from sequence data more precisely than before. This thesis compared different lncRNA prediction methods and showed that the deep learning method had the best performance.
|Series:||F, part 23|