
June 15 Mycology Laboratory Seminar: Transcriptome assembly evaluation and single cell...

Institute of Microbiology, Chinese Academy of Sciences · WeChat Official Account · 2017-06-14 21:01


Title: Transcriptome assembly evaluation and single cell RNA-Seq regulatory network


Speaker: Qi Song (宋麒)


Host: Kuan Li (李宽), Information Platform, Mycology Laboratory


Affiliation: Virginia Tech (Virginia Polytechnic Institute and State University)


Time: Thursday, June 15, 2017, 10:00-12:00 (morning)


Venue: Room A203, Institute of Microbiology, Chinese Academy of Sciences


About the speaker:

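PhD student at Virginia Polytechnic Institute and State University (Virginia Tech). He received a master's degree in bioinformatics from Fujian Agriculture and Forestry University in 2015 and then joined Virginia Tech to pursue a PhD in Genetics, Bioinformatics and Computational Biology. Research interests: 1. applications of machine learning models in transcriptome assembly; 2. computational analysis of regulatory networks; 3. construction of multilayer-network functional modules from single-cell sequencing data; 4. R package development.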


Abstract:

Short-read RNA-seq followed by de novo transcriptome assembly has been widely adopted to investigate the transcriptomes of species without a reference genome. How to remove false-positive transcripts from an assembly is still an open challenge in computational biology. Recently, single-molecule sequencing (PacBio sequencing) has emerged as a promising method, as it directly sequences individual RNA molecules and yields reads much longer than short reads. However, long reads from PacBio sequencing are known to have higher error rates than short reads and often cover only a small portion of all expressed genes. In this study, we propose to combine short-read assembly with long-read sequencing data to produce a high-quality transcriptome assembly. We use published short-read and long-read data generated from the model organism Arabidopsis as a gold standard to train several machine learning methods, including K-Nearest Neighbors, Random Forest, Support Vector Machine, Logistic Regression and Naïve Bayes, to distinguish high-quality contigs from low-quality contigs. We compared contigs assembled from short reads against both the reference genome and the long-read sequencing results. We found that most of the high-quality contigs assembled from short reads match the long reads and the reference genome equally well, suggesting that high-quality contigs are indeed true isoforms expressed in our data. The trained model was then used to predict the quality of contigs of intermediate quality. The whole framework can serve as an automatic evaluation tool for transcriptome assembly without a reference genome. We will apply this tool to long-read and short-read sequencing data from a parasitic species, O. cernua, to generate a high-quality transcriptome for this species.
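As a rough illustration of the classification step, the sketch below implements a tiny K-Nearest Neighbors classifier, one of the methods named above, in plain Python. It is not the speaker's implementation: the three contig features (length, short-read coverage, fraction aligned to a long read), the training values and the function names are all hypothetical, chosen only to show how labeled contigs could be used to score an unlabeled one.

```python
import math
from collections import Counter

# Hypothetical training set. Each contig is described by illustrative features:
# (length in kb, mean short-read coverage, fraction of contig aligned to a long read).
# In practice the labels would come from a gold standard; these values are made up.
TRAIN = [
    ((2.1, 35.0, 0.98), "high"),
    ((1.8, 40.0, 0.95), "high"),
    ((2.5, 28.0, 0.99), "high"),
    ((0.4, 3.0, 0.20), "low"),
    ((0.6, 5.0, 0.35), "low"),
    ((0.3, 2.0, 0.10), "low"),
]


def make_scaler(rows):
    """Build a min-max scaler from the training feature vectors."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def apply(x):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(x, lows, highs)
        )

    return apply


def knn_predict(train, query, k=3):
    """Label a contig by majority vote among its k nearest training contigs."""
    scaler = make_scaler([features for features, _ in train])
    q = scaler(query)
    # Euclidean distance in the scaled feature space.
    neighbors = sorted(
        (math.dist(scaler(features), q), label) for features, label in train
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(TRAIN, (2.0, 30.0, 0.90)))
    print(knn_predict(TRAIN, (0.5, 4.0, 0.25)))
```

Features are min-max scaled before computing distances so that coverage, which has the largest numeric range here, does not dominate the vote. A real evaluation would use many more contigs and held-out data.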

Single-cell RNA-seq has been successfully applied in human studies, revealing important mechanisms such as cancer heterogeneity and stem cell differentiation. In recent years, many computational approaches have been developed to analyze single-cell sequencing data. For example, trajectory analysis is one of the most commonly used methods for investigating dynamic changes in cell state by ordering cells along a pseudotime series. However, regulatory information such as TF-target interactions is not included as prior knowledge in these computational approaches. Therefore, we propose to integrate known interactions into the framework of single-cell RNA-seq analysis to examine the dynamic changes of functional modules over the time series.
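One simple way to picture the idea of following a functional module along pseudotime is sketched below. It is only an assumption-laden toy, not the method presented in the talk: cells are sorted by a precomputed pseudotime value, and each TF module is summarized as the mean expression of its known targets in each cell. The cell names, gene names, expression values and the `module_activity` function are all hypothetical.

```python
# Hypothetical pseudotime values, e.g. produced by a trajectory analysis tool.
PSEUDOTIME = {"cell_a": 0.9, "cell_b": 0.1, "cell_c": 0.5}

# Hypothetical normalized expression values per cell and gene.
EXPRESSION = {
    "cell_a": {"g1": 8.0, "g2": 6.0, "g3": 1.0},
    "cell_b": {"g1": 1.0, "g2": 3.0, "g3": 5.0},
    "cell_c": {"g1": 4.0, "g2": 4.0, "g3": 2.0},
}

# Hypothetical prior knowledge: known TF-target interactions.
TF_TARGETS = {"TF1": ["g1", "g2"], "TF2": ["g3"]}


def module_activity(pseudotime, expression, tf_targets):
    """Return cells ordered by pseudotime and, per TF, mean target expression per cell."""
    ordered_cells = sorted(pseudotime, key=pseudotime.get)
    activity = {}
    for tf, targets in tf_targets.items():
        activity[tf] = [
            sum(expression[cell][gene] for gene in targets) / len(targets)
            for cell in ordered_cells
        ]
    return ordered_cells, activity


if __name__ == "__main__":
    cells, activity = module_activity(PSEUDOTIME, EXPRESSION, TF_TARGETS)
    print(cells)
    for tf, values in activity.items():
        print(tf, values)
```

In this toy data the TF1 module rises along pseudotime while the TF2 module falls, which is the kind of dynamic change in functional modules the abstract refers to.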