1. Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes (BMC Genomics)
Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge. Here we present Orthonome, an orthologue prediction pipeline, designed to reduce the trade-off between orthologue capture rates (recall) and accuracy of multi-species orthologue prediction. The pipeline compares sequence domains and then forms sequence-similar clusters before using phylogenetic comparisons to identify inparalogues. It then corrects sequence similarity metrics for fragment and gene length bias using a novel scoring metric capturing relationships between full length as well as fragmented genes. The remaining genes are then brought together for the identification of orthologues within a phylogenetic framework. The orthologue predictions are further calibrated along with inparalogues and gene births, using synteny, to identify novel orthologous relationships. We use 12 high quality Drosophila genomes to show that, compared to other orthologue prediction pipelines, Orthonome provides orthogroups with minimal error but high recall. Furthermore, Orthonome is resilient to suboptimal assembly/annotation quality, with the inclusion of draft genomes from eight additional Drosophila species still providing >6500 1:1 orthologues across all twenty species while retaining a better combination of accuracy and recall than other pipelines. Orthonome is implemented as a searchable database and query tool along with multiple-sequence alignment browsers for all sets of orthologues.
The underlying documentation and database are accessible at http://www.orthonome.com. We demonstrate that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines. The study also highlights a greater degree of evolutionary conservation across drosophilid species than earlier thought.
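The precision/recall trade-off mentioned in the abstract can be made concrete with a small sketch. It is not part of Orthonome: the function name `precision_recall` and the gene identifiers are hypothetical, and it assumes predictions and a trusted reference are both available as pairs of gene IDs.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for predicted orthologue pairs.

    Pairs are stored as frozensets so that (a, b) and (b, a) compare equal.
    """
    pred = {frozenset(p) for p in predicted}
    ref = {frozenset(p) for p in reference}
    true_pos = len(pred & ref)
    # Precision: fraction of predicted pairs that are in the reference.
    precision = true_pos / len(pred) if pred else 0.0
    # Recall: fraction of reference pairs that were predicted.
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical gene identifiers, for illustration only.
reference = [("dmel_g1", "dsim_g1"), ("dmel_g2", "dsim_g2"),
             ("dmel_g3", "dsim_g3"), ("dmel_g4", "dsim_g4")]
predicted = [("dsim_g1", "dmel_g1"), ("dmel_g2", "dsim_g2"),
             ("dmel_g3", "dsim_g9")]

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predictions are correct, and two of the four reference pairs are recovered. A stricter predictor would tend to raise the first number and lower the second.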
2. Blood monocyte transcriptome and epigenome analyses reveal loci associated with human atherosclerosis (Nature Communications)
Little is known regarding the epigenetic basis of atherosclerosis. Here we present the CD14+ blood monocyte transcriptome and epigenome signatures associated with human atherosclerosis. The transcriptome signature includes transcription coactivator, ARID5B, which is known to form a chromatin derepressor complex with a histone H3K9Me2-specific demethylase and promote adipogenesis and smooth muscle development. ARID5B CpG (cg25953130) methylation is inversely associated with both ARID5B expression and atherosclerosis, consistent with this CpG residing in an ARID5B enhancer region, based on chromatin capture and histone marks data. Mediation analysis supports assumptions that ARID5B expression mediates effects of cg25953130 methylation and several cardiovascular disease risk factors on atherosclerotic burden. In lipopolysaccharide-stimulated human THP1 monocytes, ARID5B knockdown reduced expression of genes involved in atherosclerosis-related inflammatory and lipid metabolism pathways, and inhibited cell migration and phagocytosis. These data suggest that ARID5B expression, possibly regulated by an epigenetically controlled enhancer, promotes atherosclerosis by dysregulating immunometabolism towards a chronic inflammatory phenotype.
3. Condensin-mediated remodeling of the mitotic chromatin landscape in fission yeast (Nature Genetics)
The eukaryotic genome consists of DNA molecules far longer than the cells that contain them. They reach their greatest compaction during chromosome condensation in mitosis. This process is aided by condensin, a structural maintenance of chromosomes (SMC) family member. The spatial organization of mitotic chromosomes and how condensin shapes chromatin architecture are not yet fully understood. Here we use chromosome conformation capture (Hi-C) to study mitotic chromosome condensation in the fission yeast Schizosaccharomyces pombe. This showed that the interphase landscape characterized by small chromatin domains is replaced by fewer but larger domains in mitosis. Condensin achieves this by setting up longer-range, intrachromosomal DNA interactions, which compact and individualize chromosomes. At the same time, local chromatin contacts are constrained by condensin, with profound implications for local chromatin function during mitosis. Our results highlight condensin as a major determinant that changes the chromatin landscape as cells prepare their genomes for cell division.
4. The Drosophila embryo at single-cell transcriptome resolution (Science)
By the onset of morphogenesis, Drosophila embryos consist of about 6000 cells that express distinct gene combinations. Here, we used single-cell sequencing of precisely staged embryos and devised DistMap, a computational mapping strategy to reconstruct the embryo and to predict spatial gene expression approaching single-cell resolution. We produce a virtual embryo with about 8000 expressed genes per cell. Our interactive “Drosophila-Virtual-Expression-eXplorer” (DVEX) database generates three-dimensional virtual in situ hybridizations and computes gene expression gradients. We used DVEX to uncover patterned expression of transcription factors and long noncoding RNAs, as well as signaling pathway components. Spatial regulation of Hippo signaling during early embryogenesis suggests a mechanism for establishing asynchronous cell proliferation. Our approach is suitable to generate transcriptomic blueprints for other complex tissues.
5. Prominent topologically associated domains differentiate global chromatin packing in rice from Arabidopsis (Nature Plants)
The non-random three-dimensional organization of genomes is critical for many cellular processes. Recently, analyses of genome-wide chromatin packing in the model dicot plant Arabidopsis thaliana have been reported. At a kilobase scale, the A. thaliana chromatin interaction network is highly correlated with a range of genomic and epigenomic features. Surprisingly, topologically associated domains (TADs), which appear to be a prevalent structural feature of genome packing in many animal species, are not prominent in the A. thaliana genome. Using a genome-wide chromatin conformation capture approach, Hi-C, we report high-resolution chromatin packing patterns of another model plant, rice. We unveil new structural features of chromatin organization at both chromosomal and local levels compared to A. thaliana, with thousands of distinct TADs that cover about a quarter of the rice genome. The rice TAD boundaries are associated with euchromatic epigenetic marks and active gene expression, and enriched with a sequence motif that can be recognized by plant-specific TCP proteins. In addition, we report chromosome decondensation in rice seedlings undergoing cold stress, despite local chromatin packing patterns remaining largely unchanged. The substantial variation found already in a comparison of two plant species suggests that chromatin organization in plants might be more diverse than in multicellular animals.
6. Massively parallel single-nucleus RNA-seq with DroNc-seq (Nature Methods)
Single-nucleus RNA sequencing (sNuc-seq) profiles RNA from tissues that are preserved or cannot be dissociated, but it does not provide high throughput. Here, we develop DroNc-seq: massively parallel sNuc-seq with droplet technology. We profile 39,111 nuclei from mouse and human archived brain samples to demonstrate sensitive, efficient, and unbiased classification of cell types, paving the way for systematic charting of cell atlases.
7. The evolution of duplicate gene expression in mammalian organs (Genome Research)
Gene duplications generate genomic raw material that allows the emergence of novel functions, likely facilitating adaptive evolutionary innovations. However, global assessments of the functional and evolutionary relevance of duplicate genes in mammals were until recently limited by the lack of appropriate comparative data. Here, we report a large-scale study of the expression evolution of DNA-based functional gene duplicates in three major mammalian lineages (placental mammals, marsupials, egg-laying monotremes) and birds, on the basis of RNA sequencing (RNA-seq) data from nine species and eight organs. We observe dynamic changes in tissue expression preference of paralogs with different duplication ages, suggesting differential contribution of paralogs to specific organ functions during vertebrate evolution. Specifically, we show that paralogs that emerged in the common ancestor of bony vertebrates are enriched for genes with brain-specific expression and provide evidence for differential forces underlying the preferential emergence of young testis- and liver-specific expressed genes. Further analyses uncovered that the overall spatial expression profiles of gene families tend to be conserved, with several exceptions of pronounced tissue specificity shifts among lineage-specific gene family expansions. Finally, we trace new lineage-specific genes that may have contributed to the specific biology of mammalian organs, including the little-studied placenta. Overall, our study provides novel and taxonomically broad evidence for the differential contribution of duplicate genes to tissue-specific transcriptomes and for their importance for the phenotypic evolution of vertebrates.
8. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms (Genome Research)
Alternative splicing (AS) generates remarkable regulatory and proteomic complexity in metazoans. However, the functions of most AS events are not known and programs of regulated splicing remain to be identified. To address these challenges, we describe the Vertebrate Alternative Splicing and Transcription Database (VastDB), the largest resource of genome-wide, quantitative profiles of AS events assembled to date. VastDB provides readily accessible quantitative information on the inclusion levels and functional associations of AS events detected in RNA-seq data from diverse vertebrate cell and tissue types, as well as developmental stages. The VastDB profiles reveal extensive new intergenic and intragenic regulatory relationships among different classes of AS, as well as previously unknown and conserved landscapes of tissue-regulated exons. Contrary to recent reports concluding that nearly all human genes express a single major isoform, VastDB provides evidence that at least 48% of multiexonic protein-coding genes express multiple splice variants that are highly regulated in a cell/tissue-specific manner, and that more than 18% of genes simultaneously express multiple major isoforms across diverse cell and tissue types. Isoforms encoded by the latter set of genes are generally co-expressed in the same cells and are often engaged by translating ribosomes. Moreover, they are encoded by genes that are significantly enriched in functions associated with transcriptional control, implying they may have an important and wide-ranging role in controlling cellular activities. VastDB thus provides an unprecedented resource for investigations of AS function and regulation.
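Inclusion levels of AS events are commonly reported as "percent spliced in" (PSI). The sketch below shows that calculation under simple assumptions: the function name `percent_spliced_in` is hypothetical, read counts supporting inclusion and exclusion of an exon are taken as given, and nothing here reflects how VastDB itself derives its values.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Return PSI (0-100) for one AS event, or None if there is no coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No reads cover the event, so the inclusion level is undefined.
        return None
    return 100.0 * inclusion_reads / total


# Hypothetical counts for one exon in two tissues.
psi_brain = percent_spliced_in(30, 10)
psi_liver = percent_spliced_in(5, 45)
print(psi_brain, psi_liver)
```

A large PSI difference between tissues, as in this made-up example, is the kind of signal that would mark an exon as tissue-regulated.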
9. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient (Genome Research)
Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum-adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only does it provide a statistically sound and reliable evaluation of reproducibility, SCC can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control. The freely available R package HiCRep implements our approach.
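The idea of a stratum-adjusted correlation can be illustrated with a simplified sketch. Contacts are grouped into strata by genomic distance (the diagonals of the contact matrix), a Pearson correlation is computed within each stratum, and the results are combined as a weighted average. This does not reproduce the HiCRep R package: the function names are hypothetical, the weighting (stratum size times the two standard deviations) is an assumption made for this example, the main diagonal is skipped by choice, and any preprocessing the package applies is omitted.

```python
import math


def _pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def _std(x):
    """Population standard deviation of a list."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))


def stratum_adjusted_correlation(m1, m2, max_distance):
    """Weighted average of per-diagonal correlations of two square matrices."""
    n = len(m1)
    numerator = 0.0
    denominator = 0.0
    for d in range(1, max_distance + 1):
        # Stratum d: all bin pairs separated by exactly d bins.
        x = [m1[i][i + d] for i in range(n - d)]
        y = [m2[i][i + d] for i in range(n - d)]
        if len(x) < 2:
            continue  # a correlation needs at least two points
        # Assumed weighting: larger, more variable strata count for more.
        weight = len(x) * _std(x) * _std(y)
        numerator += weight * _pearson(x, y)
        denominator += weight
    return numerator / denominator if denominator else 0.0


# Toy symmetric contact matrices, for illustration only.
a = [[10, 8, 3, 1],
     [8, 12, 6, 2],
     [3, 6, 9, 5],
     [1, 2, 5, 11]]
b = [[10, 5, 2, 1],
     [5, 12, 6, 3],
     [2, 6, 9, 8],
     [1, 3, 8, 11]]

scc_same = stratum_adjusted_correlation(a, a, max_distance=3)
scc_diff = stratum_adjusted_correlation(a, b, max_distance=3)
print(scc_same, scc_diff)
```

Comparing a matrix with itself gives a value of about 1. The second matrix reverses the pattern along each diagonal, so the score is negative even though both matrices show the same overall decay of contacts with distance.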