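For this issue of BMC Frontier Perspectives, we are honored to have invited members of the research group of Yong Zhang, an Editorial Board Member of Genome Biology. They discuss six important papers recently published in Genome Biology and other BMC open access journals.
Yong Zhang's group works with high-throughput biological data, combining bioinformatics method development with in-depth data analysis to study how epigenetic information is established and pre-programmed during cell fate determination. Their main achievements include the systematic development of analysis methods and platforms for epigenomic data, and the elucidation of how epigenetic states change dynamically during early embryonic development in vertebrates.
For more information, see: https://zhanglab.tongji.edu.cn
Professor Yong Zhang is currently an Editorial Board Member of Genome Biology. He was funded by the Excellent Young Scientists Fund of the National Natural Science Foundation of China in 2013, was selected for the Young Top-notch Talent track of the Ten Thousand Talents Program in 2015, and was named a Young Changjiang Scholar by the Ministry of Education in 2018.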
#1. Title: Genome-wide analyses of chromatin interactions after the loss of Pol I, Pol II, and Pol III
Commentary:
The relationship between 3D chromatin organization and transcriptional regulation has long been a topic of intense interest. It is widely believed that direct spatial interactions between distal enhancers and their target genes can regulate transcription, but the effect of transcription on 3D chromatin organization remains debated. Earlier studies found that transcription can relocate cohesin and thereby shape 3D architecture, whereas some recent studies in early embryos found that inhibiting Pol II transcription did not affect TAD structure. More research is therefore needed to determine how transcription influences 3D chromatin organization.
In this study, the authors investigated the effect of RNA polymerase-mediated transcription on 3D chromatin organization by degrading Pol I, Pol II, and Pol III in mouse embryonic stem cells. Degradation of the RNA polymerases had almost no effect on the maintenance of large-scale A/B compartments and TAD structures, and the re-establishment of 3D architecture during mitotic exit was also unaffected. However, loss of Pol II altered some short-range chromatin interactions at actively transcribed regions, indicating that Pol II might regulate small chromatin loops around transcriptional regulatory elements. To investigate how Pol II regulates 3D architecture, the authors selected chromatin loops bound by Pol II but lacking cohesin and CTCF, and found that these loops showed no obvious change after Pol II depletion, suggesting that Pol II does not mediate 3D genome organization directly. After prolonged depletion of Pol II, however, chromatin accessibility and cohesin occupancy decreased, implying that Pol II degradation may change 3D chromatin organization indirectly. This study analyzed the effect of RNA polymerase degradation on 3D chromatin organization and showed that transcription regulates 3D architecture not directly but indirectly, by affecting chromatin accessibility and cohesin occupancy. It helps to explain the earlier conflicting findings on how transcription inhibition affects 3D chromatin organization and clarifies the regulatory role of Pol II transcription.
#2. Title: Single-cell ATAC-seq signal extraction and enhancement with SCATE
Commentary:
The activity of cis-regulatory elements (CREs) is closely associated with the transcriptional regulatory state of cells. Single-cell ATAC-seq (scATAC-seq) can capture genome-wide CRE activity in single cells, but the sparsity and high noise level of scATAC-seq data pose a major challenge for data analysis. Current analysis methods address sparsity mainly by pooling similar CREs or similar cells, which increases the signal-to-noise ratio and yields more reliable results. However, these methods correspondingly lose sensitivity for detecting differences between individual CREs or between cells. This study proposes a model called SCATE that addresses these problems in three ways. First, it considers the similarity of both CREs and cells and pools similar CREs and similar cells, increasing the signal-to-noise ratio while preserving the ability to detect the intracellular and intercellular specificity of CREs. Second, it makes full use of publicly available bulk chromatin accessibility data to compile lists of human and mouse CREs, and uses the mean and variance of each CRE's activity across different cellular states as a baseline that informs the estimation of CRE activity in single cells. Third, it tunes the CRE clustering parameters adaptively according to data quality, so that as many types of CREs as possible are distinguished while the signal-to-noise ratio is maintained. Finally, the authors used multiple public data sets to validate the model for cell clustering and for estimating CRE activity. SCATE thus provides a more effective tool for analyzing scATAC-seq data.
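The trade-off behind pooling similar CREs can be illustrated with a toy simulation. The sketch below is not SCATE's actual model or code: the two CRE clusters, their accessibility probabilities, and the function names `pool_by_cluster`, `mean_squared_error`, and `simulate` are all assumptions made for illustration. It only shows that replacing each sparse, near-binary observation with its cluster mean brings the estimate closer to the underlying activity.

```python
import random


def pool_by_cluster(counts, clusters):
    """Replace each CRE's count with the mean count of its cluster."""
    sums, sizes = {}, {}
    for count, label in zip(counts, clusters):
        sums[label] = sums.get(label, 0) + count
        sizes[label] = sizes.get(label, 0) + 1
    return [sums[label] / sizes[label] for label in clusters]


def mean_squared_error(estimates, truth):
    """Average squared distance between estimated and true activity."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)


def simulate(n_per_cluster=200, seed=0):
    """Simulate one cell with two hypothetical CRE clusters."""
    rng = random.Random(seed)
    truth, clusters = [], []
    # Assumed activities: cluster 0 is mostly closed, cluster 1 is often open.
    for label, prob in enumerate((0.05, 0.40)):
        truth += [prob] * n_per_cluster
        clusters += [label] * n_per_cluster
    # Sparse single-cell observation: one 0/1 draw per CRE.
    counts = [1 if rng.random() < p else 0 for p in truth]
    return truth, clusters, counts


truth, clusters, counts = simulate()
pooled = pool_by_cluster(counts, clusters)
raw_mse = mean_squared_error(counts, truth)
pooled_mse = mean_squared_error(pooled, truth)
print(f"raw MSE: {raw_mse:.4f}, pooled MSE: {pooled_mse:.4f}")
```

In this toy setting every CRE in a cluster shares the same true activity, so pooling only helps. With real data, pooling also erases genuine differences between CREs in the same cluster, which is why the choice of cluster granularity matters.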
#3. Title: Genome-wide studies reveal the essential and opposite roles of ARID1A in controlling human cardiogenesis and neurogenesis from pluripotent stem cells
Commentary:
The evolutionarily conserved, ATP-dependent SWI/SNF complex is one of the largest chromatin-remodeling complexes. Its classical function is to remodel chromatin state and the accessibility of DNA to transcription factors, so it is crucial for transcriptional regulation and gene expression. ARID1A, a DNA-binding subunit of the SWI/SNF complex, assembles with either BRM or BRG1 to form a functional chromatin-remodeling complex. Previous studies indicated that abnormal ARID1A activity can lead to defects in both heart and brain development in humans. However, the molecular mechanisms by which ARID1A regulates human cardiogenesis and neurogenesis remain elusive. In this study, Liu et al. investigated the role of ARID1A in early human cardiac and neural development using an in vitro human embryonic stem cell (hESC) model. They found that ARID1A expression increased significantly during differentiation of hESCs into cardiac progenitors. Knockout of ARID1A in hESCs (ARID1A-/-) led to spontaneous neural differentiation even under pluripotent stem cell culture conditions. scRNA-seq results also revealed cellular and transcriptional heterogeneity between ARID1A-/- and WT hESCs. Further analysis showed that loss of ARID1A strongly activated neural differentiation, whereas cardiac differentiation was markedly suppressed. Integrated analysis of scRNA-seq, ATAC-seq, and ChIP-seq data showed that loss of ARID1A reduces chromatin accessibility at essential cardiogenic genes but not at neurogenic genes, although transcription of the neurogenic genes still depends on ARID1A. Motif analysis and Co-IP experiments indicated that ARID1A may associate with the key cardiogenic transcription factors T and MEF2C, which likely facilitates deposition of the SWI/SNF complex on target genes and thereby favors human heart development. On the other hand, ARID1A can interact with REST/NRSF, a known neural restrictive silencer factor, thereby repressing transcription of most essential neurogenic genes during neural development. Overall, Liu et al. revealed two distinct mechanisms by which ARID1A controls early development: promotion of cardiogenesis in a chromatin remodeling-dependent manner, and repression of neurogenesis in a chromatin remodeling-independent manner.
#4. Title: Protection from DNA re-methylation by transcription factors in primordial germ cells and pre-implantation embryos can explain trans-generational epigenetic inheritance
Commentary:
DNA methylation is an important carrier of epigenetic information. In mammals, the genome is globally de-methylated and re-methylated at two developmental stages: in early embryos and in primordial germ cells (PGCs). IAP-type endogenous retroviral elements escape DNA demethylation during reprogramming, remain highly methylated, and are involved in the regulation of trans-generationally inherited phenotypes. However, many CpGs remain unmethylated throughout DNA methylation reprogramming, indicating that additional carriers of epigenetic information maintain the methylation status of CpGs during reprogramming. Transcription factors can act as carriers of epigenetic information and promote DNA de-methylation and re-methylation, for example through interactions with epigenetic modifications, but their role in DNA methylation reprogramming remains unclear.
This study used published whole-genome bisulfite sequencing (BS-seq), chromatin accessibility (DNase/ATAC-seq), and gene expression (RNA-seq) data sets to study changes in DNA methylation and the dynamics of distal transcription factor (TF) binding sites during PGC and embryonic development. The researchers found that the DNA methylation status of most CpGs is restored after genome-wide reprogramming. However, CpGs with high accessibility and low DNA methylation tend to be protected by transcription factors throughout development and remain hypomethylated. The researchers also identified transcription factors related to DNA methylation reprogramming and found that these factors are associated with sites that are accessible in early or late PGC development. IAPs, in contrast, have low affinity for the reprogramming-related transcription factors, which may help them escape DNA methylation reprogramming. In summary, this study explored how transcription factors protect against DNA re-methylation during PGC and pre-implantation embryo development.
#5. Title: Unraveling the molecular interactions involved in phase separation of glucocorticoid receptor
Commentary:
The glucocorticoid receptor (GR) is a ligand-activated transcription factor that translocates to the nucleus upon hormonal stimulation and forms liquid condensates. Although liquid-liquid phase separation is considered the mechanism that drives the formation of many membraneless compartments, the mechanism by which GR foci form in the nucleus, and their functional relevance, remain unclear.
Storz et al. dissected some of the molecular interactions involved in the formation of GR condensates and analyzed the structural determinants of GR relevant to this process. Working mainly in the U2OS cell line, they found that GR foci are sensitive to 1,7-heptanediol (1,7-HD). Fluorescence recovery after photobleaching (FRAP) experiments showed that molecules in GR foci exchange rapidly with their surroundings. Another hallmark of GR foci is that separate droplets can fuse to form larger ones. GR foci are also sensitive to changes in osmolarity: hypertonic medium triggered a fast and massive reorganization of the foci, including the formation of more dots. The formation of these dots requires specific interactions involving active GR, whereas inactive GR in the cytoplasm is insensitive to changes in osmolarity. These experiments suggest that GR foci have physical characteristics of biomolecular condensates generated by liquid-liquid phase separation.
By localizing GFP-GR foci, the authors found that foci may form in close association with specific nuclear compartments, such as nucleoli, or in less restricted regions of the nuclear space. However, hormone-activated GR failed to form foci in the nuclei of Drosophila S2 cells, suggesting that GR condensates include certain cofactors and/or chromatin regions that are absent in Drosophila cells. GR foci do not form at random positions; they require specific interactions with certain chromatin regions that probably act as nucleation centers. The authors also found that condensates of active GR colocalize with Mediator, a protein complex that is itself involved in forming liquid transcriptional condensates. Researchers in the phase separation field commonly assume that intrinsically disordered regions (IDRs) of proteins play a relevant role in the weak hydrophobic interactions required for phase separation. In this study, however, deleting the N-terminal domain (NTD) of GR, which contains an IDR, did not affect the formation of foci. In contrast, the ligand-binding domain (LBD) of GR was crucial for foci formation. In summary, this study shows that GR foci have properties of liquid condensates. The researchers propose a model in which active GR molecules interact with chromatin and recruit multivalent cofactors, whose interactions with additional molecules lead to the formation of a focus. The biological relevance of the interactions occurring in GR condensates supports their involvement in transcriptional regulation.
#6. Title: HiCoP, a simple and robust method for detecting interactions of regulatory regions
Commentary:
3D chromatin organization, especially the interactions between gene promoters and open regions such as cis-regulatory elements, plays a vital role in regulating gene expression and maintaining cellular identity. In Hi-C data, information about regulatory regions is often swamped by the indiscriminate capture of genome-wide spatial contacts. Several methods, including HiChIP, OCEAN-C, and Trac-looping, have been developed to specifically capture interactions among open chromatin regions and partly overcome this limitation, but each still has shortcomings. In this study, Zhang et al. developed a method named HiCoP, based on a principle similar to that of OCEAN-C, in which the Column Purified chromatin (CoP) assay is integrated with the Hi-C technique to profile interactions among nucleosome-free DNA after sonicating the re-ligated, formaldehyde-crosslinked chromatin. The researchers found that CoP-seq specifically enriches gene promoter regions, and that the corresponding distal regulatory elements are co-captured through chromatin fragmentation and re-ligation. As a result, HiCoP detects more promoter-promoter, promoter-enhancer, and enhancer-enhancer interactions than Hi-C. By comparing chromatin loops annotated from HiCoP data with those annotated from H3K27ac HiChIP data, the researchers found that most promoter-enhancer interactions captured by HiCoP, especially those associated with highly expressed genes, overlap loops annotated by HiChIP. In summary, this study provides a simple and robust alternative for simultaneously profiling accessible chromatin regions and their spatial conformation.
Genome Biology covers all areas of biology and biomedicine studied from a genomic and post-genomic perspective. Content includes research, new methods and software tools, and reviews, opinions and commentaries. Areas covered include, but are not limited to: sequence analysis; bioinformatics; insights into molecular, cellular and organismal biology; functional genomics; epigenomics; population genomics; proteomics; comparative biology and evolution; systems and network biology; genome editing and engineering; genomics of disease; and clinical genomics. All content is open access immediately on publication.