Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias
Small subunit ribosomal RNA (SSU rRNA) genes, 16S in bacteria and 18S in eukaryotes, have been the standard phylogenetic markers used to characterize microbial diversity and evolution for decades. However, the reference databases of full-length SSU rRNA gene sequences are skewed to well-studied ecosystems and subject to primer bias and chimerism, which results in an incomplete view of the diversity present in a sample. We combine poly(A)-tailing and reverse transcription of SSU rRNA molecules with synthetic long-read sequencing to generate high-quality, full-length SSU rRNA sequences, without primer bias, at high throughput. We apply our approach to samples from seven different ecosystems and obtain more than a million SSU rRNA sequences from all domains of life, with an estimated raw error rate of 0.17%. We observe a large proportion of novel diversity, including several deeply branching phylum-level lineages putatively related to the Asgard Archaea. Our approach will enable expansion of the SSU rRNA reference databases by orders of magnitude, and contribute to a comprehensive census of the tree of life.
Highly parallel direct RNA sequencing on an array of nanopores
Sequencing the RNA in a biological sample can unlock a wealth of information, including the identity of bacteria and viruses, the nuances of alternative splicing or the transcriptional state of organisms. However, current methods have limitations due to short read lengths and reverse transcription or amplification biases. Here we demonstrate nanopore direct RNA-seq, a highly parallel, real-time, single-molecule method that circumvents reverse transcription or amplification steps. This method yields full-length, strand-specific RNA sequences and enables the direct detection of nucleotide analogs in RNA.
Detecting hierarchical genome folding with network modularity
Mammalian genomes are folded in a hierarchy of compartments, topologically associating domains (TADs), subTADs and looping interactions. Here, we describe 3DNetMod, a graph theory-based method for sensitive and accurate detection of chromatin domains across length scales in Hi-C data. We identify nested, partially overlapping TADs and subTADs genome wide by optimizing network modularity and varying a single resolution parameter. 3DNetMod can be applied broadly to understand genome reconfiguration in development and disease.
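As a rough illustration of the quantity being optimized, the sketch below computes network modularity with a resolution parameter for a toy symmetric contact matrix. It is not the 3DNetMod implementation: the function name `modularity`, the parameter name `gamma`, the toy matrix and the partition labels are all assumptions made for illustration.

```python
def modularity(adj, labels, gamma=1.0):
    """Modularity Q of a partition of a weighted, undirected graph.

    Q = (1 / 2m) * sum_ij (A_ij - gamma * k_i * k_j / 2m) * [c_i == c_j]
    where k_i is the weighted degree of node i and 2m is the sum of all degrees.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed contact minus the contact expected under the null model.
                q += adj[i][j] - gamma * degree[i] * degree[j] / two_m
    return q / two_m


# Toy "contact matrix" (hypothetical): two tightly connected groups of
# three bins each, joined by a single contact between bins 2 and 3.
contacts = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

two_domains = [0, 0, 0, 1, 1, 1]  # bins split into two candidate domains
one_domain = [0, 0, 0, 0, 0, 0]   # all bins in a single domain

print(modularity(contacts, two_domains))
print(modularity(contacts, one_domain))
```

In this toy case the two-domain partition scores higher than the single-domain one at `gamma=1.0`. Raising `gamma` increases the penalty for grouping bins together, which is the general sense in which a resolution parameter can shift the preferred partition toward smaller communities.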
A general and flexible method for signal extraction from single-cell RNA-seq data
Single-cell RNA-sequencing (scRNA-seq) is a powerful high-throughput technique that enables researchers to measure genome-wide transcription levels at the resolution of single cells. Because of the low amount of RNA present in a single cell, some genes may fail to be detected even though they are expressed; these genes are usually referred to as dropouts. Here, we present a general and flexible zero-inflated negative binomial model (ZINB-WaVE), which leads to low-dimensional representations of the data that account for zero inflation (dropouts), over-dispersion, and the count nature of the data. We demonstrate, with simulated and real data, that the model and its associated estimation procedure are able to give a more stable and accurate low-dimensional representation of the data than principal component analysis (PCA) and zero-inflated factor analysis (ZIFA), without the need for a preliminary normalization step.
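The zero-inflated negative binomial distribution named above can be sketched as a two-part mixture: with probability `pi` a count is a structural zero (a dropout), and otherwise it is drawn from a negative binomial with mean `mu` and inverse-dispersion `theta`. The snippet below illustrates only that probability mass function. It does not reproduce the ZINB-WaVE regression structure or estimation procedure, and the names `nb_pmf`, `zinb_pmf`, `mu`, `theta` and `pi` are chosen here for illustration.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu > 0 and inverse-dispersion theta > 0.

    Under this parametrization the variance is mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated negative binomial pmf: extra point mass pi at zero."""
    point_mass = pi if y == 0 else 0.0
    return point_mass + (1.0 - pi) * nb_pmf(y, mu, theta)


# Zero inflation raises the probability of observing a zero count
# relative to the plain negative binomial with the same mu and theta.
print(nb_pmf(0, mu=2.0, theta=1.0))
print(zinb_pmf(0, mu=2.0, theta=1.0, pi=0.25))
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large counts.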
Genome-wide association study in 79,366 European-ancestry individuals informs the genetic architecture of 25-hydroxyvitamin D levels
Vitamin D is a steroid hormone precursor that is associated with a range of human traits and diseases. Previous GWAS of serum 25-hydroxyvitamin D concentrations have identified four genome-wide significant loci (GC, NADSYN1/DHCR7, CYP2R1, CYP24A1). In this study, we expand the previous SUNLIGHT Consortium GWAS discovery sample size from 16,125 to 79,366 (all of European descent). This larger GWAS yields two additional loci harboring genome-wide significant variants (P = 4.7×10^−9 at rs8018720 in SEC23A, and P = 1.9×10^−14 at rs10745742 in AMDHD1). The overall estimate of heritability of 25-hydroxyvitamin D serum concentrations attributable to GWAS common SNPs is 7.5%, with statistically significant loci explaining 38% of this total. Further investigation identifies signal enrichment in immune and hematopoietic tissues, and clustering with autoimmune diseases in cell-type-specific analysis. Larger studies are required to identify additional common SNPs, and to explore the role of rare or structural variants and gene–gene interactions in the heritability of circulating 25-hydroxyvitamin D levels.
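Read literally, the two percentages in this abstract combine by simple multiplication: if common SNPs account for 7.5% of the variance and the significant loci explain 38% of that share, the significant loci account for a little under 3% of the total variance in 25-hydroxyvitamin D levels. The lines below only carry out that arithmetic, and the variable names are illustrative.

```python
# Figures taken from the abstract above.
snp_heritability = 0.075            # variance attributable to common SNPs
share_from_significant_loci = 0.38  # fraction of that heritability explained

# Share of total trait variance explained by genome-wide significant loci.
variance_explained = snp_heritability * share_from_significant_loci

# SNP heritability not yet assigned to significant loci.
unassigned = snp_heritability - variance_explained

print(f"explained by significant loci: {variance_explained:.4f}")
print(f"remaining SNP heritability:    {unassigned:.4f}")
```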
Enhancer-associated long non-coding RNA LEENE regulates endothelial nitric oxide synthase and endothelial function
The optimal expression of endothelial nitric oxide synthase (eNOS), the hallmark of endothelial homeostasis, is vital to vascular function. Dynamically regulated by various stimuli, eNOS expression is modulated at transcriptional, post-transcriptional, and post-translational levels. However, epigenetic modulations of eNOS, particularly through long non-coding RNAs (lncRNAs) and chromatin remodeling, remain to be explored. Here we identify an enhancer-associated lncRNA that enhances eNOS expression (LEENE). Combining RNA-sequencing and chromatin conformation capture methods, we demonstrate that LEENE is co-regulated with eNOS and that its enhancer resides in proximity to the eNOS promoter in endothelial cells (ECs). Gain- and loss-of-function of LEENE differentially regulate eNOS expression and EC function. Mechanistically, LEENE facilitates the recruitment of RNA Pol II to the eNOS promoter to enhance eNOS nascent RNA transcription. Our findings unravel a new layer in eNOS regulation and provide novel insights into cardiovascular regulation involving endothelial function.
High-resolution TADs reveal DNA sequences underlying genome organization in flies
Despite an abundance of new studies about topologically associating domains (TADs), the role of genetic information in TAD formation is still not fully understood. Here we use our software, HiCExplorer (hicexplorer.readthedocs.io), to annotate >2800 high-resolution (570 bp) TAD boundaries in Drosophila melanogaster. We identify eight DNA motifs enriched at boundaries, including a motif bound by the M1BP protein, and two new boundary motifs. In contrast to mammals, the CTCF motif is only enriched on a small fraction of boundaries flanking inactive chromatin, while most active boundaries contain the motifs bound by the M1BP or Beaf-32 proteins. We demonstrate that boundaries can be accurately predicted using only the motif sequences at open chromatin sites. We propose that DNA sequence guides the genome architecture by allocation of boundary proteins in the genome. Finally, we present an interactive online database to access and explore the spatial organization of fly, mouse and human genomes, available at http://chorogenome.ie-freiburg.mpg.de.
Sub-kb Hi-C in D. melanogaster reveals conserved characteristics of TADs between insect and mammalian cells
Topologically associating domains (TADs) are fundamental elements of the eukaryotic genomic structure. However, recent studies suggest that the insulating complexes, CTCF/cohesin, present at TAD borders in mammals are absent from those in Drosophila melanogaster, raising the possibility that border elements are not conserved among metazoans. Using in situ Hi-C with sub-kb resolution, here we show that the D. melanogaster genome is almost completely partitioned into >4000 TADs, nearly sevenfold more than previously identified. The overwhelming majority of these TADs are demarcated by the insulator complexes, BEAF-32/CP190, or BEAF-32/Chromator, indicating that these proteins may play an analogous role in flies as that of CTCF/cohesin in mammals. Moreover, extended regions previously thought to be unstructured are shown to consist of small contiguous TADs, a property also observed in mammals upon re-examination. Altogether, our work demonstrates that fundamental features associated with the higher-order folding of the genome are conserved from insects to mammals.
Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice
The rich genetic diversity in Oryza sativa and Oryza rufipogon serves as the main source for rice breeding. Large-scale resequencing has been undertaken to discover allelic variants in rice, but much of the information on genetic variation is often lost by direct mapping of short sequence reads onto the O. sativa japonica Nipponbare reference genome. Here we constructed a pan-genome dataset of the O. sativa–O. rufipogon species complex through deep sequencing and de novo assembly of 66 divergent accessions. Intergenomic comparisons identified 23 million sequence variants in the rice genome. This catalog of sequence variations includes many known quantitative trait nucleotides and will be helpful in pinpointing new causal variants that underlie complex traits. In particular, we systematically investigated the whole set of coding genes using this pan-genome data, which revealed extensive presence/absence variation among rice accessions. This pan-genome resource will further promote evolutionary and functional studies in rice.
Transposon-derived small RNAs triggered by miR845 mediate genome dosage response in Arabidopsis
Chromosome dosage has substantial effects on reproductive isolation and speciation in both plants and animals, but the underlying mechanisms are largely obscure [1]. Transposable elements in animals can regulate hybridity through maternal small RNA [2], whereas small RNAs in plants have been postulated to regulate dosage response via neighboring imprinted genes [3,4]. Here we show that a highly conserved microRNA in plants, miR845, targets the tRNAMet primer-binding site (PBS) of long terminal repeat (LTR) retrotransposons in Arabidopsis pollen, and triggers the accumulation of 21–22-nucleotide (nt) small RNAs in a dose-dependent fashion via RNA polymerase IV. We show that these epigenetically activated small interfering RNAs (easiRNAs) mediate hybridization barriers between diploid seed parents and tetraploid pollen parents (the ‘triploid block’), and that natural variation for miR845 may account for ‘endosperm balance’ allowing the formation of triploid seeds. Targeting of the PBS with small RNA is a common mechanism for transposon control in mammals and plants, and provides a uniquely sensitive means to monitor chromosome dosage and imprinting in the developing seed.
Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming
Chromosomal architecture is known to influence gene expression, yet its role in controlling cell fate remains poorly understood. Reprogramming of somatic cells into pluripotent stem cells (PSCs) by the transcription factors (TFs) OCT4, SOX2, KLF4 and MYC offers an opportunity to address this question but is severely limited by the low proportion of responding cells. We have recently developed a highly efficient reprogramming protocol that synchronously converts somatic into pluripotent stem cells. Here, we used this system to integrate time-resolved changes in genome topology with gene expression, TF binding and chromatin-state dynamics. The results showed that TFs drive topological genome reorganization at multiple architectural levels, often before changes in gene expression. Removal of locus-specific topological barriers can explain why pluripotency genes are activated sequentially, instead of simultaneously, during reprogramming. Together, our results implicate genome topology as an instructive force for implementing transcriptional programs and cell fate in mammals.
Reconstructing an African haploid genome from the 18th century