1. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield
(Nature Genetics)
Abstract
Upland cotton is the most important natural-fiber crop. The genomic variation of diverse germplasm, and the alleles underpinning fiber quality and yield, should be extensively explored. Here, we resequenced a core collection comprising 419 accessions at 6.55-fold coverage depth and identified approximately 3.66 million SNPs for evaluating genomic variation. We performed phenotyping across 12 environments and conducted a genome-wide association study (GWAS) of 13 fiber-related traits. In total, 7,383 unique SNPs were significantly associated with these traits and were located within or near 4,820 genes; more associated loci were detected for fiber quality than for fiber yield, and more fiber genes were detected in the D subgenome than in the A subgenome. Several previously undescribed causal genes for days to flowering, fiber length, and fiber strength were identified. Phenotypic selection for these traits increased the frequency of elite alleles during domestication and breeding. These results provide targets for molecular selection and genetic manipulation in cotton improvement.
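The association step of a GWAS can be illustrated with the simplest single-marker test: regress the phenotype on genotype dosage (0, 1 or 2 copies of the alternate allele), one SNP at a time. This is a minimal sketch under stated assumptions, not the model used in the study. Crop GWAS usually add covariates for population structure and kinship (a mixed linear model), which this sketch omits. The function names are hypothetical, and the p-value uses a normal approximation to the t distribution.

```python
import math


def snp_association(genotypes, phenotype):
    """Single-marker linear regression of a quantitative trait on SNP dosage.

    genotypes: dosages (0/1/2) for one SNP across accessions
    phenotype: trait values for the same accessions
    Returns (beta, t_statistic). Population structure and kinship are ignored.
    """
    n = len(genotypes)
    if n < 3 or n != len(phenotype):
        raise ValueError("need at least 3 accessions with matching phenotypes")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        return 0.0, 0.0  # monomorphic SNP carries no information
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx  # least-squares slope
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return beta, (beta / se if se > 0 else math.inf)


def two_sided_p(t):
    """Two-sided p-value, normal approximation (reasonable for hundreds of accessions)."""
    return math.erfc(abs(t) / math.sqrt(2))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance cut-off controlling the family-wise error rate."""
    return alpha / n_tests
```

With ∼3.66 million SNPs, a Bonferroni cut-off at α = 0.05 would be about 1.4 × 10^-8; the abstract does not state which threshold the authors applied.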
2. Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits
(Nature Genetics)
Abstract
The ancestors of Gossypium arboreum and Gossypium herbaceum provided the A subgenome for the modern cultivated allotetraploid cotton. Here, we upgraded the G. arboreum genome assembly by integrating different technologies. We resequenced 243 G. arboreum and G. herbaceum accessions to generate a map of genome variations and found that they are equally diverged from Gossypium raimondii. Independent analysis suggested that Chinese G. arboreum originated in South China and was subsequently introduced to the Yangtze and Yellow River regions. Most accessions with domestication-related traits experienced geographic isolation. Genome-wide association study (GWAS) identified 98 significant peak associations for 11 agronomically important traits in G. arboreum. A nonsynonymous substitution (cysteine to arginine) in GaKASIII appears to confer substantial changes in fatty acid composition (C16:0 and C16:1) in cotton seeds. Resistance to fusarium wilt disease is associated with activation of GaGSTF9 expression. Our work represents a major step toward understanding the evolution of the A genome of cotton.
3. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression
(Nature Genetics)
Abstract
Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association meta-analysis based on 135,458 cases and 344,901 controls and identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signals. We found important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal, whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine the basis of major depression and imply that a continuous measure of risk underlies the clinical phenotype.
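A GWAS meta-analysis combines per-cohort summary statistics for each SNP. A common choice is the fixed-effect, inverse-variance weighted model sketched below; the abstract does not specify the exact meta-analysis model, so treat this as an illustrative assumption. `ivw_meta` is a hypothetical helper that pools per-cohort effect estimates (for example log odds ratios) and returns the pooled estimate, its standard error, the z-score and a two-sided p-value.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one SNP.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, z, p_two_sided).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need matching, non-empty lists of estimates and SEs")
    weights = [1.0 / se ** 2 for se in ses]  # more precise cohorts weigh more
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled_se, z, p
```

Cohorts with smaller standard errors (typically larger samples) receive proportionally more weight. A conventional genome-wide significance threshold in human GWAS is P < 5 × 10^-8.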
4. Genome-wide association meta-analysis of individuals of European ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability
(Nature Genetics)
Abstract
Hair color is one of the most recognizable visual traits in European populations and is under strong genetic control. Here we report the results of a genome-wide association study meta-analysis of almost 300,000 participants of European descent. We identified 123 autosomal loci and one X-chromosome locus significantly associated with hair color; all but 13 are novel. Collectively, single-nucleotide polymorphisms associated with hair color within these loci explain 34.6% of red hair, 24.8% of blond hair, and 26.1% of black hair heritability in the study populations. These results confirm the polygenic nature of complex phenotypes and improve our understanding of melanin pigment metabolism in humans.
5. Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture
(Nature Genetics)
Abstract
Chromosome conformation capture (3C) technologies can be used to investigate 3D genomic structures. However, high background noise, high costs, and a lack of straightforward noise evaluation in current methods impede the advancement of 3D genomic research. Here we developed a simple digestion-ligation-only Hi-C (DLO Hi-C) technology to explore the 3D landscape of the genome. This method requires only two rounds of digestion and ligation, without the need for biotin labeling and pulldown. Non-ligated DNA was efficiently removed in a cost-effective step by purifying specific linker-ligated DNA fragments. Notably, random ligation could be quickly evaluated in an early quality-control step before sequencing. Moreover, an in situ version of DLO Hi-C using a four-cutter restriction enzyme has been developed. We applied DLO Hi-C to delineate the genomic architecture of THP-1 and K562 cells and uncovered chromosomal translocations. This technology may facilitate investigation of genomic organization, gene regulation, and (meta)genome assembly.
6. Polymer physics predicts the effects of structural variants on chromatin architecture
(Nature Genetics)
Abstract
Structural variants (SVs) can result in changes in gene expression due to abnormal chromatin folding and cause disease. However, the prediction of such effects remains a challenge. Here we present a polymer-physics-based approach (PRISMR) to model 3D chromatin folding and to predict enhancer–promoter contacts. PRISMR predicts higher-order chromatin structure from genome-wide chromosome conformation capture (Hi-C) data. Using the EPHA4 locus as a model, the effects of pathogenic SVs are predicted in silico and compared to Hi-C data generated from mouse limb buds and patient-derived fibroblasts. PRISMR deconvolves the folding complexity of the EPHA4 locus and identifies SV-induced ectopic contacts and alterations of 3D genome organization in homozygous or heterozygous states. We show that SVs can reconfigure topologically associating domains, thereby producing extensive rewiring of regulatory interactions and causing disease by gene misexpression. PRISMR can be used to predict interactions in silico, thereby providing a tool for analyzing the disease-causing potential of SVs.
7. Computational 3D genome modeling using Chrom3D
(Nature Protocols)
Abstract
Chrom3D is a computational platform for 3D genome modeling that simulates the spatial positioning of chromosome domains relative to each other and relative to the nuclear periphery. In Chrom3D, chromosomes are modeled as chains of contiguous beads, in which each bead represents a genomic domain. In this protocol, a bead represents a topologically associated domain (TAD) mapped from ensemble Hi-C data. Chrom3D takes two kinds of input: significant pairwise TAD–TAD interactions determined from a Hi-C contact matrix, and TAD interactions with the nuclear periphery, determined by ChIP-sequencing of nuclear lamins to define lamina-associated domains (LADs). Chrom3D is based on Monte Carlo simulations initiated from a starting random bead configuration. During the optimization process, TAD–TAD interactions constrain bead positions relative to each other, whereas LAD information constrains the corresponding bead toward the nuclear periphery. Optimization can be repeated many times to generate an ensemble of 3D genome models. Analyses of the models enable estimations of the radial positioning of genomic sites in the nucleus across cells in a population. Chrom3D provides opportunities to reveal spatial relationships between TADs and LADs. More generally, predictions from Chrom3D models can be experimentally tested in the laboratory. We describe the entire Chrom3D protocol for modeling a 3D diploid human genome, from the creation of input files to the final rendering of 3D genome structures. The procedure takes ∼18 h. Chrom3D is freely available on GitHub.
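The bead-chain optimization described for Chrom3D can be illustrated with a toy Metropolis Monte Carlo loop. This is a hypothetical sketch, not Chrom3D's implementation: the penalty function, the bead and nucleus radii, the linear cooling schedule and all names are assumptions chosen for illustration. Each bead stands for one TAD; consecutive beads on a chromosome are kept adjacent, significant TAD–TAD interactions pull beads into contact, and LAD beads are pulled toward the nuclear periphery.

```python
import math
import random

ORIGIN = (0.0, 0.0, 0.0)  # nucleus centre


def score(pos, chains, contacts, lad_beads, bead_r, nucleus_r):
    """Hypothetical penalty: sum of squared constraint violations (lower is better)."""
    touch = 2 * bead_r  # centre-to-centre distance of two touching beads
    total = 0.0
    for chain in chains:  # consecutive domains of a chromosome stay adjacent
        for a, b in zip(chain, chain[1:]):
            total += (math.dist(pos[a], pos[b]) - touch) ** 2
    for a, b in contacts:  # significant TAD-TAD interactions pull beads together
        total += max(0.0, math.dist(pos[a], pos[b]) - touch) ** 2
    for a in lad_beads:  # LAD beads are pulled toward the nuclear periphery
        total += (nucleus_r - bead_r - math.dist(pos[a], ORIGIN)) ** 2
    return total


def random_point(rng, radius):
    """Uniform random point inside a sphere, by rejection sampling."""
    while True:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.dist(p, ORIGIN) <= radius:
            return p


def simulate(n_beads, chains, contacts, lad_beads, bead_r=0.3, nucleus_r=5.0,
             steps=20000, step_size=0.3, t0=1.0, seed=0):
    """Return (bead positions, starting score, final score) for one model."""
    rng = random.Random(seed)
    limit = nucleus_r - bead_r  # bead centres must stay inside the nucleus
    pos = [random_point(rng, limit) for _ in range(n_beads)]  # random start
    current = start = score(pos, chains, contacts, lad_beads, bead_r, nucleus_r)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        i = rng.randrange(n_beads)
        old = pos[i]
        new = tuple(c + rng.gauss(0.0, step_size) for c in old)
        if math.dist(new, ORIGIN) > limit:
            continue  # reject moves that leave the nucleus
        pos[i] = new
        proposed = score(pos, chains, contacts, lad_beads, bead_r, nucleus_r)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = proposed  # Metropolis acceptance
        else:
            pos[i] = old  # undo the rejected move
    return pos, start, current
```

Repeating `simulate` with different seeds yields an ensemble of models, and `math.dist(pos[i], ORIGIN) / nucleus_r` gives a bead's relative radial position, loosely mirroring the ensemble-level radial analysis the protocol describes.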
8. Combining fluorescence imaging with Hi-C to study 3D genome architecture of the same single cell
(Nature Protocols)
Abstract
Fluorescence imaging and chromosome conformation capture assays such as Hi-C are key tools for studying genome organization. However, traditionally, they have been carried out independently, making integration of the two types of data difficult to perform. By trapping individual cell nuclei inside a well of a 384-well glass-bottom plate with an agarose pad, we have established a protocol that allows both fluorescence imaging and Hi-C processing to be carried out on the same single cell. The protocol identifies 30,000–100,000 chromosome contacts per single haploid genome in parallel with fluorescence images. Contacts can be used to calculate intact genome structures to better than 100-kb resolution, which can then be directly compared with the images. Preparation of 20 single-cell Hi-C libraries using this protocol takes 5 d of bench work by researchers experienced in molecular biology techniques. Image acquisition and analysis require basic understanding of fluorescence microscopy, and some bioinformatics knowledge is required to run the sequence-processing tools described here.
9. Producing genome structure populations with the dynamic and automated PGS software
(Nature Protocols)
Abstract
Chromosome conformation capture technologies such as Hi-C are widely used to investigate the spatial organization of genomes. Because genome structures can vary considerably between individual cells of a population, interpreting ensemble-averaged Hi-C data can be challenging, in particular for long-range and interchromosomal interactions. We pioneered a probabilistic approach for the generation of a population of distinct diploid 3D genome structures consistent with all the chromatin–chromatin interaction probabilities from Hi-C experiments. Each structure in the population is a physical model of the genome in 3D. Analysis of these models yields new insights into the causes and the functional properties of the genome's organization in space and time. We provide a user-friendly software package, called PGS, which runs on local machines (for practice runs) and high-performance computing platforms. PGS takes a genome-wide Hi-C contact frequency matrix, along with information about genome segmentation, and produces an ensemble of 3D genome structures entirely consistent with the input. The software automatically generates an analysis report, and provides tools to extract and analyze the 3D coordinates of specific domains. Basic Linux command-line knowledge is sufficient for using this software. A typical running time of the pipeline is ∼3 d with 300 cores on a computer cluster to generate a population of 1,000 diploid genome structures at topologically associating domain (TAD)-level resolution.
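The population idea behind PGS is that a contact observed with probability p in ensemble Hi-C should be present in roughly p·N of N structures, rather than in one averaged model. To our understanding, the published method alternates between assigning contacts to individual structures and optimizing those structures; the function below is a simplified, hypothetical illustration of the assignment idea only, not PGS code. It greedily gives each contact to the structures in which the two domains are currently closest.

```python
def assign_contacts(prob, distances, n_structures):
    """Hypothetical assignment step for a population of genome structures.

    prob:      dict mapping a domain pair (i, j) to its Hi-C contact probability
    distances: distances[s][(i, j)] = current distance of that pair in structure s
    Returns one set per structure: the pairs required to be in contact there.
    """
    assigned = [set() for _ in range(n_structures)]
    for pair, p in prob.items():
        k = int(p * n_structures + 0.5)  # number of structures that get this contact
        # prefer structures where the two domains are already close together
        ranked = sorted(range(n_structures), key=lambda s: distances[s][pair])
        for s in ranked[:k]:
            assigned[s].add(pair)
    return assigned
```

In an iterative scheme, each structure would then be re-optimized to satisfy its assigned contacts, the distances recomputed, and the assignment repeated.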
10. Long-read sequencing data analysis for yeasts
(Nature Protocols)
Abstract
Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) reads, running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.
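The sequencing depths quoted in these abstracts (6.55-fold for cotton, ∼100× PacBio for yeast) follow from simple arithmetic: mean depth equals total sequenced bases divided by genome size. The sketch below assumes a haploid S. cerevisiae genome size of roughly 12 Mb and a hypothetical mean read length of 10 kb; both numbers are illustrative, and the helper names are hypothetical.

```python
import math

YEAST_GENOME_SIZE = 12_000_000  # approximate haploid S. cerevisiae size (assumption)


def coverage_from_reads(read_lengths, genome_size):
    """Mean depth = total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size


def bases_for_depth(depth, genome_size):
    """Total bases needed to reach a target mean depth."""
    return depth * genome_size


def reads_for_depth(depth, genome_size, mean_read_length):
    """Approximate number of reads needed, given a mean read length."""
    return math.ceil(depth * genome_size / mean_read_length)
```

Under these assumptions, ∼100× corresponds to about 1.2 Gb of reads, or about 120,000 reads of 10 kb.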
11. Accurate detection of complex structural variations using single-molecule sequencing
(Nature Methods)
Abstract
Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.
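One kind of evidence long-read SV callers use is the split read: a single read whose parts align to distant reference positions. The sketch below is a hypothetical, deliberately simplified illustration of calling deletions from forward-strand split alignments and filtering them by read support. It is not the NGMLR or Sniffles algorithm; the `Segment` type, the 50-bp size cut-off, the clustering distance and the support threshold are all assumptions.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One aligned part of a read (hypothetical, forward strand only)."""
    read_start: int
    read_end: int
    chrom: str
    ref_start: int
    ref_end: int


def deletions_from_split_read(segments, min_size=50):
    """Return (chrom, start, end) deletions implied by one read's split alignment."""
    segs = sorted(segments, key=lambda s: s.read_start)
    calls = []
    for a, b in zip(segs, segs[1:]):
        if a.chrom != b.chrom:
            continue  # inter-chromosomal join: not handled in this sketch
        ref_gap = b.ref_start - a.ref_end
        read_gap = b.read_start - a.read_end
        if ref_gap - read_gap >= min_size:  # reference skipped more than the read
            calls.append((a.chrom, a.ref_end, b.ref_start))
    return calls


def cluster_calls(calls, max_dist=100, min_support=3):
    """Merge per-read calls with nearby breakpoints; keep well-supported events."""
    clusters = []
    for chrom, start, end in sorted(calls):
        for c in clusters:
            if (c["chrom"] == chrom and abs(c["start"] - start) <= max_dist
                    and abs(c["end"] - end) <= max_dist):
                c["support"] += 1
                break
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end, "support": 1})
    return [(c["chrom"], c["start"], c["end"], c["support"])
            for c in clusters if c["support"] >= min_support]
```

Requiring several independent reads per event is one simple way to suppress false calls caused by alignment artifacts, in the spirit of the automatic filtering the abstract mentions.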
12. Picky comprehensively detects high-resolution structural variants in nanopore long reads
(Nature Methods)
Abstract
Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky (https://github.com/TheJacksonLaboratory/Picky), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as the common structural features associated with SVs. Breakpoint density across the genome is associated with the propensity for interchromosomal connectivity and was found to be enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.
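The nucleotide-resolution breakpoint analysis described for Picky can be illustrated by classifying the junction between two adjacent aligned segments of the same long read: unaligned read bases between the segments form a micro-insertion, overlapping bases suggest microhomology, and a zero gap is a blunt junction. This is a hypothetical sketch, not Picky's code; half-open read coordinates, the function names and the 1-Mb window used for breakpoint density are assumptions.

```python
from collections import Counter


def classify_junction(prev_read_end, next_read_start, read_seq=None):
    """Classify a breakpoint junction from read coordinates (half-open intervals).

    Returns (kind, length, inserted_sequence_or_None).
    """
    gap = next_read_start - prev_read_end
    if gap > 0:  # read bases aligned to neither side of the breakpoint
        inserted = read_seq[prev_read_end:next_read_start] if read_seq else None
        return ("micro-insertion", gap, inserted)
    if gap < 0:  # read bases that align to both sides
        return ("microhomology", -gap, None)
    return ("blunt", 0, None)


def breakpoint_density(breakpoints, window=1_000_000):
    """Count breakpoints per (chromosome, window index) bin."""
    return Counter((chrom, pos // window) for chrom, pos in breakpoints)
```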