Column: VG生信软件 (VG Bioinformatics Software)

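The first company in China to develop visual bioinformatics desktop software for the Windows platform. It is committed to providing leading bioinformatics software products and system services. Its products and services include microbial diversity analysis software, transcriptome analysis software, resequencing analysis software, and a bacterial genome analysis system.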

A Glossary of English Terms for Third-Generation Sequencing

VG生信软件 · WeChat official account · 2017-09-25 16:12

Consumables Terminology

• insert size :

The length of the double-stranded nucleic acid fragment in a SMRTbell™ template, excluding the hairpin adapters.

• MagBead :

Small paramagnetic bead, 2-3 μm in size. The DNA-polymerase complexes are attached to the magnetic beads, which can then be pulled down for easy removal of contaminants from the supernatant during the binding step. The DNA-polymerase complex/bead mixture is then used for the on-instrument immobilization step. See also MagBead loading.

• Plasmidbell Complex (11 kb) :

A fixed-length DNA template of 11 kb pre-bound to DNA polymerase.

• primed template :

Refers to a template molecule that is annealed with primer; product of the template prep protocol and an input to the binding protocol.

• SMRT® Cells :

Consumable substrates comprising arrays of zero-mode waveguide nanostructures. SMRT® Cells are used in conjunction with the DNA Sequencing Kit for on-instrument DNA sequencing.

• SMRTbell™ template :

A double-stranded DNA template capped by hairpin adapters (i.e., SMRTbell™ adapters) at both ends. A SMRTbell™ template is topologically circular and structurally linear, and is the library format created by the DNA Template Prep Kit.

• template :

A nucleic acid molecule to be sequenced; the DNA Template Prep Kit produces templates.

• template annealing :

Process of hybridizing primer(s) to nucleic acid templates.

• template library :

A set of nucleic acid molecules to be sequenced; the DNA Template Prep Kit process generates template libraries.

• template-polymerase complex :

Primed template bound to DNA polymerase; the output of the DNA/Polymerase Binding Kit process.

• zero-mode waveguide (ZMW) :

A nanophotonic device for confining light to a small observation volume. This can be, for example, a small hole in a conductive layer whose diameter is too small to permit the propagation of light in the wavelength range used for detection. Physically part of a SMRT® Cell.

Sample Preparation Terminology

• AT ligation :

The library construction protocol option by which an adapter with a single-nucleotide T overhang is ligated to an insert with a single-nucleotide A overhang. The workflow that uses this ligation option also contains an A-tailing step.

• barcode padding :

An optional 5 bp constant sequence appended to unique barcode sequences. Can be used to normalize ligation of adapters during template preparation.

• barcoded adapter :

A SMRTbell™ adapter with a barcode sequence appended to the end of the stem region. When using barcoded adapters, SMRTbell™ templates will have a symmetric barcode structure.

• barcoded SMRTbell™ template :

A SMRTbell™ template with two barcoded adapters.

• blunt ligation :

The library construction protocol option by which an adapter lacking any overhangs is ligated to an insert also lacking any overhangs. The workflow that uses this ligation option also lacks the A-tailing step.

• diffusion loading :

Immobilization of DNA-polymerase complex into the ZMWs on the SMRT® Cell via diffusion. Smaller inserts load preferentially compared to larger inserts.

• DNA damage repair :

A step in the SMRTbell™ library preparation that repairs several types of DNA damage, including pyrimidine dimers, abasic sites, and nicks.

• DNA end repair :

A step in the SMRTbell™ library preparation that removes 5’ and 3’ overhangs, and phosphorylates 5’ ends.

• DNA fragmentation :

The generation of smaller DNA fragments. Multiple methods may be used to fragment DNA, including hydrodynamic shearing, mechanical shearing, sonication, and enzymatic digestion.

• MagBead loading :

Immobilization of large DNA molecules into the ZMWs on the SMRT® Cell chip via MagBeads. The smallest inserts, hairpin dimers, and excess polymerase are washed out in the initial MagBead binding and washing steps. As a result, medium and larger size inserts load better and have a higher sequencing accuracy (compared to diffusion loading of similar-sized inserts).

• PacBio® SampleNet ( http://www.smrtcommunity.com/SampleNet ) :

Resource for information and discussion on sample preparation and sequencing with the PacBio® System.

• polymerase binding :

The binding of the sequencing polymerase to an appropriate binding site on a nucleic acid template.

• primer annealing :

The hybridization of a sequencing primer to an appropriate binding site on a template.

• size selection :

The removal of unwanted fragments from a mixture based on size. This can refer to the removal of only the shortest fragments, such as adapter dimers, or to the isolation of a very narrow range of insert sizes. Depending on the size range of interest and the equipment available, size selection can be accomplished with AMPure PB beads, manual isolation from an agarose gel, or automated gel isolation.

Software Terminology

• AHA :

A hybrid assembly algorithm that takes a draft assembly and joins contigs using PacBio® reads as evidence. Part of the SMRT® Analysis suite.

• Binding Calculator :

Web-based application used to calculate binding and annealing reactions for preparing DNA samples for use on the PacBio® System.

• BLASR :

Used for targeted sequencing. Maps reads against a reference; part of SMRT® Analysis.

• Celera® Assembler :

Combines Pacific Biosciences’ long reads with short reads generated by other technologies. Used for de novo assembly. Third party software integrated with the SMRT® Analysis suite.

• GATK :

Identifies haploid and diploid SNPs using the Broad’s Unified Genotyper software.

• HGAP :

The Hierarchical Genome Assembly Process (HGAP) can generate high quality (≥ 99.999% accurate) de novo assemblies using a single PacBio® library prep. HGAP includes pre-assembly, de novo assembly with Celera® Assembler, and assembly polishing with Quiver.

• PacBio DevNet ( http://pacbiodevnet.com/ ) :

Resource for informatics researchers, independent software vendors, and life scientists; includes data sets, source code, application programming interfaces, and documentation.

• pacBioToCA :

A software module that aligns high-accuracy reads to the CLRs (continuous long reads), error-corrects the CLRs when a minimum coverage is satisfied, and splits or trims the CLRs otherwise. Third party software integrated with the SMRT® Analysis suite.

• PBJelly :

A gap-filling algorithm from Baylor College of Medicine that takes an assembly containing scaffolds and tries to fill the internal gaps. Not part of the SMRT® Analysis suite.

• Quiver :

A highly accurate consensus and variant caller that can generate 99.999% accurate consensus sequences using local realignment and the full range of quality scores associated with Pacific Biosciences reads. Part of the SMRT® Analysis suite.
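Accuracy figures such as 99.999% are often restated on a Phred-style quality scale. The sketch below assumes the standard Phred convention, QV = -10 · log10(error probability); the glossary only calls PacBio® quality values "Phred"-like, so treat the exact formula as an assumption. The function names are made up for illustration.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-style quality value to an error probability."""
    return 10 ** (-qv / 10)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred-style quality value."""
    return -10 * math.log10(1 - accuracy)


# Under this convention, 99.999% accuracy is roughly QV 50,
# and QV 20 corresponds to about 1 error in 100 calls.
print(round(accuracy_to_qv(0.99999)))
print(qv_to_error_probability(20))
```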

• DBG2OLC :

An assembler for efficient assembly of large genomes using a compressed overlap graph.

• Racon :

• Canu :

A scalable and accurate long-read assembler that uses adaptive k-mer weighting and repeat separation.

• FALCON and FALCON-Unzip :

Algorithms (https://github.com/PacificBiosciences/FALCON/) that assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. The authors report new reference sequences for heterozygous samples, including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, all samples that have challenged short-read assembly approaches. The FALCON-based assemblies are reported to be substantially more contiguous and complete than alternative short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosity between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.

• HINGE :

A long-read assembler that seeks to achieve optimal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. It does this by adding "hinges" to reads when constructing an overlap graph in which only unresolvable repeats are merged.

• MECAT :

A tool for fast mapping, error correction, and de novo assembly of single-molecule sequencing (SMS) reads (accessible at https://github.com/xiaochuanle/MECAT). Its authors report that MECAT's computing efficiency is superior to that of other tools available at the time, while its results are comparable or improved. MECAT enables reference mapping or de novo assembly of large genomes using SMS reads on a single computer.

• RS Dashboard :

Web-based software that displays quality metrics for individual instruments, runs, and SMRT® Cells.

• RS Remote :

Windows-based client software used to design and monitor sequencing runs. Directs user to look at primary analysis data in-depth using RS Dashboard.

• RS Touch :

Touchscreen user interface on the instruments to help the user load the instrument and start a run. Provides direct feedback on instrument status.

• SMRT® Analysis Suite :

Client/server software that performs automated and distributed analysis of sequencing data generated by the PacBio® System.

• SMRT® Pipe :

Command-line software used to launch secondary analysis jobs. Part of the SMRT® Analysis suite.

• SMRT® Portal :

Web-based software used to help set up secondary analysis jobs and view quality reports. Part of the SMRT® Analysis suite.

• SMRT® View :

Java-based genome browser used to visualize aligned or assembled reads. Part of the SMRT® Analysis suite.

Analysis Terminology

• collection time :

The time specified for collecting data from a SMRT® Cell.

• consensus accuracy :

Accuracy based on aligning multiple sequencing reads or subreads together, optionally with a reference sequence.

• high quality (HQ) region screening :

Annotates the high quality sequencing regions of a read to be used during raw read trimming.

• movie :

The set of data collected during real-time observation of the SMRT® Cell, including the spectral and temporal information used to determine a read.

• primary analysis :

Includes signal processing of the movie, base calling of the traces and pulses, and quality assessment of the base calls.

• pulse :

The representation of an illumination event derived from a trace that includes metrics such as interpulse duration, pulse height, and pulse width.

• raw read trimming :

Extraction of high quality regions from an unfiltered read. Trimming of an unfiltered read produces a polymerase read.

• reads/SMRT® Cell :

The average number of reads generated per SMRT® Cell.

• SMRT® Sequencing :

The process of nucleic acid sequencing using Pacific Biosciences’ single molecule, real-time sequencing technology.

• standard sequencing :

Sequencing of SMRTbell™ templates to produce either single pass reads or circular consensus reads, depending on the template length and collection time.

• tertiary analysis :

Statistical analyses following secondary analysis, which include comparisons of secondary analysis results across different samples, application-specific analyses, variant classification, and disease/gene annotations.

• trace :

The raw intensity values from all four spectral channels of a single ZMW derived from a movie.

Secondary Analysis Terminology

• analysis group :

A group of reads from a single or multiple SMRT® Cells to be analyzed together in secondary analysis.

• barcode FASTA :

A FASTA-format file used by barcoding software to identify ideal barcode sequences. For symmetric barcodes, each barcode sequence identifies a single bin for demultiplexing reads. For paired barcodes, each unique pair of barcodes should be listed as two sequentially-named FASTA sequences.
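As an illustration of the paired-barcode layout described above, the following sketch writes each pair as two consecutive, sequentially named FASTA records. The record names (`bc1`, `bc2`, ...) and the sequences are hypothetical; the naming scheme a given barcoding tool expects may differ.

```python
def paired_barcode_fasta(pairs):
    """Render (left, right) barcode pairs as sequentially named FASTA text.

    Pair i (0-based) becomes records bc(2i+1) and bc(2i+2), so the two
    barcodes of a pair always sit next to each other in the file.
    The "bc" prefix is a made-up naming convention.
    """
    lines = []
    for i, (left, right) in enumerate(pairs):
        lines.append(f">bc{2 * i + 1}")
        lines.append(left)
        lines.append(f">bc{2 * i + 2}")
        lines.append(right)
    return "\n".join(lines) + "\n"


# Two hypothetical asymmetric barcode pairs.
print(paired_barcode_fasta([("ACGTACGT", "TTGACCAA"), ("GGCATCAG", "CTAGTGCA")]))
```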

• barcode score :

The alignment score between a read and an ideal barcode sequence. The maximum barcode score is twice the length of the ideal barcode sequence.
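A maximum of twice the barcode length is consistent with a scheme that awards +2 for every matching base. The toy scorer below assumes exactly that, plus an ungapped comparison and a mismatch penalty of -1; the penalty and the absence of gaps are illustrative assumptions, not the scoring used by any particular barcoding software.

```python
MATCH_SCORE = 2      # gives a maximum of 2 * len(barcode) for a perfect match
MISMATCH_SCORE = -1  # assumption, chosen only for illustration


def toy_barcode_score(read_segment: str, barcode: str) -> int:
    """Score an ungapped, position-by-position comparison of two sequences."""
    if len(read_segment) != len(barcode):
        raise ValueError("this toy scorer compares equal-length sequences only")
    return sum(
        MATCH_SCORE if a == b else MISMATCH_SCORE
        for a, b in zip(read_segment, barcode)
    )


barcode = "ACGTACGTACGTACGT"  # hypothetical 16 bp ideal barcode
perfect = toy_barcode_score(barcode, barcode)
one_off = toy_barcode_score("ACGTACGTACGTACGA", barcode)
print(perfect, one_off)
```

A real implementation would typically use a gapped alignment, since insertions and deletions are common in single-molecule reads.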

• circular consensus accuracy :

Accuracy based on multiple sequencing passes around a single circular template molecule.

• circular consensus analysis :

Processing of sequencing data generated by circular consensus sequencing to create a circular consensus read.

• circular consensus sequencing ( CCS ) :

Sequencing performed on a circular template in which multiple subreads are generated during multiple sequencing passes around the template, and then collapsed to form a single high-accuracy read. CCS data are generated when at least two full-pass subreads are present.
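The idea of collapsing several passes into one read can be sketched with a per-position majority vote. This is a deliberately simplified stand-in, not the actual CCS algorithm: it assumes the subreads are already oriented to the same strand, aligned, and of equal length, and it ignores quality values. The two-subread minimum mirrors the definition above.

```python
from collections import Counter


def toy_consensus(subreads):
    """Collapse pre-aligned, equal-length subreads by per-column majority vote."""
    if len(subreads) < 2:
        raise ValueError("need at least two full-pass subreads")
    if len({len(s) for s in subreads}) != 1:
        raise ValueError("this toy version expects equal-length, pre-aligned subreads")
    # zip(*subreads) yields one tuple of bases per alignment column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*subreads)
    )


# Three hypothetical passes over the same insert; the middle pass has one error.
print(toy_consensus(["ACGTTGCA", "ACGATGCA", "ACGTTGCA"]))
```

Because random errors rarely hit the same position in every pass, the vote recovers the underlying base more reliably as the number of passes grows.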

• consensus sequence determination :

Generation of a consensus sequence from multiple individual reads of the same template or identical copies thereof. Also termed “consensus calling.”

• paired barcodes :

Barcode sequences that are different (asymmetric) on either end of an insert present in a SMRTbell™ template. The barcoding analysis software uses unique pairs of barcodes to separate and analyze reads.

• QV Metric :

"Phred"-like scores that predict, for each base call, the probability of a correct call.

• secondary analysis :

Statistical analyses following primary analysis base calling, which include:

1) filtering/selection of data that meet desired criteria (such as quality, read length, and so on);

2) comparison of reads to a reference for mapping and variant calling, consensus sequence determination, alignment and assembly (de novo or reference-based), variant identification and base modification detection; and

3) quality evaluations for a sequencing run, consensus sequence, assembly, and so on.
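Step 1 above, filtering by quality and read length, can be sketched as follows. The `Read` class, its field names, and the default thresholds are hypothetical choices for illustration, not values taken from SMRT® Analysis.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    sequence: str
    read_quality: float  # hypothetical per-read quality in [0, 1]


def filter_reads(reads, min_length=50, min_quality=0.75):
    """Keep reads that meet both the length and the quality criteria."""
    return [
        r for r in reads
        if len(r.sequence) >= min_length and r.read_quality >= min_quality
    ]


reads = [
    Read("r1", "A" * 100, 0.90),  # passes both criteria
    Read("r2", "A" * 10, 0.90),   # too short
    Read("r3", "A" * 100, 0.50),  # quality too low
]
kept = filter_reads(reads)
print([r.name for r in kept])
```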
