•
insert size
:
The length of the double-stranded nucleic acid fragment in a SMRTbell™ template, excluding the hairpin adapters.
•
MagBead
:
Small
paramagnetic bead, 2-3 μm in size. The DNA-polymerase complexes are
attached to the magnetic beads, which can then be pulled down for easy
removal of contaminants from the supernatant during the binding step.
The DNA-polymerase complex/bead mixture is then used for the
on-instrument immobilization step. See also MagBead loading.
•
Plasmidbell Complex (11 kb)
:
A fixed-length DNA template of 11 kb pre-bound to DNA polymerase.
•
primed template
:
Refers
to a template molecule that is annealed with primer; product of the
template prep protocol and an input to the binding protocol.
•
SMRT® Cells
:
Consumable
substrates comprising arrays of zero-mode waveguide nanostructures.
SMRT® Cells are used in conjunction with the DNA Sequencing Kit for
on-instrument DNA sequencing.
•
SMRTbell™ template
:
A
double-stranded DNA template capped by hairpin adapters (i.e.,
SMRTbell™ adapters) at both ends. A SMRTbell™ template is topologically
circular and structurally linear, and is the library format created by
the DNA Template Prep Kit.
•
template
:
A nucleic acid molecule to be sequenced; the DNA Template Prep Kit produces templates.
•
template annealing
:
Process of hybridizing primer(s) to nucleic acid templates.
•
template library
:
A set of nucleic acid molecules to be sequenced; the DNA Template Prep Kit process generates template libraries.
•
template-polymerase complex
:
Primed template bound to DNA polymerase; the output of the DNA/Polymerase Binding Kit process.
•
zero-mode waveguide (ZMW)
:
A
nanophotonic device for confining light to a small observation volume.
This can be, for example, a small hole in a conductive layer whose
diameter is too small to permit the propagation of light in the
wavelength range used for detection. Physically part of a SMRT® Cell.
Sample Preparation Terminology
•
AT ligation
:
The
library construction protocol option by which an adapter with a
single-nucleotide T overhang is ligated to an insert with a
single-nucleotide A overhang. The workflow that uses this ligation
option also contains an A-tailing step.
•
barcode padding
:
An
optional 5 bp constant sequence appended to unique barcode sequences.
Can be used to normalize ligation of adapters during template
preparation.
•
barcoded adapter
:
A
SMRTbell™ adapter with a barcode sequence appended to the end of the
stem region. When using barcoded adapters, SMRTbell™ templates will have
a symmetric barcode structure.
•
barcoded SMRTbell™ template
:
A SMRTbell™ template with two barcoded adapters.
•
blunt ligation
:
The
library construction protocol option by which an adapter lacking any
overhangs is ligated to an insert also lacking any overhangs. The
workflow that uses this ligation option also lacks the A-tailing step.
•
diffusion loading
:
Immobilization
of DNA-polymerase complex into the ZMWs on the SMRT® Cell via
diffusion. Smaller inserts load preferentially compared to larger
inserts.
•
DNA damage repair
:
A
step in the SMRTbell™ library preparation that repairs a variety of
types of DNA damage, including pyrimidine dimers, abasics, and nicks.
•
DNA end repair
:
A step in the SMRTbell™ library preparation that removes 5’ and 3’ overhangs, and phosphorylates 5’ ends.
•
DNA fragmentation
:
The
generation of smaller DNA fragments. Multiple methods may be used to
fragment DNA, including hydrodynamic shearing, mechanical shearing,
sonication, and enzymatic digestion.
•
MagBead loading
:
Immobilization
of large DNA molecules into the ZMWs on the SMRT® Cell chip via
MagBeads. The smallest inserts, hairpin dimers, and excess polymerase
are washed out in the initial MagBead binding and washing steps. As a
result, medium and larger size inserts load better and have a higher
sequencing accuracy (compared to diffusion loading of similar-sized
inserts).
•
PacBio® SampleNet (http://www.smrtcommunity.com/SampleNet)
:
Resource for information and discussion on sample preparation and sequencing with the PacBio® System.
•
polymerase binding
:
The binding of the sequencing polymerase to an appropriate binding site on a nucleic acid template.
•
primer annealing
:
The hybridization of a sequencing primer to an appropriate binding site on a template.
•
size selection
:
The
removal of unwanted fragments from a mixture based on size. This can
refer to the removal of only the shortest fragments, such as adapter
dimers, or to the isolation of a very narrow range of insert sizes.
Depending on the size range of interest and the equipment available,
size selection can be accomplished with AMPure PB beads, manual
isolation from an agarose gel, or automated gel isolation.
•
AHA
:
A
hybrid assembly algorithm that takes a draft assembly and joins contigs
using PacBio® reads as evidence. Part of the SMRT® Analysis suite.
•
Binding Calculator
:
Web-based application used to calculate binding and annealing reactions for preparing DNA samples for use on the PacBio® System.
•
BLASR
:
Used for targeted sequencing. Maps reads against a reference; part of SMRT® Analysis.
•
Celera® Assembler
:
Combines
Pacific Biosciences’ long reads with short reads generated by other
technologies. Used for de novo assembly. Third party software integrated
with the SMRT® Analysis suite.
•
GATK
:
Identifies haploid and diploid SNPs using the Broad’s Unified Genotyper software.
•
HGAP
:
The
Hierarchical Genome Assembly Process (HGAP) can generate high quality
(≥ 99.999% accurate) de novo assemblies using a single PacBio® library
prep. HGAP includes pre-assembly, de novo assembly with Celera®
Assembler, and assembly polishing with Quiver.
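To relate the accuracy figure above to the "Phred"-like quality values (QVs) used elsewhere in this glossary, the following minimal sketch applies the standard Phred relation Q = -10 * log10(P_error). The function names are illustrative, and the exact scaling used by PacBio® software is not stated in this glossary, so the mapping should be read as an assumption.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred-scaled QV.

    Assumes the standard Phred definition: QV = -10 * log10(P_error).
    """
    p_error = 1.0 - accuracy
    if not 0.0 < p_error < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(p_error)


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled QV back to the implied per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    # 99.999% accuracy is one expected error per 100,000 bases.
    print(round(accuracy_to_qv(0.99999), 3))
    print(qv_to_accuracy(30.0))
```

Under this relation, 99.999% accuracy works out to roughly QV 50, and QV 30 corresponds to 99.9% accuracy.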
•
PacBio DevNet (http://pacbiodevnet.com/)
:
Resource
for informatics researchers, independent software vendors, and life
scientists; includes data sets, source code, application programming
interfaces, and documentation.
•
pacBioToCA
:
A
software module that aligns high-accuracy reads to the CLRs (continuous
long reads), error-corrects the CLRs when a minimum coverage is
satisfied, and splits or trims the CLRs otherwise. Third party software
integrated with the SMRT® Analysis suite.
•
PBJelly
:
A
gap-filling algorithm from Baylor College of Medicine that takes an assembly
containing scaffolds and tries to fill the internal gaps. Not part of
the SMRT® Analysis suite.
•
Quiver
:
A
highly accurate consensus and variant caller that can generate 99.999%
accurate consensus sequences using local realignment and the full range
of quality scores associated with Pacific Biosciences reads. Part of the
SMRT® Analysis suite.
•
DBG2OLC
:
An assembler for large genomes that gains its efficiency from a compressed overlap graph.
•
Racon
:
•
Canu
:
A scalable and accurate long-read assembler that uses adaptive k-mer weighting and repeat separation.
•
FALCON and FALCON-Unzip
:
Algorithms
(https://github.com/PacificBiosciences/FALCON/) that assemble long-read
sequencing data into highly accurate, contiguous, and correctly phased
diploid genomes. They have been used to generate reference sequences for
heterozygous samples that have challenged short-read assembly approaches,
including an F1 hybrid of Arabidopsis thaliana, the widely cultivated
Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona
pyxidata. The FALCON-based assemblies were reported to be substantially
more contiguous and complete than alternative short- or long-read
approaches. The phased diploid assemblies enabled the study of haplotype
structure and heterozygosity between homologous chromosomes, including the
identification of widespread heterozygous structural variation within coding sequences.
•
HINGE
:
A
long-read assembler that seeks to achieve optimal repeat resolution by
distinguishing repeats that can be resolved given the data from those
that cannot. It does so by adding "hinges" to reads when constructing
an overlap graph in which only unresolvable repeats are merged.
MECAT
:
A
tool for fast mapping, error correction, and de novo assembly of
single-molecule sequencing (SMS) reads (accessible at
https://github.com/xiaochuanle/MECAT). Its authors report that MECAT’s
computing efficiency is superior to that of the tools it was compared
against, while the results it produces are comparable or improved. MECAT
enables reference mapping or de novo assembly of large genomes using SMS
reads on a single computer.
•
RS Dashboard
:
Web-based software that displays quality metrics for individual instruments, runs, and SMRT® Cells.
•
RS Remote
:
Windows-based
client software used to design and monitor sequencing runs. Directs
user to look at primary analysis data in-depth using RS Dashboard.
•
RS Touch
:
Touchscreen
user interface on the instruments to help the user load the instrument
and start a run. Provides direct feedback on instrument status.
•
SMRT® Analysis Suite
:
Client/server software that performs automated and distributed analysis of sequencing data generated by the PacBio® System.
•
SMRT® Pipe
:
Command-line software used to launch secondary analysis jobs. Part of the SMRT® Analysis suite.
•
SMRT® Portal
:
Web-based software used to help set up secondary analysis jobs and view quality reports. Part of the SMRT® Analysis suite.
•
SMRT® View
:
Java-based genome browser used to visualize aligned or assembled reads. Part of the SMRT® Analysis suite.
•
collection time
:
The time specified for collecting data from a SMRT® Cell.
•
consensus accuracy
:
Accuracy based on aligning multiple sequencing reads or subreads together, optionally with a reference sequence.
•
high quality (HQ) region screening
:
Annotates the high quality sequencing regions of a read to be used during raw read trimming.
•
movie
:
The
set of data collected during real-time observation of the SMRT® Cell,
including spectral information and temporal information used to
determine a read.
•
primary analysis
:
Includes signal processing of the movie, base calling of the traces and pulses, and quality assessment of the base calls.
•
pulse
:
The
representation of an illumination event derived from a trace that
includes metrics such as interpulse duration, pulse height, and pulse
width.
•
raw read trimming
:
Extraction of high quality regions from an unfiltered read. Trimming of an unfiltered read produces a polymerase read.
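The relationship between high quality (HQ) region screening and raw read trimming can be sketched as a simple slice. This is only an illustration: the function name, the single-region assumption, and the 0-based half-open coordinates are hypothetical and do not describe how primary analysis actually stores its annotations.

```python
def trim_to_hq_region(unfiltered_read: str, hq_start: int, hq_end: int) -> str:
    """Return the portion of an unfiltered read inside the annotated HQ region.

    Assumes one HQ region per read, given as 0-based, half-open
    coordinates [hq_start, hq_end). The result plays the role of the
    polymerase read in the definition above.
    """
    if not 0 <= hq_start <= hq_end <= len(unfiltered_read):
        raise ValueError("HQ region must lie within the read")
    return unfiltered_read[hq_start:hq_end]


if __name__ == "__main__":
    # Low-quality flanks are shown here as runs of N for readability.
    raw = "NNNNACGTACGTNN"
    print(trim_to_hq_region(raw, 4, 12))
```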
•
reads/SMRT® Cell
:
The average number of reads generated per SMRT® Cell.
•
SMRT® Sequencing
:
The process of nucleic acid sequencing using Pacific Biosciences’ single molecule, real-time sequencing technology.
•
standard sequencing
:
Sequencing
of SMRTbell™ templates to produce either single pass reads or circular
consensus reads, depending on the template length and collection time.
•
tertiary analysis
:
Statistical
analyses following secondary analysis, including comparisons of
secondary analysis results across different samples,
application-specific analyses, variant classification, and disease/gene
annotations.
•
trace
:
The raw intensity values from all four spectral channels of a single ZMW derived from a movie.
Secondary Analysis Terminology
•
analysis group
:
A group of reads from a single or multiple SMRT® Cells to be analyzed together in secondary analysis.
•
barcode FASTA
:
A
FASTA-format file used by barcoding software to identify ideal barcode
sequences. For symmetric barcodes, each barcode sequence identifies a
single bin for demultiplexing reads. For paired barcodes, each unique
pair of barcodes should be listed as two sequentially-named FASTA
sequences.
•
barcode score
:
The
alignment score between a read and an ideal barcode sequence. The
maximum barcode score is twice the length of the ideal barcode sequence.
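As a small worked example of the ceiling stated above, the sketch below computes the maximum score for a barcode and expresses an observed score as a fraction of it. The helper names and the example barcode are hypothetical; only the "twice the length" rule comes from this glossary, and how individual matches and mismatches are scored is not assumed here.

```python
def max_barcode_score(barcode: str) -> int:
    """Maximum possible barcode score: twice the barcode length."""
    return 2 * len(barcode)


def score_fraction(observed_score: float, barcode: str) -> float:
    """Observed barcode score as a fraction of the maximum possible score."""
    maximum = max_barcode_score(barcode)
    if maximum == 0:
        raise ValueError("barcode sequence must not be empty")
    return observed_score / maximum


if __name__ == "__main__":
    barcode = "ACGTACGTACGTACGT"  # illustrative 16 bp barcode
    print(max_barcode_score(barcode))
    print(score_fraction(24, barcode))
```

For a 16 bp barcode the ceiling is 32, so an observed score of 24 is 75% of the maximum.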
•
circular consensus accuracy
:
Accuracy based on multiple sequencing passes around a single circular template molecule.
•
circular consensus analysis
:
Processing of sequencing data generated by circular
consensus sequencing to create a circular consensus read.
•
circular consensus sequencing (CCS)
:
Sequencing
performed on a circular template in which multiple subreads are
generated during multiple sequencing passes around the template, and
then collapsed to form a single high-accuracy read. CCS data are
generated when at least two full-pass subreads are present.
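The "at least two full-pass subreads" condition can be expressed as a short check. This is a minimal sketch: the `Subread` record and its fields are hypothetical, and it assumes that a full-pass subread is one flanked by an adapter on both sides, which is not spelled out in this glossary.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Subread:
    sequence: str
    adapter_before: bool  # adapter observed immediately before this subread
    adapter_after: bool   # adapter observed immediately after this subread


def is_full_pass(subread: Subread) -> bool:
    """Assumed definition: a full pass is bounded by adapters on both sides."""
    return subread.adapter_before and subread.adapter_after


def can_generate_ccs(subreads: Iterable[Subread], min_full_passes: int = 2) -> bool:
    """True when enough full-pass subreads exist to produce CCS data."""
    return sum(is_full_pass(s) for s in subreads) >= min_full_passes


if __name__ == "__main__":
    zmw_subreads = [
        Subread("ACGT", adapter_before=False, adapter_after=True),  # partial first pass
        Subread("ACGT", adapter_before=True, adapter_after=True),
        Subread("ACGA", adapter_before=True, adapter_after=True),
    ]
    print(can_generate_ccs(zmw_subreads))
```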
•
consensus sequence determination
:
Generation
of a consensus sequence from multiple individual reads of the same
template or identical copies thereof. Also termed “consensus calling.”
•
paired barcodes
:
Barcode
sequences that are different (asymmetric) on either end of an insert
present in a SMRTbell™ template. The barcoding analysis software uses
unique pairs of barcodes to separate and analyze reads.
•
QV Metric
:
"Phred"-like scores that predict, for each base call, the probability of a correct call.
•
secondary analysis
:
Statistical analyses following primary analysis base calling, which include:
1) filtering/selection of data that meet desired criteria (such as quality, read length, and so on);
2) comparison of reads to a reference for mapping and variant calling, consensus sequence determination, alignment and assembly (de novo or reference-based), variant identification, and base modification detection; and
3) quality evaluations for a sequencing run, consensus sequence, assembly, and so on.
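The filtering/selection step in item 1 can be illustrated with a short sketch. The `Read` record, its `quality` field, and the threshold values are all hypothetical placeholders chosen for the example; they are not defaults of any PacBio® software.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    quality: float  # hypothetical per-read quality estimate between 0 and 1


def filter_reads(reads: List[Read], min_length: int, min_quality: float) -> List[Read]:
    """Keep only reads that meet both the length and the quality criteria."""
    return [
        r for r in reads
        if len(r.sequence) >= min_length and r.quality >= min_quality
    ]


if __name__ == "__main__":
    reads = [
        Read("read_1", "ACGTACGTAC", 0.90),
        Read("read_2", "ACG", 0.95),         # too short
        Read("read_3", "ACGTACGTACGT", 0.60),  # quality too low
    ]
    kept = filter_reads(reads, min_length=5, min_quality=0.75)
    print([r.name for r in kept])
```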