ISSN: 0970-938X (Print) | 0976-1683 (Electronic)

Biomedical Research

An International Journal of Medical Sciences

Review Article - Biomedical Research (2018) Volume 29, Issue 8

Next-generation sequencing technologies as emergent tools and their challenges in viral diagnostics

Desh Deepak Singh*

Amity Institute of Biotechnology, Amity University, Rajasthan, India

*Corresponding Author:
Desh Deepak Singh
Amity Institute of Biotechnology
Amity University, India

Accepted date: February 17, 2018

DOI: 10.4066/biomedicalresearch.29-18-362


Abstract

Next-Generation Sequencing (NGS) is a high-throughput approach that has been rapidly adopted across countless facets of viral diagnostic research. It has been extensively applied to full genome sequencing, identification of genetic diversity, transcriptomics, routine diagnostic work in patient care and management, and to new understanding of the interactions between the viral and host transcriptomes, thereby advancing virus research. An exciting era of viral exploration has begun and will set new challenges in understanding the role of newly discovered viral diversity in both disease and health. This review discusses the preparation of viral nucleic acid templates for sequencing, the assay formats underlying next-generation sequencing systems, methods for imaging and base calling, quality control, data processing pipelines and bioinformatics approaches for sequence alignment and variant calling, together with suggestions for selecting suitable tools. It also discusses the most important advances that the new sequencing technologies have brought to clinical virology and the challenges behind them.

Keywords

Next generation sequencing, Virology, Bioinformatics, Transcriptome, Viral diversity, Sequencing depth.

Introduction

Many viruses continuously display a high degree of genetic diversity, owing to high rates of evolutionary change, short generation times, large population sizes, globalization, climate change, and the increasing number of immunocompromised people [1]. Humans face a growing burden of viral disease caused by the appearance of previously unidentified viruses and the emergence of new infections [2]. Medically important examples include human immunodeficiency virus, hepatitis viruses, viruses causing influenza-like illness and severe acute respiratory illness, and other viruses showing high genetic heterogeneity within patients [3]. Assessing intra-host viral genetic diversity is very important for understanding the evolutionary trajectories of viruses, for designing effective vaccines, and for the success of antiviral therapy [4,5].

Sometimes a viral etiology cannot be characterized by traditional culture and molecular methods [6]. Earlier techniques, such as representational difference analysis or random sequencing of plasmid libraries of nuclease-resistant fragments of viral genomes [7], previously led to the discovery of several viruses, including polyomaviruses [8], human herpesviruses [9], human GB virus [10] and human parvovirus [11].

There is a need for improved approaches for the genetic characterization of suspected viral pathogens and new viruses [12]. NGS techniques therefore represent a powerful tool that can be applied in metagenomics-based approaches for the identification of novel pathogenic viruses in patient care [13]. In this review, I give an overview of the challenges and opportunities of NGS in virology, specifically in virus discovery, genome sequencing and transcriptomics.

Next-generation Sequencing

NGS is capable of sequencing large numbers of different DNA molecules in a single reaction [14]. All Next-Generation Sequencing (NGS) technologies monitor the sequential addition of nucleotides to immobilized and spatially arrayed DNA templates, but they differ substantially in how these templates are generated and how they are interrogated to reveal their sequences [15]. Current NGS platforms show significant differences, and their characteristics are summarized in Table 1.

Machine | Method | Sequencing chemistry | Read length (bp) | Sequencing speed/h | Maximum output per run | Accuracy (%) | Error rate | Main source of error
454 FLX | Emulsion PCR | Pyrosequencing | 400-700 | 13 Mbp | 700 Mbp | 99.9 | 10^-2 to 10^-4 | Intensity cutoff, homopolymers, amplification, mixed beads
Illumina (Illumina) | Bridge PCR | Reversible terminators | 100-300 | 25 Mbp | 600 Gbp | 99.9 | 10^-2 to 10^-3 | Homopolymers, phasing, nucleotide labelling, amplification, low coverage of AT-rich regions
SOLiD (Life Technologies) | Emulsion PCR | Ligation | 75-85 | 21-28 Mbp | 80-360 Gbp | 99.9 | 10^-2 to 10^-3 | Phasing, nucleotide labelling, signal degradation, mixed beads, low coverage of AT-rich regions
Helicos (Helicos Biosciences) | Single molecule, no amplification | Reversible terminators | 25-55 | 83 Mbp | 35 Gbp | 97 | 10^-2 | Polymerase employed, molecule loss, low intensities
Ion Torrent PGM | Emulsion PCR | Detection of released H+ | 100-400 | 25 Mb-16 Gbp | 100 Mb-64 Gbp | 99 | 3 × 10^-2 | Homopolymers, amplification

Table 1: Review of current next-generation technologies.

Choice of platform

Important factors in choosing a machine include the size of the genome and its G+C content, as well as the depth of coverage and the accuracy required [16]. It is therefore advisable to consult local service providers about your specific requirements.
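To illustrate how genome size, read length and required depth interact when choosing a platform, the minimal Python sketch below estimates the number of reads and total output needed for a given target, using the standard relation depth = reads × read length / genome size. The genome size, depth and read length used here are hypothetical example values, not recommendations.

# Estimate sequencing requirements from genome size, desired depth and read length.
# The numbers used below are hypothetical examples, not platform recommendations.

def reads_required(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Number of reads needed so that depth = reads * read_length / genome_size."""
    return int(round(genome_size_bp * depth / read_length_bp))

def output_required_bp(genome_size_bp: int, depth: float) -> int:
    """Total bases that must be sequenced to reach the desired average depth."""
    return int(genome_size_bp * depth)

if __name__ == "__main__":
    genome_size = 30_000   # e.g. a ~30 kb RNA virus genome (hypothetical)
    depth = 1000           # deep coverage, as used for intra-host variant detection
    read_length = 150      # a typical short-read length

    print("reads needed:", reads_required(genome_size, depth, read_length))
    print("bases needed:", output_required_bp(genome_size, depth))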

Methods

The value of NGS data for viral diversity analysis depends strongly on the quality and preparation of the samples [17]. The quality of the protocols used for genome extraction and for removal of unwanted RNA and DNA from other sources, such as host cells, depends on careful experimental handling [18]. To minimize experimental variation, each step, from sample preparation to sequence data analysis, needs careful attention (Figure 1).


Figure 1: Workflow of next-generation sequencing.

Sample preparation

Nucleic acids may be extracted from whole blood, plasma, serum, throat swabs, cerebrospinal fluid, virus-infected culture supernatants, and other cell-free body fluids using a nucleic acid extraction kit, or manually by TRIzol-based methods, omitting RNase or DNase digestion, according to the manufacturer's recommended protocols [19,20]. RNA is bound to an advanced silica-gel membrane under optimized buffering conditions [21]. A simple two-step washing protocol ensures that PCR inhibitors such as proteins or divalent cations are completely removed, leaving high-quality RNA to be eluted in Milli-Q water [22]. Nucleic acid extraction from viral samples carries the additional potential challenge of low or variable viral titers [23]. Users working with low viral titers are advised to choose a method that is easy to use and that ensures sufficient yield for NGS [24].

Library preparation

Clonal amplification of each DNA fragment by bridge amplification or emulsion PCR is necessary to generate sufficient copies of the sequencing template [25]. Fragment libraries are obtained by ligating platform-specific adapters to fragments generated from a DNA source of interest, such as genomic DNA, double-stranded cDNA, or PCR amplicons [26]. The presence of adapter sequences enables selective clonal amplification of the library molecules [27]. Therefore, no bacterial cloning step is required to amplify the template fragment in a bacterial intermediate, as is performed in traditional sequencing approaches; the adapter sequence also contains a docking site for the platform-specific sequencing primers [28]. The conventional method for DNA library preparation consists of four steps: (i) fragmentation of the DNA; (ii) end repair of the fragmented DNA; (iii) ligation of adapter sequences (not required for single-molecule sequencing applications); and (iv) optional library amplification [29]. At present, different methods are used to produce fragmented genomic DNA, such as sonication, nebulization, enzymatic digestion, and hydrodynamic shearing, each with specific advantages and limitations [30]. Enzymatic digestion is simple and rapid, but it is frequently difficult to precisely control the fragment length distribution [31]. In addition, this method is likely to introduce biases in the representation of genomic DNA [32]. The other three techniques use physical forces to introduce double-strand breaks into the DNA, which are assumed to occur randomly, resulting in an unbiased representation of the DNA in the library [33]. Qualitative analysis of the resulting DNA fragment sizes can be performed by agarose gel electrophoresis [34].

Following fragmentation, the DNA fragments must be repaired to generate blunt-ended, 5'-phosphorylated DNA ends compatible with the sequencing platform-specific adapter ligation strategy [35]. Library generation efficiency is directly dependent on the efficiency and accuracy of these DNA end-repair steps [36].

The end-repair mix converts 5'- and 3'-overhanging ends into 5'-phosphorylated, blunt-ended DNA [37]. In most cases, end repair is accomplished by exploiting the 5'-3' polymerase and 3'-5' exonuclease activities of T4 DNA polymerase, while T4 polynucleotide kinase ensures 5'-phosphorylation of the blunt-ended DNA fragments, preparing them for subsequent adapter ligation [38]. The blunt-ended DNA fragments can either be used directly for adapter ligation, or require the addition of a single A overhang at their 3' ends to facilitate subsequent ligation of platform-specific adapters carrying compatible single T overhangs [39]. Classically, this A-addition step is catalysed by Klenow fragment (minus 3' to 5' exonuclease) or other polymerases with terminal transferase activity [40]. Adapters are then ligated to the repaired library fragments, followed by reaction clean-up and DNA size selection to remove free library adapters [41]. Approaches for library size selection include agarose gel separation, the use of magnetic beads, or column-based purification methods [42]. Adapter dimers, which can form during ligation and will subsequently be co-amplified with the adapter-ligated library fragments, must be depleted from the libraries prior to sequencing, as they reduce the capacity of the sequencing platform available for real library fragments and reduce sequencing quality [43]. Some sequencing platforms require a narrow size distribution of library fragments for optimal results, which in many cases can only be achieved by excising the respective fragment band from the gel; this step can also serve to deplete adapter dimers [44].

Fragment DNA libraries should then be quality-checked and quantified [45]. During the library amplification step, high-quality DNA polymerases and overlapping PCR primers are used either to complete the full adapter sequence needed for subsequent clonal amplification and binding of sequencing primers, and/or to produce higher yields of the DNA libraries [46]. Optimal library amplification requires a DNA polymerase with high fidelity and minimal sequence bias [47].

To enable efficient use of sequencing capacity, sequencing libraries generated from different samples can be pooled and sequenced in the same sequencing run [48]. This is enabled by ligating the DNA fragments to adapters carrying characteristic barcodes, i.e., short stretches of nucleotide sequence that are distinct for each sample [49].
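Because sequencing errors can convert one barcode into another, barcode sets are usually chosen so that any two barcodes differ at several positions. The short Python sketch below checks the pairwise Hamming distance of a hypothetical barcode set; the barcode sequences and the minimum-distance threshold are illustrative assumptions only.

# Check that a (hypothetical) barcode set is mutually distinguishable: any two
# barcodes should differ at several positions so single errors cannot cause
# sample misassignment during demultiplexing.

from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    assert len(a) == len(b), "barcodes must have equal length"
    return sum(x != y for x, y in zip(a, b))

barcodes = ["ACGTAC", "TGCATG", "ACGTAA", "GTACGT"]  # hypothetical 6-nt barcodes
min_distance = 3                                     # illustrative threshold

for b1, b2 in combinations(barcodes, 2):
    d = hamming(b1, b2)
    if d < min_distance:
        print(f"barcodes {b1} and {b2} differ at only {d} position(s); "
              f"misassignment after sequencing errors is possible")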

A complete sequencing library can subsequently be constructed by a limited number of PCR amplification cycles on such tagged DNA fragments, reducing handling steps and saving time [50]. However, libraries generated using in vitro transposition may show higher sequence bias than those generated using conventional methods [51].

Library quality control for NGS

A high-quality library is key to successful NGS [52]. Library construction involves several steps, such as fragmenting the sample, repairing the ends, adenylating the ends, ligating adapters, and amplifying the library [53]. Each step needs to be monitored very carefully, including checking the fragment size after sample fragmentation and checking the size and concentration after adapter ligation [54]. Library validation serves as the final quality control step and examines the library size and quantity [55].
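A common final calculation in library quality control converts the measured concentration and average fragment size into a molar concentration before loading. The sketch below uses the widely used approximation of 660 g/mol per base pair of double-stranded DNA; the input values are hypothetical QC readings.

# Convert a measured library concentration (ng/ul) and mean fragment size (bp)
# into molar concentration (nM), a common final QC figure before loading.
# 660 g/mol per base pair is the usual approximation for double-stranded DNA.

def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    return conc_ng_per_ul / (660.0 * mean_fragment_bp) * 1_000_000

if __name__ == "__main__":
    # Hypothetical readings from a fluorometer and a fragment analyser.
    print(round(library_molarity_nM(conc_ng_per_ul=2.5, mean_fragment_bp=350), 2), "nM")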

Sequencing

To obtain nucleic acid sequences from the amplified libraries, the library fragments act as templates [56]. Sequencing proceeds through repeated cycles of washing and flooding the fragments with known nucleotides in a sequential order [57]. As nucleotides are incorporated into the growing DNA strand, they are digitally recorded as sequence [58]. The PGM and the MiSeq each rely on a slightly different mechanism for detecting nucleotide sequence information [59]. The PGM performs semiconductor sequencing, which relies on the detection of pH changes induced by the release of a hydrogen ion upon incorporation of a nucleotide into a growing DNA strand [60]. The MiSeq is based on the detection of fluorescence produced by the incorporation of fluorescently labelled nucleotides into the growing DNA strand [61,62].

Imaging

During each run, a large volume of image data is generated and converted into FASTQ-format files [70]. Image analysis uses the raw images to locate clusters, extract their positions and intensities, and estimate the noise for each cluster [71]. The base-calling step determines the sequence of bases read from each cluster and filters out undefined or low-quality reads [72]. When multiple samples are sequenced together, each sample is identified by its individual index sequence, or barcode [73].
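As a minimal illustration of demultiplexing, the Python sketch below assigns FASTQ reads to samples by matching an in-line barcode at the start of each read. This is a simplification: depending on the library design, indexes may instead sit in separate index reads or in the read header, and real demultiplexers tolerate mismatches. The file name and barcode-to-sample assignments are hypothetical.

# Minimal demultiplexing sketch: count FASTQ reads per sample using an in-line
# barcode at the start of each read. File name and barcodes are hypothetical.

from itertools import zip_longest

BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}  # hypothetical assignments
BARCODE_LEN = 6

def fastq_records(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file."""
    with open(path) as handle:
        for rec in zip_longest(*[iter(handle)] * 4, fillvalue=""):
            yield tuple(line.rstrip("\n") for line in rec)

def demultiplex(path):
    counts = {name: 0 for name in BARCODES.values()}
    counts["undetermined"] = 0
    for header, seq, plus, qual in fastq_records(path):
        sample = BARCODES.get(seq[:BARCODE_LEN], "undetermined")
        counts[sample] += 1
    return counts

if __name__ == "__main__":
    print(demultiplex("run1_reads.fastq"))  # hypothetical input file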

Bioinformatics

The data are mapped onto the reference genome by different processors and then aggregated by a head computing node to provide the final mapped genomic sequences, as shown in Figure 2 [74]. The computational resources needed for mapping depend on the size of the genome to be mapped [75]. For reference mapping, a pre-existing reference genome sequence can be used [76]. Data produced by a bench-top sequencer can readily be analysed with 96-256 GB of RAM to map the sequence data; the full-length human genome is about 3 Gb, compared with roughly 40-60 Mb for targeted approaches and a few Mb for a typical clinical gene panel [77]. De novo assembly requires longer read lengths and more extensive analysis to perform the overlap process efficiently [78]. De novo assembly is a considerably more computationally intensive process than reference mapping [79]. A large ecosystem of bioinformatics software is available for the analysis of DNA sequencing data [80]. At present, a collection of free online and commercial software is used for NGS data analysis, as shown in Table 2 [81].

Category | Tool | Reference
miRNA prediction | miRDeep/miRDeep2 | [63]
Quality control | FastQC | [64]
De novo assembly | Velvet | [65]
Quality viewer | qrqc | [66]
Quality control | FASTQ Quality Filter | [67]
Adapter trimming | cutadapt | [68]
Alignment/mapping | Bowtie/Bowtie2 | [69]
Counting reads per transcript | HTSeq | [70]
Normalization, bias correction, and statistical testing of differential expression | DESeq | [71]
Alignment/mapping | MAQ | [72]
SNV detection | VarScan/VarScan2 | [86]
Structural variation detection | PEMer | [87]

Table 2: Tools for next-generation sequencing data analysis.


Figure 2: Steps for analytic strategies of DNA-seq and RNA-seq.
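As a concrete illustration of the reference-mapping branch of the workflow in Figure 2, the Python sketch below indexes a reference, aligns reads with BWA-MEM and sorts/indexes the result with samtools. It assumes bwa and samtools are installed and on the PATH; all file names are hypothetical, and a production pipeline would add quality trimming, read-group tagging and error handling.

# Minimal reference-mapping sketch: bwa index -> bwa mem -> samtools sort/index.
# Assumes bwa and samtools are installed; file names are hypothetical.

import subprocess

REF = "viral_reference.fasta"
READS = "run1_reads.fastq"
SAM = "aligned.sam"
BAM = "aligned.sorted.bam"

def run(cmd, stdout=None):
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True, stdout=stdout)

if __name__ == "__main__":
    run(["bwa", "index", REF])                          # build the reference index once
    with open(SAM, "w") as sam_out:
        run(["bwa", "mem", REF, READS], stdout=sam_out) # align reads, write SAM output
    run(["samtools", "sort", "-o", BAM, SAM])           # coordinate-sort the alignments
    run(["samtools", "index", BAM])                     # index for fast random access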

Read alignment

Reference-based assembly is used to align hundreds of millions to billions of reads to an existing reference genome [82]. The MAQ tool is based on a spaced-seed indexing approach to map reads to the reference sequence [83]. The BFAST tool can be used for its speed and accuracy in mapping [84]. Novoalign uses the Needleman-Wunsch algorithm with affine gap penalties to find the globally optimal alignment. The Burrows-Wheeler Aligner (BWA) includes a short-read algorithm that aligns reads of up to ~200 bp with high sensitivity, and a separate algorithm for longer reads with higher error rates [85]. SOAP3, the latest version of SOAP, supports Graphics Processing Unit (GPU)-based parallel alignment and can map one million reads onto the human reference genome within about 30 s [86].
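All of these aligners report their results in SAM/BAM format, so a quick mapping summary can be computed directly from the output. The standard-library Python sketch below counts mapped and unmapped reads using bit 0x4 of the SAM FLAG field and averages the MAPQ column; the input file name is hypothetical, and dedicated tools such as samtools flagstat give more complete statistics.

# Quick mapping summary from a SAM file: count mapped/unmapped reads via FLAG
# bit 0x4 and report the mean mapping quality. Input file name is hypothetical.

def sam_mapping_stats(path):
    mapped = unmapped = 0
    mapq_sum = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("@"):        # skip header lines
                continue
            fields = line.rstrip("\n").split("\t")
            flag = int(fields[1])           # column 2: bitwise FLAG
            if flag & 0x4:                  # segment unmapped
                unmapped += 1
            else:
                mapped += 1
                mapq_sum += int(fields[4])  # column 5: MAPQ
    mean_mapq = mapq_sum / mapped if mapped else 0.0
    return {"mapped": mapped, "unmapped": unmapped, "mean_mapq": round(mean_mapq, 1)}

if __name__ == "__main__":
    print(sam_mapping_stats("aligned.sam"))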

De novo assembly

De novo methods assemble short reads into meaningful contigs and join these contigs into scaffolds to reconstruct the original genomic DNA of novel species [87]. The crucial challenge of de novo assembly is that the read length is shorter than the repeats in the genome [88]. Velvet addresses this by applying de Bruijn graph simplification to reduce path complexity [89].
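To make the de Bruijn graph idea behind assemblers such as Velvet concrete, the toy Python sketch below decomposes reads into k-mers and links each k-mer's prefix to its suffix; contigs correspond to unambiguous paths through such a graph. This is only a conceptual illustration with hypothetical, error-free reads; real assemblers add error correction, coverage cut-offs and extensive graph simplification.

# Toy de Bruijn graph: each k-mer becomes an edge between its (k-1)-mer prefix
# and suffix. Reads below are hypothetical and error-free for clarity.

from collections import defaultdict

def de_bruijn_graph(reads, k):
    graph = defaultdict(list)  # (k-1)-mer -> list of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

if __name__ == "__main__":
    reads = ["ACGTTGCA", "GTTGCATT", "TGCATTGG"]
    for node, successors in de_bruijn_graph(reads, k=4).items():
        print(node, "->", ",".join(successors))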

After assembly or alignment, the next step in analysis pipelines is to use a variant-calling tool to identify Single Nucleotide Variants (SNVs) and other genetic variants. GATK performs local realignment around insertions/deletions (indels), carries out base quality recalibration, calls genotypes, and uses machine learning to distinguish true variants, recalibrating and genotyping variants across multiple samples. SAMtools computes genotype likelihoods to call SNVs or short indels [90], while VarScan/VarScan2 employs empirical approaches and a statistical test to detect SNVs and indels. SomaticSniper [91] and JointSNVMix [92] use the genotype likelihood model of MAQ and two probabilistic graphical models, respectively, to assess the probability of differences between genotypes [93].
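As a minimal sketch of the frequency-threshold style of variant calling used by heuristic callers such as VarScan, the Python example below compares per-position base counts against the reference and reports alternative alleles above a minimum depth and frequency. The input counts are hypothetical, and this is not how GATK or VarScan are implemented; real callers work from BAM pileups and also model base quality, mapping quality and strand bias.

# Heuristic SNV-calling sketch: report non-reference bases whose frequency and
# depth pass simple thresholds. Counts below are hypothetical.

def call_snvs(reference, base_counts, min_depth=50, min_freq=0.05):
    variants = []
    for pos, counts in base_counts.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos]
        for base, n in counts.items():
            freq = n / depth
            if base != ref_base and freq >= min_freq:
                variants.append((pos, ref_base, base, depth, round(freq, 3)))
    return variants

if __name__ == "__main__":
    reference = "ACGTACGT"
    # position -> observed base counts from aligned reads (hypothetical)
    base_counts = {2: {"G": 180, "A": 20}, 5: {"C": 95, "T": 5}}
    for variant in call_snvs(reference, base_counts):
        print(variant)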

NGS applications to virology

The main applications of NGS to clinical virology are briefly described here.

Virus discovery (metagenomics): Detection of novel viruses in clinically important samples from human and animal diseases, e.g. a new arenavirus implicated in a cluster of transplant-associated disease [94] and the new Ebola virus Bundibugyo [95], and the characterization of the viral etiology of a disease outbreak [96]. NGS is also useful for virus discovery in the environment [97], in animals [98,99], and in humans [100].

Whole viral genome reconstruction: Whole-genome sequencing can be performed by shotgun metagenomic sequencing of random libraries or by shotgun sequencing of full-genome amplicons. This type of method has been applied to the sequencing of pandemic influenza virus [101], HIV [2], human herpes viruses [12], and other viruses; in most of these investigations of genetic diversity, intra-host virus variability has been observed.

Characterization of intra-host variability: NGS has been widely used to characterize the intra-host variability of influenza virus [101], HCV [102], and HIV [2]. Because of the rapid, error-prone replication of these viruses [12], although a single variant commonly dominates in the early stages of disease, a large number of mutations emerge naturally during the later stages [95], and these variants can be rapidly selected [98]. The highly variable collection of sequences within an individual host, referred to as a quasispecies, allows a viral population to rapidly adapt to dynamic environments and evolve resistance to vaccines and antiviral drugs [101]. Most of these studies have been based on Ultra-Deep Sequencing (UDS) of PCR-generated amplicons spanning the genomic regions of interest [100].
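One simple way such intra-host (quasispecies) diversity is often summarized from deep-sequencing data is per-position Shannon entropy of the observed base frequencies. The Python sketch below computes this measure for two hypothetical positions; published analyses typically also filter by base quality and coverage before computing diversity statistics.

# Per-position Shannon entropy as a simple intra-host diversity summary.
# The pileup counts below are hypothetical.

import math

def shannon_entropy(counts):
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy

if __name__ == "__main__":
    # position -> base counts from ultra-deep amplicon sequencing (hypothetical)
    pileup = {
        101: {"A": 9800, "G": 200},             # low-diversity site
        102: {"A": 5000, "G": 4000, "T": 1000}  # highly variable site
    }
    for pos, counts in pileup.items():
        print(pos, round(shannon_entropy(counts), 3))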

Epidemiology of viral infections and viral evolution

NGS is being used to examine the epidemiology of viral infections and viral evolution, addressing issues such as viral superinfection, which occurs when a previously infected individual acquires a new, distinct strain [103]; tracing the evolution and spread of viral strains, such as the emergence, evolution and worldwide spread of viruses; tracing the spread of viruses among persons [66]; and following the evolution of viruses within the host and the mechanisms of immune escape, together with replication fitness, as in the case of HIV [2] and HCV infection [102].

Quality control of live-attenuated viral vaccine

Genetic instability of RNA viruses may lead to the accumulation of infectious revertants during the production of live viral vaccines, necessitating detailed quality control to confirm vaccine safety [5]. NGS approaches have been proposed as a means of thoroughly monitoring the genetic stability of live viral vaccines.

Challenges

Several problems still require effort from the scientific community to overcome practical challenges, such as distinguishing de novo transcripts from post-transcriptional modifications [10].

The huge datasets produced by the various NGS technologies bring their own challenges [13]. In addition to storage space, it is important to have accurate alignment tools that can map cDNA reads onto the genome and eliminate poor-quality nucleotide bases [9]. There may also be problems with repetitive sequences and homopolymeric regions at the 5' or 3' ends of cDNA reads, which can confound 5' RACE and 3' RACE experiments [10]. This is a common problem with 454 FLX sequencing, which is known to lack precision in homopolymer regions [62]. Large datasets allow more accurate identification of transcript levels and associated strategies, but they carry the risk of data overload [78]. Finally, visualization, analysis and interpretation will require substantial expertise and may also require programming skills [82]. Visualization can be achieved with the aforementioned ARTEMIS [72] and the Integrated Genome Browser (Affymetrix), while LASERGENE (DNAstar) also offers modules optimized for RNA-seq analyses [81].

Conclusion

Emerging tools for NGS analysis of viral populations are required. As new sequencing technologies continue to appear, future bioinformatic challenges will include developing algorithms for aligning very long reads and coping with the unique error profile of each sequencing technology. Diagnostic virology is one of the most promising applications of NGS, and exciting results have been obtained in the discovery and characterization of new viruses, the detection of unexpected viral pathogens in clinical samples, ultrasensitive monitoring of antiviral drug resistance, and the analysis of viral diversity, evolution and transmission, and of the human virome. With decreasing costs and shorter turnaround times, these methods will likely become important diagnostic tools in routine clinical practice. The application of these methods in identification and containment programmes can effectively improve the efficiency and reliability of outbreak investigation and the monitoring of viral diseases at a global level.

References