
Genome Analysis Workflow | A Brief Introduction

A genome analysis workflow is built around sequencing techniques that read out the genes in a genome. It is used to understand what may be causing a patient's symptoms or disease. The aim of a genome analysis workflow is to capture the protein-coding regions of all genes and to read out the introns and exons across the genome. The starting material for the workflow is a DNA sample taken from the patient.

It is a genetic testing process that provides basic information about genetic diseases. Genetic testing has long been used in some areas of health care, such as cancer diagnosis and prenatal screening. If a change occurs in any cell of the body, this type of testing can reveal it: detecting mutations in a cell's DNA is one technique used to test for genetic disorders.

Genome sequencing is mainly done in three steps:


1.  Laboratory preparation of DNA molecules

2.  Sequencing of genome 

3.  Data analysis 

Genome Analysis Workflow is a widely used next-generation sequencing (NGS) method that involves sequencing the protein-coding regions of the genome. Nearly every cell in the body carries a copy of the DNA that is passed from parents to their offspring: a child inherits one set of chromosomes from the mother and one from the father. Chromosomes are made up of DNA and proteins. Each chromosome carries hundreds to thousands of genes, and each gene has two kinds of regions, exons and introns.

Our body is made up of trillions of cells, and our genetic material, DNA, is divided into thousands of genes. Genes are made up of a sequence of nucleotide bases, written as the letters A (adenine), T (thymine), G (guanine) and C (cytosine), arranged in a particular order. A change in this sequence is a mutation. Genetic testing by sequencing looks for changes in the letters of a gene that could cause disease: some changes are harmless, while others affect the function of the gene and can cause disease in a parent or child.
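Because a gene is just a string of the letters A, T, G and C, the simplest kind of mutation (a single-letter substitution) can be found by comparing a patient's sequence with a reference letter by letter. The sketch below assumes both sequences are already aligned and of equal length; insertions and deletions would need a proper alignment first. The sequences are made-up examples.

```python
# A minimal sketch: report single-letter substitutions (point mutations) between
# an aligned reference sequence and a sample sequence of the same length.
# The sequences used below are invented examples, not real human DNA.

VALID_BASES = set("ATGC")


def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, already-aligned sequences")
    for seq in (reference, sample):
        if not set(seq) <= VALID_BASES:
            raise ValueError("sequences may only contain A, T, G and C")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    reference = "ATGGCCATTGTAATG"
    sample = "ATGGCCATTCTAATG"
    for pos, ref_base, alt_base in find_substitutions(reference, sample):
        print(f"position {pos}: {ref_base} -> {alt_base}")  # position 10: G -> C
```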

Each gene is divided into exons and introns, and the exons carry the instructions for making proteins. Most known disease-causing mutations occur in exons, and such changes in specific genes can cause autosomal dominant and autosomal recessive disorders. Genome sequencing helps to identify the variants behind these disorders, so for this type of disease doctors may recommend genome sequencing and data analysis to look for variations in the patient's cells.
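Deciding whether a variant lies in an exon or an intron comes down to an interval lookup against the gene's exon coordinates. The sketch below uses a single hypothetical gene with made-up exon coordinates; in practice these would come from a gene annotation for the reference genome.

```python
# A minimal sketch: classify a genomic position as exon, intron or outside the gene.
# EXONS holds hypothetical coordinates, sorted, non-overlapping, 1-based inclusive.
from bisect import bisect_right

EXONS = [(1000, 1150), (2000, 2100), (3500, 3800)]


def classify_position(pos: int, exons=EXONS) -> str:
    """Return 'exon', 'intron' or 'outside gene' for a genomic position."""
    gene_start, gene_end = exons[0][0], exons[-1][1]
    if pos < gene_start or pos > gene_end:
        return "outside gene"
    starts = [start for start, _ in exons]
    i = bisect_right(starts, pos) - 1  # last exon that starts at or before pos
    start, end = exons[i]
    return "exon" if start <= pos <= end else "intron"


if __name__ == "__main__":
    for pos in (1100, 1500, 3600, 5000):
        print(pos, classify_position(pos))
    # 1100 exon / 1500 intron / 3600 exon / 5000 outside gene
```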

Genome sequencing is a simple process: a blood sample is taken from the patient and processed with next-generation sequencing, and the outcome is a genetic report that summarizes the findings. Compared with other methods, genome sequencing by NGS offers cost-effectiveness, speed, ultra-high throughput, no cloning step, and short reads for any sample.

There has been increasing demand for new informatics and analytical strategies that turn vast amounts of sequencing data into high-quality variant calls with the sensitivity and specificity needed for clinical use.
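Sensitivity and specificity of a variant caller are usually measured against a trusted "truth" set. The sketch below compares positions only and assumes every called and true position lies within the assessed region; real benchmarks also match alleles and genotypes. All positions are hypothetical.

```python
# A minimal sketch: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP),
# comparing a call set with a truth set by position only (hypothetical data).

def evaluate_calls(called: set[int], truth: set[int], assessed_positions: int) -> dict[str, float]:
    tp = len(called & truth)  # true variants that were called
    fp = len(called - truth)  # calls with no matching true variant
    fn = len(truth - called)  # true variants that were missed
    # Remaining assessed positions are non-variant and were correctly left uncalled.
    tn = assessed_positions - tp - fp - fn
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


if __name__ == "__main__":
    truth = {120, 455, 980, 1500}
    called = {120, 455, 1500, 2100}
    r = evaluate_calls(called, truth, assessed_positions=10_000)
    print(f"sensitivity={r['sensitivity']:.2f} specificity={r['specificity']:.4f}")
    # sensitivity=0.75 specificity=0.9999
```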

Genome Analysis Workflow contains two main processes:

1. Target enrichment

2. Sequencing

Target enrichment is used to select and capture the regions of interest from the DNA sample.

This process involves the following steps:

  1. Select the subset of DNA that encodes proteins.
  2. Sequence the exon and intron regions of this DNA using DNA sequencing technology.

Laboratory preparation for the genome analysis workflow depends on the sample. Preparation involves several steps that produce a sequencing library covering the exon and intron regions of the genes. This library is then sequenced by next-generation sequencing, and all of the data are stored in files for further analysis.


Sequencing is the process of determining the order of all the deoxyribonucleotides in a genome, which helps us understand the alterations that may underlie some diseases. As its cost has fallen, genome sequencing has become increasingly important.

Because of the cost, time and read-length limitations of Sanger sequencing, sequencing technology did not contribute much to clinical and biological studies until next-generation sequencing technologies were invented. NGS technologies such as Illumina sequencing build on the Sanger idea of detecting dye-labelled nucleotides.

The improvement is that NGS amplifies, detects and merges millions of DNA strands in parallel, which greatly increases the throughput and efficiency of sequencing. The target mapping process then allows coding-nucleotide and splice-site changes to be identified wherever the patient's DNA differs from the reference sequence.
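A splice-site change can be flagged with a simple positional check: most introns begin with the dinucleotide GT (donor site) and end with AG (acceptor site), so variants hitting those four bases are candidates for disrupting splicing. The sequence and intron coordinates below are hypothetical, and real splice-site prediction looks well beyond these two dinucleotides.

```python
# A minimal sketch of the GT...AG rule for canonical splice sites.
# Coordinates are 1-based and inclusive; all data below are hypothetical.

def splice_site_positions(intron_start: int, intron_end: int) -> set[int]:
    """Positions of the donor (first two) and acceptor (last two) intron bases."""
    return {intron_start, intron_start + 1, intron_end - 1, intron_end}


def is_canonical_intron(sequence: str, intron_start: int, intron_end: int) -> bool:
    intron = sequence[intron_start - 1:intron_end]
    return intron.startswith("GT") and intron.endswith("AG")


def hits_splice_site(variant_pos: int, introns: list[tuple[int, int]]) -> bool:
    return any(variant_pos in splice_site_positions(s, e) for s, e in introns)


if __name__ == "__main__":
    # 6-base exon, 12-base intron (positions 7-18), 6-base exon
    sequence = "ATGGCC" + "GTAAGTTTTCAG" + "GTTTAA"
    introns = [(7, 18)]
    print(is_canonical_intron(sequence, 7, 18))  # True
    print(hits_splice_site(8, introns))          # True: second base of the donor GT
    print(hits_splice_site(12, introns))         # False: inside the intron
```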

In short, the principle of NGS is to bind the DNA fragments to a solid support, such as the beads of the Roche 454 system or the flow cell of the Illumina HiSeq, and replicate them by PCR on that support so that each amplified fragment gives a detectable signal during elongation. After every round of elongation, the incorporated nucleotides are detected. Finally, the complete sequence is assembled using bioinformatics algorithms. Because NGS greatly improves efficiency and allows higher-throughput detection, it is also called high-throughput sequencing, and it is widely used in genome sequencing.
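The final assembly step can be illustrated with a toy greedy overlap assembler: repeatedly merge the two reads with the longest suffix/prefix overlap until no overlaps remain. This is a teaching sketch under simplified assumptions (error-free reads, no repeats); production assemblers use overlap or de Bruijn graphs and are not implemented this way.

```python
# A toy greedy overlap assembler (error-free reads, no repeat handling).

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing left to merge: return separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["GATTACA", "TACAGGC", "AGGCTTCA"]
    print(greedy_assemble(reads))  # ['GATTACAGGCTTCA']
```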

Apart from NGS, third-generation sequencing is developing quickly. Its key feature is single-molecule sequencing, which produces much longer reads and can shorten the time and reduce the cost of genome sequencing. Companies such as Oxford Nanopore and Pacific Biosciences (PacBio) have demonstrated their platforms, and third-generation sequencing technology may revolutionize the field of genome sequencing.


A main application of genome analysis is mapping rare variants in complex disorders. When the sequencing report and analysis are combined with other clinical information, a clinical diagnosis of the disease may become possible.

Additionally, sequencing platforms have systematic errors, which should be considered during data processing, and reads must be of sufficient quality before they go on to the next processing step. Genome sequencing can be performed on several platforms, such as Sanger sequencing and next-generation sequencing. Here we focus on NGS because its workflow is simpler and the samples are easier to handle. NGS produces several types of data files used to obtain good results for a sample, including FASTQ, SAM/BAM and VCF.
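FASTQ is the raw-read format: each record has four lines (a name starting with `@`, the sequence, a `+` separator and a quality string). The sketch below parses records and decodes quality characters, assuming the common Phred+33 encoding used by current Illumina pipelines; it is not a full-featured parser (it does not handle multi-line records, for example).

```python
# A minimal FASTQ reader assuming four-line records and Phred+33 qualities.
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: score = ASCII code - 33; Q30 means a 1-in-1000 error probability.
        yield FastqRecord(header[1:], sequence, [ord(c) - 33 for c in quality])


if __name__ == "__main__":
    example = "@read1\nGATTACA\n+\nIIIII#!\n"
    for record in read_fastq(io.StringIO(example)):
        print(record.name, record.sequence, record.qualities)
    # read1 GATTACA [40, 40, 40, 40, 40, 2, 0]
```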

Each of these files contributes to a better output. FASTQ files store the reads together with a quality score for each base. SAM files are human-readable text files of read alignments, and BAM files are simply their binary equivalent. VCF files list the variants called from the alignments. The percentage of the genome covered at sufficient depth is a better measure of quality than the average depth. After sequencing, the reads are aligned and analysed, and the results for the sample are built from these alignments.
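The point about coverage can be shown with a small calculation: two samples can have similar mean depths while covering very different fractions of the genome. The per-base depths below are invented; in practice they might come from a tool such as `samtools depth`. The threshold of 10x is an illustrative assumption.

```python
# A minimal sketch: mean depth versus breadth of coverage at a minimum depth.

def coverage_summary(depths: list[int], min_depth: int = 10) -> tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    mean_depth = sum(depths) / len(depths)
    breadth = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, breadth


if __name__ == "__main__":
    uneven = [100, 100, 0, 0, 0, 0, 0, 0, 0, 0]  # a few very deep positions
    even = [10] * 10                             # every position at depth 10
    print(coverage_summary(uneven))  # (20.0, 0.2)
    print(coverage_summary(even))    # (10.0, 1.0)
```

The uneven sample has the higher mean depth but leaves 80% of positions uncovered, which is why breadth at a minimum depth is the more informative number.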

Genome analysis is the study of the structure, function and evolution of genomes, which are the complete set of genetic material in an organism. It can help us understand the molecular basis of life, disease and diversity. There are many types of genome analysis, such as sequencing, annotation, comparison and editing.

Role of Bioinformatics in Genome Analysis Workflow

Bioinformatics is central to genome analysis: it is the use of computational tools and methods to process, analyse and interpret genomic data. This section looks at that role and at some of the challenges and opportunities bioinformatics brings to genome analysis.

Advantages of Genome Sequencing

  • Identifies variants across a range of applications
  • Achieves comprehensive coverage of coding regions
  • Is cost-effective
  • Produces a smaller, easy-to-manage data set for faster, easier data analysis
  • Provides a high-resolution, base-by-base view of the genome
  • Captures both large and small variants that may be missed with targeted approaches
  • Identifies potential variants for follow-up studies of gene expression and regulatory mechanisms
  • Delivers a large volume of data in a short time to support the assembly of new genomes

Applications

  • Rare variant mapping in complex disorders
  • Discovery of biological or genetic disorders
  • Clinical diagnostics

Benefits of Genome Sequencing using NGS

  • Cost-effectiveness
  • Speed
  • Ultra-high throughput (high-throughput screening)
  • No cloning step
  • Short reads

Genome analysis has been used as a method of gene discovery in large series of patients with neurodevelopmental disabilities, congenital heart disease, epilepsy, brain malformations, autism and other conditions.

Steps and tools involved in genome sequencing data analysis

| Step | Tool | Description |
|---|---|---|
| Raw reads | | |
| Quality analysis | FastQC | Quality check of the raw sequence data |
| Trimming bad-quality reads | Trimmomatic | Cuts adapters and other Illumina-specific sequences from the reads |
| Quality analysis of trimmed reads | FastQC | Quality check of the trimmed data |
| Mapping trimmed reads to the reference genome | Map with BWA | Maps low-divergence sequences against a large reference genome (here mm10); BWA is designed for Illumina reads of up to 100 bp |
| Removal of unmapped reads | Filter SAM or BAM; samtools flagstat; rmdup | Filters a SAM or BAM file on mapping quality; prints descriptive statistics for a BAM dataset; removes potential PCR duplicates |
| Base alignment qualities | calmd | Computes base alignment qualities for the data; the output is a BAM file |
| Variant mapping | FreeBayes | Finds small polymorphisms: SNPs (single-nucleotide polymorphisms), MNPs (multi-nucleotide polymorphisms), insertions and deletions |
| Variants of interest | SnpEff download; SnpEff eff | Annotates the VCF and analyses the data |
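The "removal of unmapped reads" step can be sketched directly on SAM text, where column 2 is the FLAG (bit 0x4 = unmapped, bit 0x10 = reverse strand) and column 5 is the mapping quality. The duplicate handling below is a deliberately simplified stand-in for `rmdup` (it keeps the first read per reference, start position and strand); real tools treat reverse-strand ends and mate pairs more carefully. The MAPQ threshold of 20 is an assumption.

```python
# A minimal sketch of SAM filtering: unmapped reads, low MAPQ, simplified duplicates.

UNMAPPED = 0x4
REVERSE = 0x10


def filter_sam(lines: list[str], min_mapq: int = 20) -> list[str]:
    kept, seen = [], set()
    for line in lines:
        if line.startswith("@"):
            kept.append(line)  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, mapq = int(fields[1]), fields[2], int(fields[3]), int(fields[4])
        if flag & UNMAPPED or mapq < min_mapq:
            continue
        key = (rname, pos, bool(flag & REVERSE))
        if key in seen:
            continue  # treated as a likely PCR duplicate
        seen.add(key)
        kept.append(line)
    return kept


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:chr1\tLN:1000",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # duplicate of r1
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",         # unmapped
        "r4\t16\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII",  # low MAPQ
    ]
    for line in filter_sam(sam):
        print(line.split("\t")[0])  # prints @SQ, then r1
```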

Tools used in mapping with the reference genome and variant analysis

Map with BWA-MEM maps medium and long reads (> 100 bp) against a reference genome.
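The core idea of read mapping, finding where a short read best matches a long reference, can be illustrated with a toy seed-and-extend mapper: index every k-mer of the reference, use the read's first k-mer as a seed, and score each candidate position by mismatches. This is only an illustration; BWA-MEM uses an FM-index and maximal exact matches, not this approach. The reference, reads and parameters are hypothetical.

```python
# A toy seed-and-extend mapper (ungapped; no insertions or deletions).
from collections import defaultdict
from typing import Optional, Tuple


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 2) -> Optional[Tuple[int, int]]:
    """Return (0-based position, mismatches) of the best hit, or None."""
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


if __name__ == "__main__":
    reference = "AAAA" + "GATTACAGGCTTCA" + "AAAA"
    index = build_kmer_index(reference, k=4)
    print(map_read("GATTACAGGC", reference, index, k=4))  # (4, 0)
    print(map_read("GATTACTGGC", reference, index, k=4))  # (4, 1)
    print(map_read("CCCCCCCC", reference, index, k=4))    # None
```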

Variant call: a set of bioinformatics tools for analysing high-throughput sequencing (HTS) data and variant call format (VCF) data.

FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), insertions and deletions (indels), and complex events smaller than the length of a short-read sequencing alignment.
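To give a feel for what a variant caller decides at each position, the sketch below makes a naive SNP call from a pileup (the bases observed in the reads covering one reference position). It is explicitly not FreeBayes's Bayesian, haplotype-based model; the depth and allele-fraction thresholds and the crude diploid genotype rule are illustrative assumptions.

```python
# A naive pileup-based SNP caller (NOT FreeBayes's model; thresholds are assumptions).
from collections import Counter
from typing import Optional


def call_snp(ref_base: str, observed: str, min_depth: int = 8,
             min_alt_reads: int = 3, min_alt_fraction: float = 0.2) -> Optional[dict]:
    depth = len(observed)
    if depth < min_depth:
        return None  # too few reads to make a call
    counts = Counter(observed)
    alt_candidates = [(n, b) for b, n in counts.items() if b != ref_base]
    if not alt_candidates:
        return None
    alt_reads, alt_base = max(alt_candidates)  # most frequent non-reference base
    fraction = alt_reads / depth
    if alt_reads < min_alt_reads or fraction < min_alt_fraction:
        return None
    genotype = "1/1" if fraction >= 0.8 else "0/1"  # crude diploid genotype
    return {"ref": ref_base, "alt": alt_base, "depth": depth,
            "alt_fraction": fraction, "genotype": genotype}


if __name__ == "__main__":
    print(call_snp("A", "AAAAAGGGGG")["genotype"])  # 0/1 (heterozygous)
    print(call_snp("C", "TTTTTTTTTT")["genotype"])  # 1/1 (homozygous alternate)
    print(call_snp("A", "AAAAAAAAAG"))              # None (too little support)
```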

SnpSift filter: this tool provides a flexible way to filter the variants in a VCF input dataset using arbitrary, possibly complex, expressions.
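The kind of filtering SnpSift performs can be sketched in plain Python: keep variants whose QUAL column and INFO `DP` (read depth) pass thresholds. This only mirrors the idea; it does not parse SnpSift's expression language, and the thresholds (QUAL ≥ 30, DP ≥ 10) and VCF lines are invented for illustration.

```python
# A minimal sketch of threshold-based VCF filtering on QUAL and INFO DP.

def parse_info(info: str) -> dict[str, str]:
    """Turn 'DP=15;AF=0.5;DB' into {'DP': '15', 'AF': '0.5', 'DB': ''}."""
    result = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        result[key] = value
    return result


def filter_vcf(lines, min_qual: float = 30.0, min_dp: int = 10):
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual, info = fields[5], parse_info(fields[7])
        if qual == "." or float(qual) < min_qual:
            continue
        if int(info.get("DP", "0")) < min_dp:
            continue
        yield line


if __name__ == "__main__":
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\tDP=25",
        "chr1\t200\t.\tC\tT\t12\tPASS\tDP=30",  # low QUAL
        "chr1\t300\t.\tG\tA\t45\tPASS\tDP=4",   # low depth
    ]
    for line in filter_vcf(vcf):
        print(line)  # keeps the header line and the variant at chr1:100
```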

SnpEff: SnpEff relies on specially formatted databases to generate annotations; it will not work without them.
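After annotation, "variants of interest" are often picked by predicted impact. SnpEff writes its annotations into the VCF INFO field as `ANN=`, with `|`-separated fields beginning Allele, Annotation, Annotation_Impact, Gene_Name, and multiple annotations separated by commas. The sketch below reads that layout; the INFO string is a shortened, hypothetical example, and selecting only HIGH-impact effects is an illustrative choice.

```python
# A minimal sketch: extract (gene, effect, impact) from a SnpEff ANN INFO entry.

def high_impact_genes(info: str, impacts=("HIGH",)) -> list[tuple[str, str, str]]:
    hits = []
    for item in info.split(";"):
        if not item.startswith("ANN="):
            continue
        for annotation in item[len("ANN="):].split(","):
            fields = annotation.split("|")
            effect, impact, gene = fields[1], fields[2], fields[3]
            if impact in impacts:
                hits.append((gene, effect, impact))
    return hits


if __name__ == "__main__":
    info = ("DP=40;ANN=T|stop_gained|HIGH|GENE_A|GENE_A_ID|transcript|TX1|protein_coding,"
            "T|upstream_gene_variant|MODIFIER|GENE_B|GENE_B_ID|transcript|TX2|protein_coding")
    print(high_impact_genes(info))  # [('GENE_A', 'stop_gained', 'HIGH')]
```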

FastQC: quality check of the sequence data.
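One of the checks FastQC reports is the quality at each position along the reads. The sketch below computes a per-position mean Phred score from quality strings, assuming Phred+33 encoding; it does not reproduce FastQC's actual report (which uses boxplots and its own pass/warn/fail rules), and the Q20 flag is an illustrative threshold.

```python
# A minimal sketch of a per-position mean quality check (Phred+33 assumed).

def per_position_mean_quality(quality_strings: list[str]) -> list[float]:
    totals, counts = [], []
    for qual in quality_strings:
        for i, char in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - 33
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    quals = ["IIII5", "IIII#", "III"]  # reads may differ in length
    for pos, mean_q in enumerate(per_position_mean_quality(quals), start=1):
        flag = "  <- below Q20" if mean_q < 20 else ""
        print(f"position {pos}: mean Q = {mean_q:.1f}{flag}")
    # positions 1-4: 40.0; position 5: 11.0 (below Q20)
```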

Trimmomatic: trims bad-quality reads.

Filter SAM or BAM: removes unmapped reads.
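The quality-trimming part of the Trimmomatic step can be sketched as a sliding window: scan from the 5' end and cut the read where the mean quality of a window first drops below a threshold. This is a simplified sketch in the spirit of Trimmomatic's SLIDINGWINDOW step; Trimmomatic's exact cut point and its other steps (such as adapter clipping) differ, and the window size of 4 and threshold of Q20 are assumptions.

```python
# A simplified sliding-window quality trimmer (not Trimmomatic's exact algorithm).

def sliding_window_trim(sequence: str, qualities: list[int], window: int = 4,
                        min_mean_quality: float = 20.0) -> tuple[str, list[int]]:
    for start in range(0, max(len(qualities) - window + 1, 0)):
        window_quals = qualities[start:start + window]
        if sum(window_quals) / window < min_mean_quality:
            # Cut at the start of the first failing window.
            return sequence[:start], qualities[:start]
    return sequence, qualities  # no failing window: keep the whole read


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    quals = [38, 38, 37, 36, 35, 30, 12, 10, 8, 5]  # quality decays toward the 3' end
    print(sliding_window_trim(seq, quals))  # ('ACGTA', [38, 38, 37, 36, 35])
```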

What do we learn from Genome Sequencing?

  • Using gene-finding algorithms, we can discover a major portion of the genes.
  • Understand the structure of a genome
  • Understand genome evolution 
  • Searching for genes associated with diseases
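The simplest gene-finding idea is to scan for open reading frames (ORFs): an ATG start codon followed, in the same frame, by a stop codon (TAA, TAG or TGA). The sketch below scans only the forward strand and ignores splicing, so it is far simpler than real eukaryotic gene finders; the minimum length and the example sequence are assumptions for illustration.

```python
# A minimal forward-strand ORF finder (no splicing, no reverse strand).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, orf): 0-based start, exclusive end, stop codon included."""
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, sequence[start:i + 3]))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14, 'ATGAAATTTTAG')]
```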

Conclusion

Bioinformatics, the use of computational tools and methods to process, analyse and interpret genomic data, is at the core of genome analysis. With bioinformatics tools and techniques such as next-generation sequencing methods, genome sequencing helps us find basic information about any genetic disease.

Genetic testing has long been used in some areas of health care, such as cancer diagnosis and prenatal screening, and it is also applied to many other conditions, including autosomal dominant and autosomal recessive disorders. For these disorders, doctors may recommend genome sequencing and data analysis to look for variations in the patient's cells. Next-generation sequencing produces a genetic report that summarizes the results, and after analysing these genetic variations, doctors can decide on further diagnosis.

To learn more about the Genome Analysis Workflow using a set of bioinformatics tools, join us for a 3-hour short course on genome analysis; you can register HERE.

Read more about Genome Analysis in detail for better understanding of the technology HERE
