The process of reading a gene and building the corresponding protein is called gene expression. Gene expression is the process by which the information in a gene’s DNA sequence is converted into the structure and function of a cell: the information encoded in the gene is used to direct the assembly of protein molecules. Gene expression analysis is the study of how genes are turned on or off and how much they are expressed.
- Gene expression gives the cell control over its structure and function and is the basis of cellular differentiation, morphogenesis, and the versatility and adaptability of any organism.
- Proper expression of a large number of genes is a critical component of normal growth and development and of the maintenance of health.
- Disruptions in gene expression are responsible for many diseases.
Processes Involved in Gene Expression
Gene expression is used by all known life, including eukaryotes (such as multicellular organisms) and prokaryotes (bacteria and archaea), and by viruses, to generate the macromolecular machinery for life.
Central Dogma
- Replication
- Transcription
- Translation
- Genetic information, chemically encoded in the structure of DNA, is passed to daughter cells by DNA replication and is expressed by transcription followed by translation.
- DNA provides the original information from which proteins are made in a cell, but DNA is not directly involved in protein synthesis.
- RNA, the second type of nucleic acid, carries the information from DNA and is used to make proteins.
Gene expression produces proteins through transcription and translation, and RNA links the two processes: it is the product of transcription, and as mRNA, tRNA and rRNA it directs and carries out translation.
There are two main stages in the expression of a gene:
- Transcription
The production of messenger RNA (mRNA) by the enzyme RNA polymerase, and the processing of the resulting mRNA molecules.
- Translation
The use of mRNA to direct protein synthesis, and the subsequent post-translational processing of the protein molecules. Some genes instead produce other forms of RNA that play a role in translation, including transfer RNA (tRNA) and ribosomal RNA (rRNA).
A structural gene has a number of different components:
Exons
Exons code for amino acids and collectively determine the amino acid sequence of the protein product. They are the portions of the gene represented in the final mature mRNA molecule.
Introns
Introns are the portions of the gene that do not code for amino acids; they are removed (spliced out) from the RNA before translation.
Gene Control Regions
- Start site – the site at which transcription begins.
- Promoter – a region of a few hundred nucleotides at the 5’ end of the gene, upstream of the start site. It is not transcribed into mRNA but plays a role in controlling transcription of the gene. Transcription factors bind to specific nucleotide sequences in the promoter region and assist the binding of RNA polymerase.
- Enhancers – some transcription factors, called activators, bind to regions called “enhancers” that increase the rate of transcription. These sites may be thousands of nucleotides away from the coding sequence or within an intron. Some enhancers are conditional and work only in the presence of other factors in addition to transcription factors.
- Silencers – some transcription factors, called repressors, bind to regions called “silencers” that decrease the rate of transcription.
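Because transcription factors recognise short nucleotide sequences, the simplest computational view of a control region is a search for a sequence motif. The Python sketch below finds every exact occurrence of a motif in a sequence. The upstream sequence is made up, and the motif TATAAA (a commonly cited TATA-box consensus) is used only as an illustration; real binding sites are degenerate, so practical tools score candidate sites against a position weight matrix rather than demanding an exact match.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every exact (possibly overlapping) match of motif."""
    sequence, motif = sequence.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == motif]


# Hypothetical upstream region, written 5'->3' on the coding strand
upstream = "GCGTATAAAAGGCTATAAATCC"
print(find_motif(upstream, "TATAAA"))  # [3, 13]
```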
Transcription
Transcription is the process of RNA synthesis, controlled by the interaction of promoters and enhancers. Several different types of RNA are produced, including messenger RNA (mRNA), which specifies the sequence of amino acids in the protein product, and transfer RNA (tRNA) and ribosomal RNA (rRNA), which play roles in translation.
Transcription is carried out by an enzyme called RNA polymerase. Transcription begins when RNA polymerase binds to a specific DNA sequence in the gene called the promoter. RNA polymerase then unwinds and separates the two strands of the double helix to expose the bases on each strand.
Steps involved in transcription
- Initiation – RNA polymerase binds to the promoter region, and the DNA molecule unwinds and separates locally to form a small open complex that exposes the template strand.
- Elongation – RNA polymerase moves along the template strand, synthesising an RNA molecule (mRNA, for a protein-coding gene). In prokaryotes, RNA polymerase is a holoenzyme made up of several subunits, including a factor (the sigma factor) that recognises the promoter. In eukaryotes there are three RNA polymerases: I, II and III.
- Termination – in prokaryotes, transcription is terminated in one of two ways. In Rho-dependent termination, a protein factor called “Rho” disrupts the complex formed by the template strand, RNA polymerase and the RNA molecule. In Rho-independent termination, a hairpin loop forms near the end of the RNA molecule and causes it to be released. In eukaryotes, termination is more complicated and is coupled to the addition of adenine nucleotides at the 3’ end of the RNA transcript, called polyadenylation. (UAA, UAG and UGA, by contrast, are stop codons that end translation, not transcription.)
- Processing – after transcription, the RNA molecule is processed in a number of ways: introns are removed and the exons are spliced together to form a mature RNA molecule consisting of a single protein-coding sequence. RNA synthesis follows the normal base-pairing rules, except that the base thymine (T) is replaced by uracil (U).
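The base-pairing rule used during elongation can be sketched in a few lines of Python. The sketch assumes the template strand is written 3’→5’, so that the RNA comes out 5’→3’, and it also shows the equivalent shortcut of copying the coding strand and replacing T with U. The sequences are invented for illustration.

```python
# Base laid down in the RNA opposite each base of the DNA template strand
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3to5: str) -> str:
    """Return the RNA (5'->3') made from a template strand given 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())


def coding_strand_to_rna(coding_5to3: str) -> str:
    """The RNA matches the coding strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


template = "TACAAACCGATT"  # hypothetical template strand, 3'->5'
print(transcribe_template(template))         # AUGUUUGGCUAA
print(coding_strand_to_rna("ATGTTTGGCTAA"))  # AUGUUUGGCUAA
```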
Post-transcriptional Modification
Capping adds a modified guanine nucleotide (7-methylguanosine) to the 5’ end of the mRNA through a 5’-5’ triphosphate linkage. The cap protects the mRNA from 5’ exonucleases, which degrade foreign RNA, and also helps the ribosome bind.
RNA editing is an enzyme-catalysed process that alters the nucleotide sequence of an RNA molecule, producing sequence variation.
Splicing removes the introns, noncoding regions that are transcribed into RNA, so that the mRNA can be used to make proteins. The ends of the adjacent exons are then joined together.
Addition of a poly(A) tail, or polyadenylation, adds a stretch of adenine nucleotides to the 3’ end of the mRNA, protecting it from 3’ exonucleases. A long poly(A) tail can also increase translation.
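Two of these modifications, splicing and polyadenylation, can be modelled as simple string operations. In this sketch the pre-mRNA sequence, the exon coordinates and the tail length are all hypothetical, and the 5’ cap is not represented. Real splicing is carried out by the spliceosome, which recognises signals at intron boundaries; most introns begin with GU and end with AG, as the made-up intron here does.

```python
def process_pre_mrna(pre_mrna: str, exons: list[tuple[int, int]],
                     tail_length: int = 20) -> str:
    """Join exons (0-based, end-exclusive coordinates) in order and add a poly(A) tail."""
    mature = "".join(pre_mrna[start:end] for start, end in sorted(exons))
    return mature + "A" * tail_length


# Hypothetical gene: exon 1, one intron (GU...AG), exon 2
exon1, intron, exon2 = "AUGUUU", "GUAAGUCCUAG", "GGCUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, 6), (17, 23)]  # positions of exon1 and exon2 within pre_mrna
print(process_pre_mrna(pre_mrna, exons, tail_length=5))
```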
Three types of RNA that play a role in gene expression
mRNA – messenger RNA
- Produced when DNA is transcribed into RNA.
- It carries the instructions for making a protein from a gene and delivers them to the site of translation.
tRNA – transfer RNA
- At the site of translation, tRNA reads the instructions carried by the mRNA and translates the mRNA sequence into the subunits of proteins, called amino acids.
rRNA – ribosomal RNA
- This is a part of structure of ribosomes.
- Ribosomes are the cellular structures where protein production occurs.
Translation
RNA –> PROTEIN
- Translation occurs in a sequence of steps that involve three kinds of RNA and results in a complete polypeptide.
- Translation takes place in the cytoplasm, where tRNA, mRNA and rRNA interact to assemble proteins.
- A specific amino acid is attached to one end of each tRNA. The other end of the tRNA carries an anticodon.
- An anticodon is a three-nucleotide sequence on a tRNA that is complementary to an mRNA codon.
- The mRNA joins with a ribosome and an initiator tRNA, which pairs with the first (start) codon.
- A tRNA molecule with the correct anticodon and amino acid binds to the second codon on the mRNA.
- A peptide bond forms between the two amino acids, and the first tRNA is released from the ribosome.
- The ribosome then moves one codon along the mRNA.
- The amino acid chain continues to grow as each new amino acid is added to the chain and the previous tRNA is released.
- This process is repeated until one of the three stop codons is reached. No tRNA has an anticodon for a stop codon; instead, release factors bind, and protein production stops.
- Many copies of the same protein can be made rapidly from a single mRNA molecule, because several ribosomes can translate the same mRNA at the same time.
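Because the anticodon pairs with the codon in an antiparallel orientation, the anticodon written 5’→3’ is the reverse complement of the codon. The minimal Python sketch below assumes strict Watson-Crick pairing; it deliberately ignores wobble pairing at the third codon position, which in real cells lets one tRNA read several codons. The stop-codon check reflects the point above that stop codons are not read by a tRNA.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def anticodon(codon: str) -> str:
    """Anticodon (5'->3') pairing with an mRNA codon (5'->3'), Watson-Crick only."""
    codon = codon.upper()
    if codon in STOP_CODONS:
        raise ValueError(f"{codon} is a stop codon; no tRNA anticodon pairs with it")
    # Complement each base and reverse, because codon and anticodon pair antiparallel
    return "".join(RNA_PAIR[base] for base in reversed(codon))


for codon in ("AUG", "UUU", "GGC"):
    print(codon, "->", anticodon(codon))
```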
Steps involved in translation
Initiation
Initiation involves the formation of a complex of the ribosome, the mRNA and the initiator tRNA. The small subunit of the ribosome, carrying the initiator tRNA, binds at the 5’ end of the mRNA molecule and moves in the 3’ direction until it meets the start codon AUG, where the initiator tRNA pairs with it. The large subunit of the ribosome then joins to complete the initiation complex.
Elongation
Each subsequent codon on the mRNA molecule determines which amino-acid-carrying tRNA molecule binds next. Elongation is the actual synthesis of the polypeptide chain: peptidyl transferase, an activity of the ribosome’s large subunit, links each new amino acid to the growing chain with a peptide bond.
Termination
Translation terminates when the ribosomal complex reaches one of the stop codons UAA, UAG or UGA. The translation complex then dissociates and releases the finished polypeptide chain. Each of these steps requires the activity of a specific set of protein factors in addition to the ribosome, tRNA and mRNA.
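The three steps can be traced in a short Python sketch that uses the standard genetic code. It is a simplification: initiation is modelled as scanning from the 5’ end for the first AUG (ignoring the sequence context around the start codon), elongation reads one codon at a time, and termination happens at the first in-frame stop codon. The 64-entry codon table is built from the usual compact string of one-letter amino-acid codes, with `*` marking the stop codons; the example mRNA is made up.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino-acid codes for codons in UUU, UUC, UUA, UUG, UCU, ... order; '*' = stop
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # initiation: scan 5'->3' for a start codon
    if start == -1:
        return ""
    protein = []
    # elongation: read successive codons, ignoring an incomplete codon at the 3' end
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination at UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAGC"))  # MFG
```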
Post-translational Modification
Many proteins synthesized by translation are not yet functional. A range of changes take place in a protein after synthesis that convert it into its active form. In the human body, post-translational modification increases the diversity of proteins. Because these modifications are not directly encoded in the gene template, they can regulate protein function by changing a protein’s activity, its location in the cell and its dynamic interactions with other proteins.
Why gene expression?
Gene expression is important because it determines the structure, function, and behaviour of cells and organisms. Gene expression is the process by which the information encoded in a gene is used to produce a functional product, such as a protein or a non-coding RNA. Gene expression can be regulated at different stages, such as transcription, splicing, translation, and post-translational modification. Gene expression can also respond to changes in the environment or the cell’s needs.
Some of the reasons why gene expression is important
- Gene expression allows for cellular differentiation and specialization.
Different types of cells express different sets of genes, which give them their unique characteristics and functions. For example, muscle cells express genes that encode contractile proteins, while nerve cells express genes for neurotransmitter signalling, such as receptors and the enzymes that synthesise neurotransmitters. Gene expression is essential for the development and maintenance of multicellular organisms.
- Gene expression enables adaptation and evolution.
Organisms can adjust their gene expression levels in response to various stimuli, such as temperature, light, nutrients, hormones, or stress. This allows them to cope with changing conditions and survive in different environments. Gene expression can also lead to variations in phenotypes, which are the observable traits of an organism. These variations can be inherited or acquired, and they can affect the fitness and survival of an organism. Gene expression is a major source of biological diversity and evolution.
- Gene expression influences health and disease.
Abnormalities in gene expression can cause or contribute to various diseases and disorders, such as cancer, diabetes, Alzheimer’s, and autoimmune diseases. Gene expression can also affect the susceptibility and resistance of an organism to infections, toxins, drugs, or vaccines. Gene expression can be used as a diagnostic tool or a therapeutic target for many diseases. For example, gene expression profiling is a technique that measures the expression levels of thousands of genes simultaneously to identify patterns or signatures that are associated with specific diseases or outcomes.
These are some of the reasons why gene expression is important, but there may be more reasons that are not yet discovered or fully understood. Gene expression is a complex and dynamic process that allows cells to adapt to different conditions and perform various functions.
Role of Bioinformatics in Gene Expression
Bioinformatics is an interdisciplinary field that uses computational and analytical tools to handle, analyze, and interpret biological data. Bioinformatics plays a vital role in gene expression analysis, which is the study of how genes are turned on or off and how much they are expressed in different cells, tissues, or conditions. Gene expression analysis helps to understand the function and regulation of genes, the interactions between genes and proteins, and the molecular mechanisms of diseases and drug responses. Some of the ways bioinformatics contributes to gene expression analysis are:
- Sequencing and assembling genomes:
Bioinformatics allows the generation and processing of large-scale genomic data from various organisms, such as bacteria, plants, animals, and humans. Sequencing and assembling genomes provide the reference sequences for gene annotation and identification. Bioinformatics tools and software can help to align, map, assemble, and compare genomic sequences from different sources. For example, bioinformatics was essential for the success of the Human Genome Project, which produced a near-complete sequence and map of the human genome in 2003.
- Annotating and predicting gene structure and function
Bioinformatics helps to annotate the genomic sequences with information about the location, structure, and function of genes and their products. Bioinformatics tools and software can help to identify genes and predict their coding regions, splice sites, promoters, enhancers, transcription factor binding sites, and other regulatory elements. Bioinformatics can also help to infer the function of genes based on their sequence similarity, domain composition, phylogenetic relationships, or expression patterns. For example, bioinformatics can help to assign Gene Ontology terms to genes based on their biological processes, molecular functions, or cellular components.
- Measuring and comparing gene expression levels
Bioinformatics enables the measurement and comparison of gene expression levels across different samples, conditions, or time points. Bioinformatics tools and software can help to process, normalize, analyze, and visualize gene expression data from various platforms, such as microarrays, RNA-seq, or single-cell sequencing. Bioinformatics can also help to identify differentially expressed genes, co-expressed genes, gene clusters, gene networks, or gene signatures that are associated with specific phenotypes or outcomes. For example, bioinformatics can help to discover biomarkers for diagnosis, prognosis, or treatment response in cancer or other diseases.
- Integrating gene expression data with other omics data
Bioinformatics facilitates the integration of gene expression data with other types of omics data, such as genomics (DNA sequence variations), epigenomics (DNA methylation or histone modifications), and transcriptomics (alternative splicing or non-coding RNAs).
Bioinformatics is important in gene expression because it helps to analyze the large amount of data generated by high-throughput technologies, such as microarrays and RNA-Seq, that measure the expression levels of thousands of genes simultaneously. Bioinformatics also helps to understand the biological mechanisms and functions of gene expression, such as how genes are regulated by transcription factors, signaling pathways, and epigenetic modifications.
Bioinformatics can also help to discover new genes, identify biomarkers, and find novel therapeutic targets for various diseases based on gene expression patterns. Bioinformatics is a powerful tool that integrates computational and biological knowledge to study gene expression and its implications for health and disease.
Bioinformatics Techniques Used in Gene Expression Analysis
There are different bioinformatics techniques used in gene expression analysis, which is the study of how genes are turned on or off and how much they are expressed in a cell or tissue. Some of the bioinformatics techniques are:
- Pre-processing: This technique involves removing noise, artifacts, and biases from the raw data, through steps such as background correction, base calling, trimming, filtering, and alignment. Some of the tools used for this technique are Affymetrix Power Tools (APT), FASTX-Toolkit, Trimmomatic, Bowtie2, and STAR.
- Normalization: This technique aims to reduce the unwanted variation between samples or batches that may affect the comparison of gene expression levels. Some of the methods used for this technique are quantile normalization, cyclic loess normalization, RPKM/FPKM/TPM normalization, and DESeq2 normalization.
- Quality control: This technique evaluates the quality and reliability of the data and identifies potential outliers or errors. Some of the tools used for this technique are affyQCReport, arrayQualityMetrics, FastQC, MultiQC, and RSeQC.
- Statistical analysis: This technique tests the hypothesis of interest and identifies the genes that are differentially expressed between conditions or groups. Some of the methods used for this technique are t-test, ANOVA, limma, edgeR, DESeq2, and SAM.
- Biological interpretation: This technique explores the biological meaning and significance of the gene expression results and relates them to the biological context. Some of the methods used for this technique are gene ontology analysis, pathway analysis, network analysis, functional enrichment analysis, and gene set enrichment analysis.
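Several of the techniques above can be illustrated with small Python sketches. These are toy versions under stated assumptions, not the algorithms of the named tools: quality strings are assumed to use the Phred+33 FASTQ encoding; `trim_trailing` only removes low-quality bases from the 3’ end, a much simpler rule than dedicated trimmers apply; `mean_quality` is one per-read summary of the kind QC reports show; `tpm` applies the TPM formula to raw counts and gene lengths; `welch_t` computes a Welch two-sample t statistic (a p-value would additionally need the t distribution, and methods such as limma, edgeR and DESeq2 use more elaborate models); and `enrichment_p` is a one-sided hypergeometric over-representation test of the kind used in functional enrichment analysis. All function names and example values are hypothetical.

```python
import math
from statistics import mean, variance


# Pre-processing and quality control (Phred+33 quality strings assumed)
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode each quality character into a Phred score."""
    return [ord(ch) - offset for ch in quality]


def trim_trailing(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def mean_quality(quality: str) -> float:
    """Mean Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0


# Normalization: transcripts per million
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Divide counts by gene length in kilobases, then rescale so the sample sums to 1e6."""
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rpk.values())
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Statistical analysis: Welch's two-sample t statistic for one gene
def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Difference in means divided by its standard error, without assuming equal variances."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Biological interpretation: hypergeometric over-representation test
def enrichment_p(total_genes: int, set_size: int, n_selected: int, overlap: int) -> float:
    """P(at least `overlap` of `n_selected` randomly chosen genes fall in a gene set of
    `set_size` genes, out of `total_genes` genes in total)."""
    hits = sum(math.comb(set_size, k) * math.comb(total_genes - set_size, n_selected - k)
               for k in range(overlap, min(set_size, n_selected) + 1))
    return hits / math.comb(total_genes, n_selected)


seq, qual = trim_trailing("ACGTACGT", "IIIIII#+")
print(seq, mean_quality(qual))  # ACGTAC 40.0
print(tpm({"g1": 100, "g2": 200, "g3": 300}, {"g1": 1000, "g2": 2000, "g3": 500}))
print(round(welch_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]), 3))  # 4.899
print(round(enrichment_p(20, 5, 4, 3), 4))  # 0.032
```

In the TPM example, gene g3 has the most reads per kilobase, so it receives the largest share of the one million transcripts, even though g2 has more raw reads than g1 and both end up with the same TPM.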
These are some of the general bioinformatics techniques used in gene expression analysis. However, there may be variations or additional techniques depending on the specific data type, experimental design, and research question. Bioinformatics is a dynamic and evolving field that constantly develops new methods and tools to address the challenges and opportunities in gene expression analysis.