Lab Exercise 1
Introduction
The study and modification of plant genomes to produce better transgenic plants is of great interest to the plant science community. In order to modify plant genomes we must first identify the genes that regulate their biological processes. Many plant genes are grouped into “gene families”: sets of genes that encode similar proteins, which may share the same function, have acquired a new function, or be regulated in different ways¹. Molecular biologists often use the polymerase chain reaction, or PCR, to amplify small amounts of DNA into quantities large enough to study. During PCR, double-stranded DNA is heated until it denatures into single strands. The mixture is then cooled so that primers, short strands of DNA designed to match conserved regions of a particular gene or group of genes, can anneal to their complementary regions. A thermostable DNA polymerase extends each primer to synthesize the rest of the complementary strand. Primer design is a crucial step in experimental design and depends on an assortment of factors, chief among which is the degeneracy of the genetic code.
According to the “wobble hypothesis”, the last base of a codon can often vary without changing the encoded amino acid, so several different codons can code for the same amino acid¹. If we already know the exact DNA sequence we wish to amplify, this degeneracy is not a concern, and we need only account for the physical properties of the primer that make it suitable for PCR. But what if we want to identify new members of a gene family? Genes of the same gene family share some similarity with each other, so by aligning their amino acid sequences using bioinformatics software we can identify domains that are conserved across all family members. By designing primers from these conserved amino acid sequences instead of from a single nucleotide sequence, we can use the wobble hypothesis to our advantage and design degenerate primers that cover the different nucleotide sequences able to encode the conserved region. Comparing the candidate regions in this way also gives us additional information about which proposed primers might work best to identify novel genes.
Results
Conserved Region 1
Sequence ID | 51-60aa | 61-70aa | 71-80aa | 81-90aa | 91-100aa
AAM08403.1 | RWMNESITAL | LIGLGTGVVI | LLISRGKNS- | HLLVFSEDLF | FIYLLPPIIF
sp|Q68KI4.2|NHX1_ARATH | RWMNESITAL | LIGLGTGVTI | LLISKGKSS- | HLLVFSEDLF | FIYLLPPIIF
sp|Q84WG1.2|NHX3_ARATH | RWMNESITAL | IIGSCTGIVI | LLISGGKSS- | RILVFSEDLF | FIYLLPPIIF
AAM08405.1 | RWVNESITAI | LVGAASGTVI | LLISKGKSS- | HILVFDEELF | FIYLLPPIIF
AAM08407.1 | YYLPEASASL | LIGLIVGGLA | NISNTETSIR | TWFNFHDEFF | FLFLLPPIIF
AAM08406.1 | HYLPEASGSL | LIGLIVGILA | NISDTETSIR | TWFNFHEEFF | FLFLLPPIIF
Conservation Output | _::_*:__:: | ::*___*___ | _:_.__..__ | __:_*_:::* | *::*******
Table 1a: Shown here is a subset of the Clustal 2.1 multiple sequence alignment displayed in an easy to read format with the region of interest highlighted.
Conserved Region 2
Sequence ID | 151-160aa | 161-170aa | 171-180aa | 181-190aa | 191-200aa
AAM08403.1 | LGDFLAIGAI | FAATDSVCTL | QVLNQD-ETP | LLYSLVFGEG | VVNDATSVVL
sp|Q68KI4.2|NHX1_ARATH | LGDYLAIGAI | FAATDSVCTL | QVLNQD-ETP | LLYSLVFGEG | VVNDATSVVV
sp|Q84WG1.2|NHX3_ARATH | IADYLAIGAI | FSATDSVCTL | QVLNQD-ETP | LLYSLVFGEG | VVNDATSVVL
AAM08405.1 | ARDYLAIGTI | FSSTDTVCTL | QILHQD-ETP | LLYSLVFGEG | VVNDATSVVL
AAM08407.1 | FVECLMFGSL | ISATDPVTVL | SIFQELGSDV | NLYALVFGES | VLNDAMAISL
AAM08406.1 | FVECLMFGAL | ISATDPVTVL | SIFQDVGTDV | NLYALVFGES | VLNDAMAISL
Conservation Output | :_*_:*::_: | ::**.*_.*_ | .::::_____ | _**:*****. | *:***_::_:
Table 1b: Shown here is a subset of the Clustal 2.1 multiple sequence alignment displayed in an easy to read format with the region of interest highlighted.
Conserved Region 3
Sequence ID | 251-260aa | 261-270aa | 271-280aa | 281-290aa | 291-300aa
AAM08403.1 | FGRHSTD-RE | VALMMLMAYL | SYMLAELFAL | SGILTVFFCG | IVMSHYTWHN
sp|Q68KI4.2|NHX1_ARATH | FGRHSTD-RE | VALMMLMAYL | SYMLAELFDL | SGILTVFFCG | IVMSHYTWHN
sp|Q84WG1.2|NHX3_ARATH | IGRHSTD-RE | VALMMLLAYL | SYMLAELFHL | SSILTVFFCG | IVMSHYTWHN
AAM08405.1 | FGRHSTT-RE | LAIMVLMAYL | SYMLAELFSL | SGILTVFFCG | VLMSHYASYN
AAM08407.1 | LDVDNLQNLE | CCLFVLFPYF | SYMLAEGLSL | SGIVSILFTG | IVMKHYTYSN
AAM08406.1 | LDTENLQNLE | CCLFVLFPYF | SYMLAEGVGL | SGIVSILFTG | IVMKRYTFSN
Conservation Output | :._..____* | _.:::*:.*: | ******_._* | *.*::::*_* | *::*.:*:_*
Table 1c: Shown here is a subset of the Clustal 2.1 multiple sequence alignment displayed in an easy to read format with the region of interest highlighted.
AAM08407.1 Primers
Primary
Oligo | Start | Length | Melting Temperature (°C) | GC Percentage | Overall Self Complementarity | 3’ Self Complementarity | Nucleotide Sequence
Left Primer | 1177 | 20 | 59.98 | 45 | 5 | 0 | ATGGCATTTGCTCTTGCTCT
Right Primer | 1375 | 20 | 59.94 | 45 | 3 | 0 | TGTTCACCACCTCAAATCCA
Table 4a: The primary forward and reverse primers computed for A. thaliana gene AAM08407.1 using the Primer3 analysis platform.
Secondary
Oligo | Start | Length | Melting Temperature (°C) | GC Percentage | Overall Self Complementarity | 3’ Self Complementarity | Nucleotide Sequence
Left Primer | 1011 | 20 | 59.94 | 45 | 4 | 2 | TTGGTCACACTTGGGATTCA
Right Primer | 1211 | 20 | 60.14 | 50 | 4 | 2 | TCGTGAACAGATTGCAGAGC
Table 4b: A secondary pair of forward and reverse primers computed for A. thaliana gene AAM08407.1 using the Primer3 analysis platform.
AAM08406.1 Primers
Primary
Oligo | Start | Length | Melting Temperature (°C) | GC Percentage | Overall Self Complementarity | 3’ Self Complementarity | Nucleotide Sequence
Left Primer | 82 | 20 | 59.84 | 45 | 4 | 0 | ATGATGCTCGTGCTTTCCTT
Right Primer | 281 | 20 | 60.31 | 40 | 3 | 0 | ATGATGGGAGGCAACAAAAA
Table 5a: The primary forward and reverse primers computed for A. thaliana gene AAM08406.1 using the Primer3 analysis platform.
Secondary
Oligo | Start | Length | Melting Temperature (°C) | GC Percentage | Overall Self Complementarity | 3’ Self Complementarity | Nucleotide Sequence
Left Primer | 82 | 20 | 59.84 | 45 | 4 | 0 | ATGATGCTCGTGCTTTCCTT
Right Primer | 278 | 20 | 60.34 | 40 | 2 | 0 | ATGGGAGGCAACAAAAACAA
Table 5b: A secondary pair of forward and reverse primers computed for A. thaliana gene AAM08406.1 using the Primer3 analysis platform.
Discussion
From the results of the Clustal 2.1 multiple sequence alignment I identified three conserved regions that may be suitable for degenerate primer design. On closer examination, genes AAM08407.1 and AAM08406.1 share more similarity with each other than with the other gene family members. This is apparent at the positions marked “:” in the conservation output, which denote strongly similar rather than identical residues; in most cases those two genes match each other exactly, while the remaining genes share a single, different amino acid. According to the lab manual, these two genes are about 79% similar to each other but only about 22% similar to the other genes, which share 56% similarity among themselves. Removing these two genes from the multiple sequence alignment would allow us to identify more regions of conservation, bringing the overall similarity from 22% to 56%.
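The comparison above can be illustrated on a small piece of the alignment. The following is a minimal sketch, assuming only the ten residues at positions 181-190 from Table 1b; the function names are hypothetical, and the percentages it yields describe this short excerpt only, not the whole-protein values quoted from the lab manual.

```python
# Residues 181-190 of the alignment, copied from Table 1b.
BLOCK = {
    "AAM08403.1": "LLYSLVFGEG",
    "NHX1_ARATH": "LLYSLVFGEG",
    "NHX3_ARATH": "LLYSLVFGEG",
    "AAM08405.1": "LLYSLVFGEG",
    "AAM08407.1": "NLYALVFGES",
    "AAM08406.1": "NLYALVFGES",
}


def percent_identity(a, b):
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Columns where either sequence has a gap are left out of the comparison.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def fully_conserved_columns(seqs):
    """Zero-based indices of columns holding one identical residue in every sequence."""
    return [
        i
        for i, column in enumerate(zip(*seqs))
        if len(set(column)) == 1 and column[0] != "-"
    ]


if __name__ == "__main__":
    print(percent_identity(BLOCK["AAM08407.1"], BLOCK["AAM08406.1"]))
    print(percent_identity(BLOCK["AAM08403.1"], BLOCK["AAM08407.1"]))
    print(fully_conserved_columns(list(BLOCK.values())))
```

Within this excerpt the two divergent genes are identical to each other but differ from AAM08403.1 at three positions, and the fully conserved columns are the ones marked “*” in the conservation output for residues 181-190.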
Table 2 shows that conserved regions 2 and 3 are better suited to degenerate primer design than conserved region 1, whose degeneracy is many times that of regions 2 and 3. Recall that degeneracy measures how uncertain we are of the underlying nucleotide sequence when given only an amino acid sequence, because under the wobble hypothesis most amino acids can be encoded by more than one codon. The greater the degeneracy, the less specific our primer will be and the greater the chance of it annealing to and amplifying genes outside of the target gene family.
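One simple way to put a number on this uncertainty is to multiply together the number of codons that can encode each amino acid in a peptide. The sketch below assumes the standard genetic code and a hypothetical helper named degeneracy; it counts whole codons, so its values are not necessarily the same as the degeneracy scores reported in Table 2 or by j-CODEHOP, which may be calculated differently.

```python
from collections import Counter
from math import prod

# Standard genetic code, listed with the three codon positions each cycling
# through T, C, A, G (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Number of codons that encode each amino acid (for example L -> 6, M -> 1).
CODONS_PER_RESIDUE = dict(Counter(AMINO_ACIDS))


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the given peptide."""
    # An unknown letter (or a gap character) raises KeyError rather than
    # silently producing a misleading count.
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide.upper())


if __name__ == "__main__":
    for motif in ("VFGE", "YMLAE", "MW"):
        print(motif, degeneracy(motif))
```

Methionine and tryptophan contribute a factor of one each, while leucine, serine, and arginine contribute a factor of six, which is why conserved regions rich in the latter make poorer primer targets.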
With this information we can better interpret the results of the j-CODEHOP analysis of our ClustalW multiple sequence alignment, summarized in Table 3. Many more primers were generated than there are conserved amino acid sequences because the algorithm first back-translates each amino acid sequence into the many DNA sequences that could encode it, for the reasons described earlier. In addition, the algorithm outputs a similar number of forward and reverse primer candidates. Examining the amino acid column of Table 3, we can see that the predicted reverse primers VFGE-R, YMLA-R, and MLAE-R all correspond to conserved amino acid regions in the multiple sequence alignment results from Table 2. Primer VFGE-R corresponds to conserved region 2 and has a consensus clamp score of 73, while the other two primers correspond to conserved region 3 and have consensus clamp scores of 64 and 63 respectively. Recall that the higher the consensus clamp score, the higher the sequence similarity across all members of the multiple sequence alignment. Since these three candidates all share the same degeneracy score of 64, the clamp score is what separates them, and we can conclude that primers VFGE-R and YMLA-R might be the best candidates for novel gene isolation based on their consensus clamp scores and similarity to highly conserved sequence domains.
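The selection reasoning in this paragraph amounts to a two-key sort: prefer lower degeneracy, then break ties with the higher consensus clamp score. The sketch below illustrates that ordering with the three values quoted above; the Candidate type and rank function are hypothetical names, and this is only my reading of the comparison, not part of j-CODEHOP itself.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    conserved_region: int
    degeneracy: int
    clamp_score: int


# Values as quoted in the discussion of Table 3.
CANDIDATES = [
    Candidate("MLAE-R", 3, 64, 63),
    Candidate("VFGE-R", 2, 64, 73),
    Candidate("YMLA-R", 3, 64, 64),
]


def rank(candidates):
    """Order candidates by lowest degeneracy first, then highest clamp score."""
    return sorted(candidates, key=lambda c: (c.degeneracy, -c.clamp_score))


if __name__ == "__main__":
    for candidate in rank(CANDIDATES):
        print(candidate.name, candidate.degeneracy, candidate.clamp_score)
```

Because all three degeneracy scores are equal, the order is decided entirely by the clamp score, which places VFGE-R and YMLA-R ahead of MLAE-R.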
While we can predict possible degenerate primers using amino acid sequences from proteins within the same gene family, the more traditional approach is to use the nucleotide sequence of a gene of interest directly. Tables 4 and 5 summarize the output of the Primer3 platform when used to analyze the sequences of NHX genes AAM08407.1 and AAM08406.1. The primary and secondary primer pairs of AAM08407.1 differ in their positions, nucleotide sequences, and 3’ self complementarity (0 versus 2). In contrast, both primer pairs of AAM08406.1 share the same left primer, and their right primers differ only by a shift of three nucleotides (start 281 versus 278), with nearly identical melting temperatures (60.31 versus 60.34 °C) and overall self complementarity of 3 versus 2. While using individual nucleotide sequences to generate primer pairs eliminates the uncertainty of back-translating amino acid sequences into nucleotide sequences, it may exclude genes of the same gene family that share little or none of the same nucleotide sequence yet still code for functionally similar proteins. This is why using multiple different primer pairs for the same gene can be beneficial for isolation: the more sites we can target, the more genetic material we can recover during our experiment for further analysis.
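Two of the quantities in Tables 4 and 5 can be checked by hand from the primer sequences and start positions. The sketch below recomputes the GC percentage and estimates the amplicon length for each pair. The function names are hypothetical, and the amplicon calculation rests on an assumption that should be checked against the Primer3 documentation: that the right primer's reported start is the position of its 5’ end, which is the last base of the product on the template.

```python
# (left start, left sequence), (right start, right sequence) from Tables 4 and 5.
PRIMER_PAIRS = {
    "AAM08407.1 primary": ((1177, "ATGGCATTTGCTCTTGCTCT"), (1375, "TGTTCACCACCTCAAATCCA")),
    "AAM08407.1 secondary": ((1011, "TTGGTCACACTTGGGATTCA"), (1211, "TCGTGAACAGATTGCAGAGC")),
    "AAM08406.1 primary": ((82, "ATGATGCTCGTGCTTTCCTT"), (281, "ATGATGGGAGGCAACAAAAA")),
    "AAM08406.1 secondary": ((82, "ATGATGCTCGTGCTTTCCTT"), (278, "ATGGGAGGCAACAAAAACAA")),
}


def gc_percent(sequence):
    """Percentage of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    return 100.0 * (sequence.count("G") + sequence.count("C")) / len(sequence)


def product_size(left_start, right_start):
    """Amplicon length in bases.

    ASSUMPTION: right_start is the 5' end of the right primer, i.e. the last
    template position included in the product.
    """
    return right_start - left_start + 1


if __name__ == "__main__":
    for label, ((left_start, left_seq), (right_start, right_seq)) in PRIMER_PAIRS.items():
        print(
            label,
            gc_percent(left_seq),
            gc_percent(right_seq),
            product_size(left_start, right_start),
        )
```

Under that assumption each pair yields a product of roughly 200 bases, and the recomputed GC percentages agree with the values reported in the tables.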
References
1 Experiment 1: Bioinformatics. (n.d.). BIT161B SQ2020. Retrieved April 14, 2020, from https://canvas.ucdavis.edu/courses/461005/files/folder/Laboratory%20Manual?preview=8296468
2 Multiple Sequence Alignment—CLUSTALW. (n.d.). Retrieved April 14, 2020, from https://www.genome.jp/tools-bin/clustalw
3 NHX and Arabidopsis—Protein—NCBI. (n.d.). Retrieved April 14, 2020, from https://www.ncbi.nlm.nih.gov/protein/?term=NHX%20and%20Arabidopsis&utm_source=gquery&utm_medium=search
4 Primer Design. (n.d.). Retrieved April 14, 2020, from http://bioweb.uwlax.edu/GenWeb/Molecular/seq_anal/primer_design/primer_design.htm
5 Protein to DNA reverse translation. (n.d.). Retrieved April 14, 2020, from http://www.biophp.org/minitools/protein_to_dna/demo.php
6 Shin-Lin Tu, Jeannette P. Staheli, Colum McClay, Kathleen McLeod, Timothy M. Rose and Chris Upton. 2018 Base-By-Base Version 3: New Comparative Tools for Large Virus Genomes. Viruses 2018, 10(11), 637; https://doi.org/10.3390/v10110637.
7 Steve Rozen, Helen J. Skaletsky (1998) Primer3. Code available at http://www-genome.wi.mit.edu/genome_software/other/primer3.html.
8 Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research, 22(22), 4673–4680. https://doi.org/10.1093/nar/22.22.4673