Which codons are synonymous

The lack of correlation with the original CAI data of Sharp and Li (33), who used 18 mostly ribosomal proteins, supports this conclusion.

tRNA concentration can respond to changes in cellular conditions (43). Changes in tRNA concentration have also been linked to changes in gene expression (57, 59) and viral virulence (60). The lack of correlation between speeds assigned from tRNA concentration and all other measures is therefore not surprising; it further supports the idea that our measures of codon usage are averaged over all phases of the cell cycle.

Results obtained using tRNA concentration, here available only for E. coli, produced the most informative and robust picture of the effect translational speed has on protein structure. In the rest of this article we focus on these results.

Domains are commonly thought of as structurally stable, individual folding units within the protein. The placement of domain boundaries is based on knowledge of protein structure and folding.

However, the particular amino acid assigned to define the domain boundary is in essence an arbitrary selection and varies even between well-regarded databases (62) such as SCOP, CATH (63) and Pfam. Domain boundary definitions are also updated and change over time.

Here, structurally defined domain boundaries as given by SCOP release 1. Like Brunak and Engelbrecht (24), we found no evidence that slow codons are clustered around domain boundaries in any of the three organisms in the study. This holds for sections of every length centred on the domain boundary. However, we observe some evidence that domain boundaries avoid slow codons and that they are enriched in fast codons (Figure 2).

Figure 2. Domain boundaries are deficient in slow codons and enriched in fast codons. Solid lines represent fast codons and dotted lines slow codons. Enrichment (Y-axis), the percentage increase over that found in the protein as a whole, is shown for sections of different lengths (X-axis) centred on the domain boundary.

For tRNA concentration (A), only E. coli data are available, with data displayed for all three codon speed windows considered in this study (3, 9 and [...]). In (B), data are displayed for E. coli (black) and yeast (grey) using a codon speed window of [...].

Recently, taking a small set of proteins, Zhang and coworkers (17, 22) demonstrated that slow-translating regions are found around domain boundaries. We are able to reproduce figures found in these publications and suggest that translational pausing at domain boundaries is not a general trait; rather, pauses may be incorporated in the nucleotide sequence where required for high-fidelity folding.
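The enrichment measure plotted in Figure 2 (the percentage increase of a codon class in a section centred on the domain boundary, relative to the protein as a whole) can be sketched as follows. This is a minimal illustration for a single protein, not the authors' pipeline: the function names, the example codons and the choice of "fast" set are all hypothetical, and the study presumably pools counts over many proteins.

```python
def fraction_in_class(codons, codon_class):
    """Fraction of codons that belong to codon_class (e.g. a set of 'fast' codons)."""
    if not codons:
        raise ValueError("empty codon list")
    return sum(c in codon_class for c in codons) / len(codons)


def boundary_enrichment(codons, boundary, section_len, codon_class):
    """Percentage increase of codon_class in a section centred on `boundary`
    (a 0-based codon index) over its frequency in the whole protein."""
    half = section_len // 2
    start = max(0, boundary - half)
    end = min(len(codons), boundary + half + 1)
    whole = fraction_in_class(codons, codon_class)
    if whole == 0:
        raise ValueError("codon class is absent from the protein")
    section = fraction_in_class(codons[start:end], codon_class)
    return 100.0 * (section - whole) / whole


# Hypothetical toy protein and hypothetical 'fast' codon set.
toy_codons = ["GCG", "CTG", "AGG", "GCG", "CTA", "CTG", "GCG", "AGG"]
toy_fast = {"GCG", "CTG"}
toy_enrichment = boundary_enrichment(toy_codons, boundary=3, section_len=3,
                                     codon_class=toy_fast)
```

A negative value means the section is depleted in the codon class; a positive value means it is enriched.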

When we consider the region immediately downstream from (N-terminal to) the domain boundary, our measures of codon translation speed provide different results. In general, the domain boundary is thought to be less structurally conserved than intra-domain loops, and as such less codon selection is perhaps expected. The translation speed of a set of amino acids is examined by calculating their mean codon translation speed.

In general, the first 20 residues of a protein are translated more slowly than the last 20 residues. For E. coli two-domain proteins, the mean translation speed of the first domains 4. As expected, the first 20 residues of domain one are found to be translated more slowly than the first 20 residues in domain two. Thus, when considered with the finding that domain boundaries are enriched in fast codons, a consistent picture is built up of the first domain being translated more slowly than the second domain.

Previous research has shown that codon usage is non-random near the point of translation initiation, with an enrichment of non-preferred codons observed (65). This may, at least partially, explain our results.

Here the translation speed of secondary structure transitions is explored. The translation speed of each codon, s, is compared to the mean translation speed of the fragment.

Taking the logarithm of the relative speed produces results centred on zero. The mean translation speed may be sensitive to the number of codons taken to assign the fragment.
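The log relative speed described above can be sketched in a few lines. This is an illustrative reading of the text, under two assumptions that the source does not confirm: the natural logarithm is used, and the fragment mean is the plain arithmetic mean of the per-codon speeds. The function name and example values are hypothetical.

```python
import math


def log_relative_speeds(speeds):
    """Return log(s / mean) for every codon speed s in a fragment.

    A codon translated at exactly the fragment's mean speed scores 0;
    slower codons score below 0 and faster codons above 0.
    """
    if not speeds:
        raise ValueError("empty fragment")
    if any(s <= 0 for s in speeds):
        raise ValueError("speeds must be positive to take a logarithm")
    fragment_mean = sum(speeds) / len(speeds)
    return [math.log(s / fragment_mean) for s in speeds]


# Hypothetical per-codon speeds for a three-codon fragment.
toy_relative = log_relative_speeds([1.0, 2.0, 3.0])
```

Working on the log scale makes a halving and a doubling of speed symmetric around zero, which is what allows curves from different fragments to be overlaid.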

No significant differences are observed for the different fragment lengths, and in all cases a clear decrease in translation speed on transition from coil to helix is observed (Figure 3A). Error bars based on SD or standard error are not appropriate given the non-normal distribution of speeds over each codon position. Thus, we perform both the Wilcoxon and Kolmogorov-Smirnov tests to assess significance (Supplementary Data).
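To make the distribution-free comparison concrete, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical cumulative distribution functions, using only the standard library. It illustrates the idea and is not the authors' analysis: the sample values are hypothetical, no p-value is computed, and the source does not say which Wilcoxon variant was used. In practice a library routine such as `scipy.stats.ks_2samp` or `scipy.stats.ranksums` would normally be used instead.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs of the
    two samples; it is always attained at one of the observed values.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical relative speeds at the transition codon and at other positions.
transition = [1, 2, 3, 4]
elsewhere = [3, 4, 5, 6]
toy_d = ks_statistic(transition, elsewhere)
```

Because D depends only on the ordering of the values, it makes no assumption of normality, which is the reason given in the text for preferring such tests to error bars.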

These tests indicate that the distribution of speeds for the transition codon is significantly different from those not adjacent to it. Given the small fragment sizes, a window of three codons is used to assign translation speed.

Figure 3. Change in relative translation speed (Y-axis) on the transition between secondary structures (X-axis). In moving from coil into helix (A) or coil into strand (B), a clear decrease in translation speed is observed.

The transition from helix into coil (C) or strand into coil (D) is also characterized by a decrease in translation speed, in this instance around three residues downstream of (N-terminal to) the transition site.

This is followed by an increase in translation speed as the coil region is produced. Data shown for E. coli using a speed window of three codons.

They all show clear patterns in relative translation speed at the transition point (Figure 3). In general, when starting production of a helix or strand there is a decrease in the translation speed that begins about three codons before the start of the helix.

Similarly, the translation speed decreases just before the helix or strand terminates (Figure 3C and D). Notably, on the transition from coil to strand, the translation speed increases immediately before the sharp decrease as strand production begins (Figure 3B). The data presented only include transitions to and from coil; thus transitions directly from strand into helix are excluded. If we consider all transitions into helix, the same general pattern is observed, although the magnitude of the speed distributions is reduced slightly.

Generally, the results oppose those of Thanaraj and Argos (55), who found that slow codons have a higher propensity to encode strand and coils. Here, fast codons are a signature of starting production of a coil and slow codons a signal that a transition into helix or strand is imminent.

Current secondary structure prediction algorithms are highly accurate; however, predicting the actual termini of secondary structures is still relatively imprecise (67). It may be that consideration of codon translation speed could improve secondary structure prediction, particularly in the region of secondary structure transitions.

Our newly compiled CSandS database has allowed us to carry out a comprehensive study of mRNA coding sequence data and its relationship to protein structure.

In a number of cases, the mRNA is shown to be more informative about the protein structure than the amino acid sequence alone.

Results from CSandS can be split into two groups: those that are independent of codon translation speed and those that are not. Most previous studies have identified particular significant codons in a limited set of organisms (28–30). Our speed-independent measures indicate a more general trend: there is protein structural information contained in the mRNA nucleotide sequence that is not found in the protein primary sequence.

It is in these sets of synonymous codons that greater differences in translation speed are found. However, no direct link between significant codons and translation speed is elucidated.

From analysis of CSandS it is evident that the set of significant codons is not universal. It is hypothesized that the protein structural effects ascribed to mRNA result from changes in translation speed (10, 23, 34, 51–[...]). Previously, linking these nucleotide-mediated features to translation speed has been difficult. Using tRNA concentration data produces more consistent results, and the importance of tRNA concentration has also recently been demonstrated by the work of Romano et al.

They showed that folding phase transitions can be successfully modelled using tRNA concentrations. In contrast to many studies (10, 17, 22), but in agreement with Brunak and Engelbrecht (24), we find that domain boundaries are not enriched with slow codons.

Our study indicates that domain boundaries are deficient in slow codons and show a small enrichment in fast codons. The sequence that connects two domains and contains the domain boundary is often thought to be less structurally constrained than intra-domain loops and this lack of constraint may be linked to its faster translation.

An increase in translation speed is also observed when terminating a secondary structure and starting coil production. For example, slow codons have a higher information content at the start of helices (Supplementary Data), and a relative decrease in translation speed is observed at the point of transition from coil to helix and coil to strand.

This work is one of the largest to date; however, there are still cases where the volume of data is not large enough to draw definite conclusions. Still, there is no contradiction between our results for domain boundaries and secondary structure transitions.

Most domain boundaries occur in coil regions, which our results suggest are translated quickly. Moreover, what we observe at the transition into secondary structure is a relative decrease in translation speed rather than an increase in rare codons. Zhou et al. Using Ooi number (71) as a measure of residue burial, we found that highly buried residues were on average translated more quickly (Supplementary Data).

Codon optimality is not the only possible mechanism by which codon usage could affect the rate of translation. It was proposed that mRNA hairpins slowed down translation to increase the accuracy of folding. Examining codons observed to significantly increase the likelihood of amino acid burial in E. coli proteins, we found that these codons destabilized local mRNA structures (Supplementary Data).

Though preliminary, this work suggests that the structure of mRNA may be important in the formation of protein structure, and is in keeping with the understanding that mRNA secondary structure must be unfolded to enter the ribosome.

Analysing the data contained in the CSandS database using tRNA concentration as a measure of codon translation speed has produced a consistent picture of how translation can affect local protein structures in E. coli.

Information about protein structure beyond that of the amino acid sequence is contained in the mRNA-coding sequence. We demonstrate that this structural information is species specific and may be linked to translation speed. N-terminal regions are generally translated more slowly than C-terminal regions, and this could be related to co-translational folding. There is a clear decrease in translation speed at the start of secondary structures in E. coli, a relationship that could be exploited in the accurate prediction of secondary structure termini.

Funding for open access charge: Oxford University. The authors would like to thank both Dr Simon Meyers and Professor Graham Wood for their help and advice on statistical testing, and Seb Kelm for multiple readings of this manuscript.

Synonymous codon usage influences the local protein structure observed.

Charlotte M.

Abstract: Translation of mRNA into protein is a unidirectional information flow process.

Table 1. Secondary structure classifications.

However, our simulated G-C exchanges resulted in lower minimum free energy compared to the original sequences for all species. This suggests that, for our dataset, selection for mRNA stability may only be contributing to a general preference for GC-ending codons, not the specific preference for C-ending codons in mammalian rhodopsin.

However, overly stable mRNA structures may also be a disadvantage given they can interfere with other processes such as spliceosome activity and translation initiation [ ], and thus ultimately reduce translation speed. Selection for increased accuracy at conserved sites, increased translational speed, and for proper protein folding seem to take precedence over selection for mRNA stability in mammalian rhodopsin.

Several other studies have reported conflicts in codon choice under multiple selection pressures. For example, Carlini et al. Any increases in mRNA stability that arise from G-ending codon bias may thus partly be a by-product of mutational bias.

Resolved crystal structures will be necessary to confirm mRNA secondary structure in the future. Research in humans has indicated that synonymous mutations can cause disease by disrupting splicing sites or ESE regions [ ]; for review see [ 6 ]. Studies that examine the evolution of splicing-associated regions, especially exon-intron splicing junctions and ESEs, have provided much insight on the selective constraint associated with splicing.

An interspecies comparison of human, chimpanzee, and mouse orthologs also demonstrated that putative ESE regions showed significantly lower synonymous substitution rates than non-ESE regions [ 51 ].

Constraint on splicing enhancer regions in mammalian rhodopsins confirms another mechanism contributing to selection at synonymous sites. Given that our ESE analyses were limited to human and mouse, we suspect that a significant pattern may also become clearer with a larger species dataset. We found significant evidence for selection on synonymous sites in mammalian rhodopsin using phylogenetic likelihood models that explicitly differentiate between selection and mutational bias.

These models indicated that within codon families, C-ending codons had the highest relative fitness. Furthermore, C-ending codons are associated with conserved residues and abundant cognate tRNAs, which suggests selection for increased translational accuracy and speed.

Slightly elevated use of these codons in the helices over the loops, and slightly higher synonymous substitution rates in some loops, also suggest some influence from protein secondary structure. Our combined use of synonymous substitution models for detecting selection, and analytical approaches for detecting mechanistic effects on codon usage, demonstrates that post-transcriptional and translational processes are likely exerting selective constraint on the evolution of synonymous codons in mammalian rhodopsin.

We expect that other highly expressed transmembrane proteins, such as others in the GPCR family, should display similar selection signals on synonymous codons. Our results highlight the importance of focusing attention on highly expressed genes in a broader phylogenetic context in order to better understand post-transcriptional and translational processes driving the evolution of synonymous substitutions.

Li WH, Wu CI, Luo CC: A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes.

Mol Biol Evol. Nucleic Acids Research. Duret L: Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. Nat Rev Genet. Francino MP, Ochman H: Deamination as the basis of strand-asymmetric evolution in transcribed Escherichia coli sequences.

Nat Genet. Ikemura T: Correlation between the abundance of Escherichia-coli transfer-RNAs and the occurrence of the respective codons in its protein genes - a proposal for a synonymous codon choice that is optimal for the Escherichia-coli translational system. J Mol Biol. Ikemura T: Correlation between the abundance of yeast transfer-RNAs and the occurrence of the respective codons in protein genes - differences in synonymous codon choice patterns of yeast and Escherichia-coli with reference to the abundance of isoaccepting transfer-RNAs.

Sharp PM, Li WH: The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. J Mol Evol. PloS Biology. Annual Review of Genetics. Gingold H, Pilpel Y: Determinants of translation efficiency and accuracy. Mol Syst Biol. Eyre-Walker A: Evidence of selection on silent site base composition in mammals: Potential implications for the evolution of isochores and junk DNA. Iida K, Akashi H: A test of translational selection at 'silent' sites in the human genome: base composition comparisons in alternatively spliced genes.

Chamary JV, Hurst LD: Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: Evidence for selectively driven codon usage. Li WH: Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. Bulmer M: Strand symmetry of mutation-rates in the beta-globin region. Genet Res. Yang ZH, Nielsen R: Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage.

Nucleic Acids Res. FEBS Letters. Tao X, Dafu D: The relationship between synonymous codon usage and protein structure. Biochem Biophys Res Commun.

Zhang G, Hubalewska M, Ignatova Z: Transient ribosomal attenuation coordinates protein synthesis and co-translational folding. Nat Struct Mol Biol.

Protein Science. Comeron JM: Selective and mutational patterns associated with gene expression in humans: Influences on synonymous composition and intron presence.

Drummond DA, Wilke CO: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Genome Biol. Blencowe BJ: Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci. Trends Genet. EMBO J. Nat Rev Neurosci. Prog Retin Eye Res.

Physiological Reviews. Biochimica Et Biophysica Acta. Vis Neurosci. Loytynoja A, Goldman N: An algorithm for progressive multiple alignment of sequences with insertions. Genome Research. J Mamm Evol. Fisher R: The distribution of gene ratios for rate mutations. Proc R Soc. Wright S: Evolution in Mendelian populations.

Kimura M: Some problems of stochastic-processes in genetics. Ann Math Stat. Akashi H: Synonymous codon usage in Drosophila melanogaster - natural-selection and translational accuracy.

J Biol Chem. Duret L: tRNA gene number and codon usage in the C-elegans genome are co-adapted for optimal translation of highly expressed genes. Trends in Genetics. Nat Biotechnol. Monatshefte Fur Chemie Chemical Monthly. Mol Cell Biol. Berget SM: Exon recognition in vertebrate splicing. Fu XD: The superfamily of arginine serine-rich splicing factors. Duret L, Mouchiroud D: Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, Arabidopsis.

Genome Res. Biochem Moscow. Murphy FV, Ramakrishnan V: Structure of a purine-purine wobble base pair in the decoding center of the ribosome. Lavner Y, Kotlar D: Codon bias as a factor in regulating expression via translation rate in the human genome. Kotlar D, Lavner Y: The action of selection on codon bias in the human genome is related to frequency, complexity, and chronology of amino acids. BMC Genomics. J Biomol Struc Dyn. Comput Struct Biotechnol J. Mol Biol Cell.

Sub-Cellular Biochem. Proc Nat Acad Sci. Curr Opin Struct Biol. J Theor Biol. Warnecke T, Hurst LD: Evidence for a trade-off between translational efficiency and splicing regulation in determining synonymous codon usage in Drosophila melanogaster.

Thanks to Asher Cutter for helpful comments and edits during manuscript preparation. Correspondence to Belinda SW Chang. JD compiled the dataset, performed the initial analyses, constructed the figures and tables, and helped to draft the manuscript.

SZD drafted the manuscript. AS contributed to design and implementation of statistical tests and helped to draft the manuscript. BSWC guided all aspects of the study, and helped to draft the manuscript. All authors read and approved the final manuscript.

Additional file 1: Accession numbers of resource records for all rhodopsin sequences downloaded from NCBI. Table A2. Nucleotide contents of four-fold degenerate codons and introns in mammalian rhodopsin genes.

In contrast, a protein with a positive Gravy score is considered hydrophobic and poorly water soluble [ 30 ]. The quantities of the individual nucleotides A, T, G, and C were determined and used to sum up the AT and GC content for each protein in the albumin superfamily. A rare codon (RC) is considered to be a low-usage codon in the genome, such as a synonymous codon or stop codon [ 31 ].

Indices of codon usage deviation were calculated using CodonW J Peden, version 1. Based on that, two internal measures were applied: identification of GC variation and of third nucleotide preference in codons [ 33 , 34 ]. These were obtained by calculating the number of GC nucleotides and the number of G or C nucleotides at the third position of synonymous codons (GC3), excluding the start and termination codons.
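The two composition measures described above can be sketched as follows. This is an illustration, not CodonW's implementation: the function names and the example ORF are hypothetical, and the decision to skip every ATG and TGG codon (which have no synonyms) along with the stop codons is an assumption about what "synonymous codons, excluding the start and termination codons" means here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
NON_DEGENERATE = {"ATG", "TGG"}  # Met and Trp have no synonymous codons


def split_codons(orf):
    """Split an in-frame ORF (DNA or RNA letters) into a list of DNA codons."""
    seq = orf.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def gc_content(orf):
    """Percentage of G or C over all nucleotides of the ORF."""
    seq = orf.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc3_content(orf):
    """Percentage of G or C at the third position of synonymous codons."""
    skipped = STOP_CODONS | NON_DEGENERATE
    codons = [c for c in split_codons(orf) if c not in skipped]
    if not codons:
        raise ValueError("no synonymous codons in sequence")
    return 100.0 * sum(c[2] in "GC" for c in codons) / len(codons)


# Hypothetical six-codon ORF: ATG GCC AAA TGG GCG TAA
toy_orf = "ATGGCCAAATGGGCGTAA"
toy_gc = gc_content(toy_orf)
toy_gc3 = gc3_content(toy_orf)
```

In the toy ORF only GCC, AAA and GCG contribute to GC3, so two of the three counted third positions are G or C.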

In addition, the expected effective number of codons (ENC) for each albumin superfamily protein was calculated. ENC is the measure of codon usage affected only by GC3 as a consequence of mutation pressure and genetic drift. Relative synonymous codon usage (RSCU) was calculated in order to examine the frequency of each synonymous codon encoding the same amino acid without the confounding effect of amino acid composition. The index was calculated as follows [ 36 ]:

$$\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}}$$

where $x_{ij}$ is the number of occurrences of the $j$th codon for the $i$th amino acid, which can be encoded by $n_i$ synonymous codons.
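As a worked illustration of the RSCU index, the sketch below divides each codon's observed count by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and skips amino acids with a single codon; the function name and the toy input are hypothetical, and this is not the CodonW code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(codons):
    """Return {codon: RSCU} for every codon of each amino acid that has
    synonyms and appears at least once in `codons`."""
    counts = Counter(codons)
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous alternatives
        expected = total / len(family)  # count per codon under equal usage
        for c in family:
            values[c] = counts[c] / expected
    return values


# Hypothetical input: four alanine codons, three of them GCT.
toy_rscu = rscu(["GCT", "GCT", "GCT", "GCC"])
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage; values above 1 indicate preferred codons and values below 1 indicate avoided ones.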

Genomic information for the mRNA sequences of the four members of the human albumin superfamily is shown in Table 1. Only the ORF with no intermediate stop codon was selected for codon usage analysis. The similarity of nucleotide and amino acid sequences of the albumin superfamily members is summarized in Figure 1.

The solubility of the members of the albumin superfamily was assessed through the Gravy score (Table 1). All the family members are found to have a negative Gravy score, suggesting that these proteins are water soluble. This is in accordance with the biological role of these proteins as serum transporters.
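The Gravy score used above is simply the mean hydropathy of the residues in a protein. The sketch below uses the Kyte-Doolittle hydropathy scale, which is the conventional basis for GRAVY; the source does not state which scale or tool was used, so that choice, the function name and the toy sequences are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathy: sum of residue hydropathies / length.

    Negative scores indicate an overall hydrophilic protein, positive
    scores an overall hydrophobic one.
    """
    seq = protein.upper()
    if not seq:
        raise ValueError("empty protein sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical toy peptides: one charged and polar, one aliphatic.
toy_hydrophilic = gravy("DKERS")
toy_hydrophobic = gravy("ILVAF")
```

A sequence containing a letter outside the 20 standard amino acids raises a `KeyError` here, which is a deliberate simplification for the sketch.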

The nucleotide distribution of the albumin superfamily is shown in Table 2. ALB and AFP show a similar nucleotide distribution pattern, implying that they share similarities in structure and biological function. There is a close relationship between nucleotide composition and gene function [ 37 ].

Although AFM and VDBP are grouped in the same superfamily, they show differential nucleotide composition suggesting variation in their biological functions compared to the other members of albumin superfamily. Rare codon analysis was carried out using the GenScript web server as described in Materials and Methods.

A graph of codon frequency distribution was plotted to identify the quantities of rare codons present in each albumin superfamily protein Figure 2.

Frequency of codon usage with a value of indicates that the codons are highly used for a given amino acid. Conversely, the frequency of codon usage with a value of less than 30 is determined as low-frequency codon, which is likely to affect the expression efficiency.

This result suggested that members of the albumin superfamily contain a significantly small number of rare codons that may reduce translational efficiency of the genes. Indices of codon usage deviation are used to determine the differences between the observed and expected codon usage.

The results for the effective number of codons (ENC), GC content, and G or C nucleotides at the third position of synonymous codons (GC3) are summarized in Table 1.

The effective number of codons ENC for each member of human albumin superfamily was calculated in order to examine the pattern of synonymous codon usage independent of the gene length. The ENC value ranges from 20 to 61, in which value of 20 indicates extreme bias toward the usage of one codon, while value of 61 represents equal usage of the synonymous codons [ 35 , 38 ].

Results from this analysis revealed that the ENC value of the albumin superfamily varies from [...]. The overall ENC value of the albumin superfamily is greater than [...]. The high ENC value suggested that the synonymous codons of the albumin superfamily were equally used and hence displayed less biased synonymous codon usage. The GC content of the albumin superfamily is given in Table 1. GC content can be related to the ability of the coding region to be in an open chromatin state, leading to active transcription [ 39 ].

It is evident that all the members of albumin superfamily genes have low GC content, indicating that these family members are highly expressed. Furthermore, it has been reported that highly transcribed genes may have low mutation rates because they are subjected to DNA repair [ 40 ]. However, within the albumin superfamily, VDBP contains the highest GC content indicating that it has the lowest expressivity level.

GC content at the third position of codons (GC3) is a putative indicator of the extent of base composition bias. Table 1 revealed that the albumin superfamily has low GC3 values, ranging from [...]. The albumin superfamily has low GC3 values because the majority of genes in this superfamily are located in AT-rich regions. Genes in AT-rich regions of the genome would prefer to use A- or T-ending codons. The low usage of codons ending with G or C signifies less GC codon usage bias in the albumin superfamily.

In other words, it proved the homogeneity of synonymous codon usage pattern in albumin superfamily. The synonymous codon bias usage of each albumin superfamily protein was computed and tabulated in Table 3. The most preferentially used codon for a given amino acid is highlighted in red.

Preferential codon usage in albumin superfamily indicates that the codons with A or U at the third position are more preferred compared to G or C ending codons. It is because some amino acid residues are encoded in equal frequencies by both A or U and G or C ending codons and hence are excluded from the analysis.

The proteins possess three homologous folding domains as a result of the conserved pattern of cysteine residues in the members of the albumin superfamily [ 41 , 42 ]. Our study on codon usage bias in the members of the albumin gene family revealed that they are also similar in terms of their low GC content, low GC3, and high ENC values.

In addition, they show no bias in the usage of synonymous codons and are highly expressible genes. Furthermore, low GC and GC3 values revealed that mutational bias and translational selection do not play a significant role in shaping the codon usage pattern in the albumin superfamily.

The authors declare that there is no conflict of interests regarding the publication of this paper.

The authors would like to thank the Director General of the Ministry of Health, Malaysia, for granting permission to publish this paper. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Academic Editor: S. Received 25 Nov Accepted 11 Jan Published 20 Feb

Abstract: Synonymous codon usage bias is an inevitable phenomenon in organismic taxa across the three domains of life.

Introduction: Amino acids, the monomeric units of proteins, are encoded by triplets of nucleotides called codons.

Materials and Methods

Hydrophobicity Analysis: The grand average of hydrophobicity score (Gravy score) was calculated to quantify the general average hydrophobicity of the translated gene products found in the albumin superfamily.



