Secondary Structure Prediction
Secondary structure prediction is a set of techniques in bioinformatics that aim to predict the local secondary structures of proteins and RNA sequences based only on knowledge of their primary structure (the amino acid or nucleotide sequence, respectively). For proteins, a prediction consists of assigning regions of the amino acid sequence as likely alpha helices, beta strands (often denoted "extended" conformations), or turns. The success of a prediction is determined by comparing it to the results of the DSSP algorithm applied to the crystal structure of the protein; for nucleic acids, it may be determined from the hydrogen bonding pattern. Specialized algorithms have been developed for the detection of specific well-defined patterns such as transmembrane helices and coiled coils in proteins, or canonical microRNA structures in RNA.[1]
The best modern methods of secondary structure prediction in proteins reach about 80% accuracy; this high accuracy allows the use of the predictions in fold recognition and ab initio protein structure prediction, classification of structural motifs, and refinement of sequence alignments. The accuracy of current protein secondary structure prediction methods is assessed in weekly benchmarks such as LiveBench and EVA.
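Agreement with DSSP is commonly reported as a three-state, per-residue accuracy, often called Q3. The minimal sketch below assumes one common convention for collapsing DSSP's eight states into three (H, G and I to helix; E and B to strand; everything else to coil); other reductions are also in use, and the helper names are hypothetical.

```python
# Assumed 8-to-3 state reduction (one common convention, not the only one):
# H, G, I -> H (helix); E, B -> E (strand); anything else -> C (coil/other).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Collapse a DSSP state string to the three states H, E and C."""
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted three-state label matches the reference."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need non-empty strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    observed = reduce_dssp("-HHHHGGTEEEB-")  # toy DSSP string; '-' = no assignment
    predicted = "CCHHHHHCCEEEC"
    print(observed)                            # CHHHHHHCEEEEC
    print(round(q3(predicted, observed), 3))   # 0.846
```

Here 11 of 13 residues agree, so Q3 is about 85%; the same function applies unchanged to whole test sets by concatenating the per-protein strings.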
The problem of predicting RNA secondary structure is broadly related, but it depends mainly on base pairing and base stacking interactions. Many RNA molecules have several possible three-dimensional structures, so predicting these structures remains out of reach unless obvious sequence and functional similarity to a known class of RNA molecules, such as transfer RNA or microRNA, is observed. Many RNA secondary structure prediction methods rely on variations of dynamic programming and are therefore unable to identify pseudoknots efficiently.
Protein structure
Early methods of secondary structure prediction, introduced in the 1960s and early 1970s,[2] focused on identifying likely alpha helices and were based mainly on helix-coil transition models.[3]
Significantly more accurate predictions that included beta sheets were introduced in the 1970s and relied on statistical assessments based on probability parameters derived from known solved structures. These methods, applied to a single sequence, are typically at most about 60-65% accurate and often underpredict beta sheets.[1] The evolutionary conservation of secondary structures can be exploited by simultaneously assessing many homologous sequences in a multiple sequence alignment, calculating the net secondary structure propensity of each aligned column of amino acids. In concert with larger databases of known protein structures and modern machine learning methods such as neural networks and support vector machines, these methods can achieve up to 80% overall accuracy in globular proteins.[4]
The theoretical upper limit of accuracy is around 90%,[4] partly because of idiosyncrasies in DSSP assignment near the ends of secondary structures, where local conformations vary under native conditions but may be forced to assume a single conformation in crystals due to packing constraints. Limitations are also imposed by the inability of secondary structure prediction to account for tertiary structure; for example, a sequence predicted as a likely helix may still be able to adopt a beta-strand conformation if it is located within a beta-sheet region of the protein and its side chains pack well with their neighbors. Dramatic conformational changes related to the protein's function or environment can also alter local secondary structure.
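Averaging per-residue propensities down each aligned column can be sketched as follows. The propensity values are hypothetical placeholders chosen for illustration (not published parameters), gaps are skipped, every sequence is weighted equally, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical helix propensities, for illustration only (NOT published parameters).
# Residues missing from the table fall back to a neutral value of 1.0.
HELIX_PROPENSITY = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6, "S": 0.8}


def column_propensity(column: str, table: dict, neutral: float = 1.0) -> float:
    """Mean propensity of the residues in one alignment column, skipping gaps ('-')."""
    residues = [aa for aa in column.upper() if aa != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    return mean(table.get(aa, neutral) for aa in residues)


def propensity_profile(alignment: list, table: dict) -> list:
    """Net propensity for every column of an alignment of equal-length gapped sequences."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [column_propensity("".join(seq[i] for seq in alignment), table) for i in range(width)]


if __name__ == "__main__":
    alignment = ["AELG", "AEL-", "SEPG"]
    profile = propensity_profile(alignment, HELIX_PROPENSITY)
    print([round(x, 2) for x in profile])  # [1.2, 1.5, 1.0, 0.6]
```

A column such as "LLP" scores lower than "EEE" because one homolog carries a helix-breaking residue, which is the kind of signal a single sequence cannot provide.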
Chou-Fasman method
The Chou-Fasman method was among the first secondary structure prediction algorithms developed and relies predominantly on probability parameters determined from the relative frequencies of each amino acid's appearance in each type of secondary structure.[5] The original Chou-Fasman parameters, determined from the small sample of structures solved in the mid-1970s, produce poor results compared to modern methods, though the parameterization has been updated since it was first published. The Chou-Fasman method is roughly 50-60% accurate in predicting secondary structures.[1]
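Such parameters compare how often an amino acid occurs in a given state with how often that state occurs overall. The sketch below estimates propensities in this spirit from a labeled training set; the function name and toy data are hypothetical, and the rules the Chou-Fasman method uses to turn propensities into helix and strand assignments are not shown.

```python
from collections import Counter


def chou_fasman_propensities(sequences: list, structures: list) -> dict:
    """Propensity P(a, s) = (n(a, s) / n(a)) / (n(s) / N) from labeled training data.

    n(a, s): residues of type a observed in state s; n(a): all residues of type a;
    n(s): all residues in state s; N: all residues. P > 1 means residue a favors state s.
    """
    if len(sequences) != len(structures):
        raise ValueError("need one structure string per sequence")
    pair_counts, aa_counts, state_counts = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, state in zip(seq, ss):
            pair_counts[aa, state] += 1
            aa_counts[aa] += 1
            state_counts[state] += 1
    total = sum(aa_counts.values())
    return {
        (aa, state): (n / aa_counts[aa]) / (state_counts[state] / total)
        for (aa, state), n in pair_counts.items()
    }


if __name__ == "__main__":
    props = chou_fasman_propensities(["AAGLP", "ALGGP"], ["HHCCC", "HHECC"])
    print(round(props["A", "H"], 2), round(props["L", "C"], 2))  # 2.5 1.0
```

With so few structures the estimates are noisy, which mirrors the limitation of the original parameters noted above.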
GOR method
The GOR method, named for the three scientists who developed it (Garnier, Osguthorpe, and Robson), is an information theory-based method developed not long after Chou-Fasman that uses more powerful probabilistic techniques of Bayesian inference.[6] The GOR method takes into account not only the probability of each amino acid having a particular secondary structure, but also the conditional probability of the amino acid assuming each structure given the identities of its neighbors in the sequence; it does not assume that the neighbors adopt the same structure. This makes it both more sensitive and more accurate than Chou-Fasman, because single-residue structural propensities are strong for only a small number of amino acids, such as proline and glycine. The original GOR method is roughly 65% accurate and is dramatically more successful in predicting alpha helices than beta sheets, which it frequently mispredicts as loops or disorganized regions.[1]
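The window idea behind GOR can be sketched as summing, over positions around each residue, the information each neighboring residue carries about the central residue's state, then choosing the state with the largest total. This is a simplified sketch in the spirit of the original method, not the published implementation: it assumes three states (H, E, C), add-one pseudocounts, and a default half-window of 8 residues on each side (the original method used a 17-residue window); the function names and toy data are hypothetical.

```python
import math
from collections import Counter

STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def train_gor(sequences, structures, half_width=8, pseudocount=1.0):
    """Estimate I(S; R, m) = log(P(S | residue R at offset m) / P(S)) with pseudocounts."""
    state_totals = Counter()    # state -> count
    joint = Counter()           # (m, residue, state) -> count
    residue_totals = Counter()  # (m, residue) -> count
    for seq, ss in zip(sequences, structures):
        for j, state in enumerate(ss):
            state_totals[state] += 1
            for m in range(-half_width, half_width + 1):
                if 0 <= j + m < len(seq):
                    joint[m, seq[j + m], state] += 1
                    residue_totals[m, seq[j + m]] += 1
    n = sum(state_totals.values())
    k = len(STATES)
    info = {}
    for (m, residue), total in residue_totals.items():
        for state in STATES:
            p_state = (state_totals[state] + pseudocount) / (n + k * pseudocount)
            p_cond = (joint[m, residue, state] + pseudocount) / (total + k * pseudocount)
            info[m, residue, state] = math.log(p_cond / p_state)
    return info


def predict_gor(seq, info, half_width=8):
    """Pick, for each residue, the state with the largest summed information over its window."""
    prediction = []
    for j in range(len(seq)):
        window = [m for m in range(-half_width, half_width + 1) if 0 <= j + m < len(seq)]
        scores = {s: sum(info.get((m, seq[j + m], s), 0.0) for m in window) for s in STATES}
        prediction.append(max(STATES, key=scores.get))
    return "".join(prediction)


if __name__ == "__main__":
    table = train_gor(["AAAAGGGGVVVV"] * 5, ["HHHHCCCCEEEE"] * 5, half_width=1)
    print(predict_gor("AAAAGGGGVVVV", table, half_width=1))  # HHHHCCCCEEEE
```

The half-window is reduced to 1 in the usage example only because the toy training set is tiny; residues never seen at a given offset simply contribute nothing.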
Machine learning
Neural network methods use training sets of solved structures to identify common sequence motifs associated with particular arrangements of secondary structures. These methods are over 70% accurate in their predictions, although beta strands are still often underpredicted due to the lack of three-dimensional structural information that would allow assessment of the hydrogen bonding patterns that can promote formation of the extended conformation required for a complete beta sheet.[1]
Support vector machines have proven particularly useful for predicting the locations of turns, which are difficult to identify with statistical methods.[7] The requirement of relatively small training sets has also been cited as an advantage to avoid overfitting to existing structural data.[8]
Extensions of machine learning techniques attempt to predict more fine-grained local properties of proteins, such as backbone dihedral angles in unassigned regions. Both SVMs[9] and neural networks[10] have been applied to this problem.
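The sliding-window input used by many neural network predictors can be sketched as follows: each residue is classified from a one-hot encoding of the residues in a window centered on it. Here a multiclass perceptron (the simplest single-layer network) stands in for the multilayer networks used in practice; the window size, the three-state alphabet, and all function names are assumptions of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def encode_window(seq, center, half_width):
    """Sparse one-hot encoding: indices of active features for the window around `center`.

    Each window slot gets its own block of 20 features; positions past the sequence
    ends and non-standard residues leave their slot all-zero.
    """
    active = []
    for slot, pos in enumerate(range(center - half_width, center + half_width + 1)):
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            active.append(slot * len(AMINO_ACIDS) + AMINO_ACIDS.index(seq[pos]))
    return active


def best_state(features, weights, bias):
    """Highest-scoring state; ties go to the earliest state in STATES."""
    return max(STATES, key=lambda s: bias[s] + sum(weights[s][i] for i in features))


def train_perceptron(sequences, structures, half_width=3, epochs=10):
    """Multiclass perceptron: on each mistake, reward the true state and penalize the guess."""
    n_features = (2 * half_width + 1) * len(AMINO_ACIDS)
    weights = {s: [0.0] * n_features for s in STATES}
    bias = {s: 0.0 for s in STATES}
    for _ in range(epochs):
        for seq, ss in zip(sequences, structures):
            for j, target in enumerate(ss):
                features = encode_window(seq, j, half_width)
                guess = best_state(features, weights, bias)
                if guess != target:
                    for i in features:
                        weights[target][i] += 1.0
                        weights[guess][i] -= 1.0
                    bias[target] += 1.0
                    bias[guess] -= 1.0
    return weights, bias


def predict(seq, weights, bias, half_width=3):
    """Classify every residue of `seq` from its window."""
    return "".join(
        best_state(encode_window(seq, j, half_width), weights, bias) for j in range(len(seq))
    )


if __name__ == "__main__":
    w, b = train_perceptron(["AAAAGGGGVVVV"], ["HHHHCCCCEEEE"], half_width=1)
    print(predict("AAAAGGGGVVVV", w, b, half_width=1))  # HHHHCCCCEEEE
```

Because only local sequence enters the window, nothing in this encoding can see the long-range strand pairing that stabilizes a beta sheet, which is one way to read the underprediction of strands noted above.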
RNA structure
Dynamic programming algorithms are commonly used to detect base pairing patterns that are "well-nested", that is, patterns in which no two base pairs cross: for any two pairs (i, j) and (k, l) with i < k, the second pair lies either entirely inside the first (l < j) or entirely after it (k > j). Secondary structures that fall into this category include double helices, stem-loops, and variants of the "cloverleaf" pattern found in transfer RNA molecules. These methods rely on precalculated parameters estimating the free energy associated with particular types of base-pairing interactions, including Watson-Crick and Hoogsteen base pairs.
Depending on the complexity of the method, single base pairs may be considered, or short two- or three-base segments to incorporate the effects of base stacking. These methods cannot identify pseudoknots, which are not well nested, without substantial algorithmic modifications that are extremely computationally expensive.[11]
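The dynamic programming idea can be illustrated with the Nussinov algorithm, a simpler relative of the energy-based methods described here: it maximizes the number of base pairs instead of minimizing free energy, and it ignores stacking. This sketch assumes Watson-Crick plus G-U wobble pairs and a minimum hairpin loop of three unpaired bases. Because pairing base j with base k splits the problem into independent intervals on either side, every structure it returns is well-nested, which is exactly why pseudoknots are out of its reach.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> list:
    """Return a maximum-cardinality set of well-nested base pairs as sorted (i, j) tuples.

    `min_loop` is the minimum number of unpaired bases enclosed by a hairpin (assumed 3).
    """
    seq = seq.upper()
    n = len(seq)
    # best[i][j]: maximum number of pairs within seq[i..j] (inclusive); 0 for short spans.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k, >= min_loop bases inside
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: re-derive which case produced each optimum.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED_PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    pairs.append((k, j))
                    stack.extend([(i, k - 1), (k + 1, j - 1)])
                    break
    return sorted(pairs)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # [(0, 8), (1, 7), (2, 6)]
```

The fill step is cubic in sequence length; energy-based methods keep the same interval decomposition but score stacked pairs and loops with free-energy parameters.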
Sequence covariation methods rely on the existence of a data set of multiple homologous RNA molecules with related but dissimilar sequences. These methods analyze the covariation of individual base sites during evolution: when two widely separated sites keep a base-pairing combination of nucleotides across the homologs even as the nucleotides themselves change, this indicates a structurally required hydrogen bond between those positions. The general problem of pseudoknot prediction has been shown to be NP-complete.[12]
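Covariation between two alignment columns is often quantified with mutual information, which is high when the nucleotide at one site predicts the nucleotide at the other. The minimal sketch below assumes a gapped alignment of equal-length sequences and a hypothetical function name; it does not check whether the co-varying nucleotides are actually complementary, nor does it correct for shared evolutionary history.

```python
import math
from collections import Counter


def column_mutual_information(alignment: list, i: int, j: int) -> float:
    """Mutual information, in bits, between columns i and j of a nucleotide alignment.

    Sequences with a gap ('-') in either column are ignored.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment if seq[i] != "-" and seq[j] != "-"]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (count / n) * math.log2((count / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), count in joint.items()
    )


if __name__ == "__main__":
    # Toy alignment: columns 0 and 5 co-vary as complementary pairs; column 1 is invariant.
    alignment = ["GAAAAC", "AAGAAU", "CAAAAG", "UAGAAA"]
    print(column_mutual_information(alignment, 0, 5))  # 2.0
    print(column_mutual_information(alignment, 0, 1))  # 0.0
```

An invariant column carries no covariation signal even if it is structurally important, which is why these methods need related but dissimilar sequences.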
References
- [1] Mount DW (2004). Bioinformatics: Sequence and Genome Analysis, 2nd ed. Cold Spring Harbor Laboratory Press. ISBN 0879697121.
- [2] Guzzo AV (1965). Influence of Amino-Acid Sequence on Protein Structure. Biophysical Journal 5: 809–822.
  Prothero JW (1966). Correlation between Distribution of Amino Acids and Alpha Helices. Biophysical Journal 6: 367–370.
  Schiffer M, Edmundson AB (1967). Use of Helical Wheels to Represent Structures of Proteins and to Identify Segments with Helical Potential. Biophysical Journal 7: 121–?.
  Kotelchuck D, Scheraga HA (1969). The Influence of Short-Range Interactions on Protein Conformation, II. A Model for Predicting the α-Helical Regions of Proteins. Proceedings of the National Academy of Sciences USA 62: 14–21. doi:10.1073/pnas.62.1.14. PMID 5253650.
  Lewis PN, Gō N, Gō M, Kotelchuck D, Scheraga HA (1970). Helix Probability Profiles of Denatured Proteins and Their Correlation with Native Structures. Proceedings of the National Academy of Sciences USA 65: 810–815. doi:10.1073/pnas.65.4.810. PMID 5266152.
- [3] Froimowitz M, Fasman GD (1974). Prediction of the secondary structure of proteins using the helix-coil transition theory. Macromolecules 7(5): 583–589.
- [4] Dor O, Zhou Y (2006). Achieving 80% tenfold cross-validated accuracy for secondary structure prediction by large-scale training. Proteins (Epub). PMID 17177203.
- [5] Chou PY, Fasman GD (1974). Prediction of protein conformation. Biochemistry 13(2): 222–245.
- [6] Garnier J, Osguthorpe DJ, Robson B (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 120: 97–120.
- [7] Pham TH, Satou K, Ho TB (2005). Support vector machines for prediction and analysis of beta and gamma-turns in proteins. J Bioinform Comput Biol 3(2): 343–358. PMID 15852509.
- [8] Zhang Q, Yoon S, Welsh WJ (2005). Improved method for predicting beta-turn using support vector machine. Bioinformatics 21(10): 2370–2374. PMID 15797917.
- [9] Zimmermann O, Hansmann UH (2006). Support vector machines for prediction of dihedral angle regions. Bioinformatics 22(24): 3009–3015. PMID 17005536.
- [10] Kuang R, Leslie CS, Yang AS (2004). Protein backbone angle prediction with machine learning approaches. Bioinformatics 20(10): 1612–1621. PMID 14988121.
- [11] Rivas E, Eddy S (1999). A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol 285(5): 2053–2068.
- [12] Lyngsø RB, Pedersen CN (2000). RNA pseudoknot prediction in energy-based models. J Comput Biol 7(3–4): 409–427.
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Secondary Structure Prediction"