Protein Structure Prediction
Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. Its aim is the prediction of the three-dimensional structure of proteins from their amino acid
sequences, sometimes including additional relevant information such as
the structures of related proteins. In other words, it deals with the
prediction of a protein's tertiary structure from its primary structure. Protein structure prediction is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). Every two years, the performance of current methods is assessed in the CASP experiment.
The practical role of protein structure prediction is now more
important than ever. Massive amounts of protein sequence data are
produced by modern large-scale DNA sequencing efforts such as the Human Genome Project. Despite community-wide efforts in structural genomics, the output of experimentally determined protein structures — typically by time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy — is lagging far behind the output of protein sequences.
Several factors make protein structure prediction a
very difficult task. The two main problems are that the number of
possible protein structures is extremely large, and that the physical
basis of protein structural stability is not fully understood. As a
result, any protein structure prediction method needs a way to explore
the space of possible structures efficiently (a search strategy), and a
way to identify the most plausible structure (an energy function).
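To give a sense of scale for the first problem, the arithmetic below estimates the size of the search space under a deliberately crude model. The numbers are assumptions for illustration only and do not come from this article: each residue is taken to adopt one of three backbone states, and a hypothetical sampler is taken to test 10^13 conformations per second.

```python
# Back-of-the-envelope size of the conformational search space.
# ASSUMPTIONS (illustrative, not from the article): each residue adopts one
# of 3 backbone states, and a hypothetical sampler tests 1e13 conformations
# per second.
STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13
SECONDS_PER_YEAR = 3600 * 24 * 365


def conformation_count(n_residues: int, states: int = STATES_PER_RESIDUE) -> int:
    """Number of distinct backbone conformations under the toy model."""
    return states ** n_residues


def exhaustive_search_years(n_residues: int) -> float:
    """Years needed to enumerate every conformation at the assumed rate."""
    return conformation_count(n_residues) / SAMPLES_PER_SECOND / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, conformation_count(n), f"{exhaustive_search_years(n):.3g} years")
```

Even under these generous assumptions, exhaustive enumeration becomes hopeless well before 100 residues, which is why a search strategy is needed rather than brute force.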
In comparative structure prediction, the search space is
pruned by the assumption that the protein in question adopts a
structure that is reasonably close to the structure of at least one
known protein. In de novo or ab initio structure
prediction, no such assumption is made, which results in a much harder
search problem. In both cases, an energy function is needed to
recognize the native structure, and to guide the search for the native
structure. Unfortunately, the construction of such an energy function
is to a great extent an open problem.
Direct simulation of protein folding in atomic detail, via methods such as molecular dynamics
with a suitable energy function, is typically not tractable due to the
high computational cost, despite the efforts of distributed computing
projects such as Folding@home. Therefore, most de novo structure prediction methods rely on simplified representations of the atomic structure of proteins.
The issues mentioned above apply to all proteins, including well-behaved, small, monomeric proteins. In addition, for specific proteins (such as multimeric proteins and disordered proteins), the following issues also arise:
- Some proteins require stabilisation by additional domains or
binding partners to adopt their native structure. This requirement is
typically unknown in advance and difficult to handle by a prediction
method.
- The tertiary structure of a native protein may not be readily
formed without the aid of additional agents. For example, proteins
known as chaperones are required for some proteins to properly fold. Other proteins cannot fold properly without modifications such as glycosylation.
- A particular protein may be able to assume multiple conformations depending on its chemical environment.
- The biologically active conformation may not be the most thermodynamically favorable.
Due to the increase in computer power, and especially new
algorithms, much progress is being made to overcome these problems.
However, routine de novo prediction of protein structures, even for
small proteins, has still not been achieved.
Ab initio protein modelling
Ab initio (or de novo) protein modelling methods seek
to build three-dimensional protein models "from scratch", i.e., based
on physical principles rather than (directly) on previously solved
structures. There are many possible procedures that either attempt to
mimic protein folding or apply some stochastic method to search possible solutions (i.e., global optimization
of a suitable energy function). These procedures tend to require vast
computational resources, and have thus only been carried out for tiny
proteins. To predict protein structure de novo for larger
proteins will require better algorithms and larger computational
resources like those afforded by either powerful supercomputers (such
as Blue Gene or MDGRAPE-3) or distributed computing (such as Folding@home, the Human Proteome Folding Project and Rosetta@Home).
Although these computational barriers are vast, the potential benefits
of structural genomics (by predicted or experimental methods) make ab initio structure prediction an active research field.
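As a rough illustration of stochastic search over a simplified representation, the sketch below runs Metropolis Monte Carlo with simulated annealing on a two-dimensional HP lattice model. Everything in it is an assumption chosen for brevity rather than a method described in this article: residues are reduced to hydrophobic (H) or polar (P) beads on a square lattice, the energy is -1 per non-bonded H-H contact, and the function names and cooling schedule are hypothetical.

```python
import math
import random

# Toy 2D HP lattice model: H = hydrophobic, P = polar residue.
# ASSUMPTION: energy is -1 per non-bonded H-H lattice contact; this is a
# textbook simplification, not a physically realistic energy function.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def build_coords(directions):
    """Place residues on the lattice; return None if the chain self-overlaps."""
    coords = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords


def energy(sequence, coords):
    """Minus the number of H-H pairs that touch on the lattice but are not bonded."""
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                dist = abs(coords[i][0] - coords[j][0]) + abs(coords[i][1] - coords[j][1])
                if dist == 1:
                    contacts += 1
    return -contacts


def anneal(sequence, steps=20000, t_start=2.0, t_end=0.1, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    directions = ["R"] * (len(sequence) - 1)  # start from an extended chain
    current_e = energy(sequence, build_coords(directions))
    best = (current_e, list(directions))
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)
        trial = list(directions)
        trial[rng.randrange(len(trial))] = rng.choice("UDLR")
        coords = build_coords(trial)
        if coords is None:
            continue  # reject self-overlapping chains
        trial_e = energy(sequence, coords)
        delta = trial_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            directions, current_e = trial, trial_e
            if current_e < best[0]:
                best = (current_e, list(directions))
    return best
```

Calling `anneal("HPPH")` returns the lowest energy seen and the corresponding list of lattice directions. Real de novo methods use far richer representations and energy functions, but the split between a move set, an energy evaluation, and an acceptance rule is the same.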
As an intermediate step towards predicted protein structures, contact map predictions have been proposed.
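For readers unfamiliar with the term, a contact map is a square matrix recording which residue pairs are spatially close. The sketch below derives one from known coordinates, which is the quantity a contact map predictor tries to estimate from sequence alone. The 8 angstrom cutoff on one representative atom per residue is an assumed convention, not something this article specifies, and the coordinates are made up.

```python
import math

# ASSUMPTION: two residues are "in contact" when their representative atoms
# (for example C-alpha) lie within 8.0 angstroms. The cutoff is a common
# convention, not a value given in the article.
CONTACT_CUTOFF = 8.0


def contact_map(coords, cutoff=CONTACT_CUTOFF):
    """Return an N x N symmetric 0/1 matrix of residue-residue contacts.

    The diagonal is left at 0 because a residue is not counted as being in
    contact with itself.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical coordinates (angstroms) for a fully extended four-residue fragment.
example = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

if __name__ == "__main__":
    for row in contact_map(example):
        print(row)
```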
Comparative protein modelling
Comparative protein modelling uses previously solved structures as
starting points, or templates. This is effective because it appears
that although the number of actual proteins is vast, there is a limited
set of tertiary structural motifs
to which most proteins belong. It has been suggested that there are
only around 2000 distinct protein folds in nature, though there are
many millions of different proteins.
These methods fall into two groups:
- Homology modelling is based on the reasonable assumption that two homologous
proteins will share very similar structures. Because a protein's fold
is more evolutionarily conserved than its amino acid sequence, a target
sequence can be modeled with reasonable accuracy on a very distantly
related template, provided that the relationship between target and
template can be discerned through sequence alignment.
It has been suggested that the primary bottleneck in comparative
modelling arises from difficulties in alignment rather than from errors
in structure prediction given a known-good alignment.[1] Unsurprisingly, homology modelling is most accurate when the target and template have similar sequences.
- Protein threading[2]
scans the amino acid sequence of an unknown structure against a
database of solved structures. In each case, a scoring function is used
to assess the compatibility of the sequence to the structure, thus
yielding possible three-dimensional models. This type of method is also
known as 3D-1D fold recognition due to its compatibility
analysis between three-dimensional structures and linear protein
sequences. This method has also given rise to methods performing an inverse folding search
by evaluating the compatibility of a given structure with a large
database of sequences, thus predicting which sequences have the
potential to produce a given fold.
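The 3D-1D idea behind threading can be sketched in a few lines. The example below is a heavily simplified assumption, not the algorithm of [2]: each template position is reduced to one of two environment classes (buried or exposed), the score table is invented, and the sequence is slid along the template without gaps, whereas practical threading methods use richer environment descriptions and gapped alignment.

```python
# Hypothetical 3D-1D scores: how well a residue type fits a structural
# environment ("B" = buried, "E" = exposed). All values are illustrative.
HYDROPHOBIC = set("AVLIMFWC")


def residue_score(residue, environment):
    """Toy compatibility score of one residue with one environment class."""
    is_hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1.0 if is_hydrophobic else -1.0
    return -0.5 if is_hydrophobic else 0.5


def thread_score(sequence, environments):
    """Best gapless placement of the sequence on a template's environment string."""
    if len(sequence) > len(environments):
        return float("-inf")
    best = float("-inf")
    for offset in range(len(environments) - len(sequence) + 1):
        total = sum(
            residue_score(res, environments[offset + k])
            for k, res in enumerate(sequence)
        )
        best = max(best, total)
    return best


def rank_templates(sequence, library):
    """Sort template names by decreasing compatibility with the sequence."""
    return sorted(
        library,
        key=lambda name: thread_score(sequence, library[name]),
        reverse=True,
    )


# Hypothetical template library: name -> per-position environment classes.
library = {"template_a": "BBEEBB", "template_b": "EEEEEE"}

if __name__ == "__main__":
    print(rank_templates("LLKKLL", library))
```

Swapping the roles of the two inputs, scoring many sequences against one fixed environment string, gives the inverse folding search mentioned above.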
Side chain geometry prediction
Even structure prediction methods that are reasonably accurate for
the peptide backbone often get the orientation and packing of the amino
acid side chains wrong. Methods that specifically address the problem of predicting side chain geometry include dead-end elimination and the self-consistent mean field method. Both discretize the continuously varying dihedral angles that determine a side chain's orientation relative to the backbone into a set of rotamers, which are low-energy side chain conformations
with fixed dihedral angles. The methods then attempt to identify the
set of rotamers that minimizes the model's overall energy. Such methods are most
useful for analyzing the protein's hydrophobic
core, where side chains are more closely packed; they have more
difficulty addressing the looser constraints and higher flexibility of
surface residues.[3]
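The rotamer selection problem can be made concrete with a small sketch of dead-end elimination. It assumes the model energy decomposes into per-rotamer self terms and pairwise terms, and it applies the simplest elimination test: a rotamer is discarded if its best possible energy is still worse than the worst possible energy of some alternative at the same position. The energy values and function names are invented for illustration.

```python
from itertools import product


def total_energy(choice, self_e, pair_e):
    """Energy of one rotamer assignment: self terms plus pairwise terms."""
    e = sum(self_e[i][r] for i, r in enumerate(choice))
    for i in range(len(choice)):
        for j in range(i + 1, len(choice)):
            e += pair_e[(i, j)][choice[i]][choice[j]]
    return e


def pair(pair_e, i, r, j, s):
    """Pairwise energy of rotamer r at position i with rotamer s at position j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dead_end_elimination(self_e, pair_e):
    """Repeat the elimination test until no rotamer can be removed."""
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # Best case for r: most favourable partner at every other position.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i
                )
                for t in alive[i] - {r}:
                    # Worst case for t: least favourable partner everywhere.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i
                    )
                    if best_r > worst_t:
                        alive[i].discard(r)  # r cannot be in the global minimum
                        changed = True
                        break
    return alive


def brute_force_minimum(self_e, pair_e):
    """Exhaustive reference search, feasible only for tiny examples."""
    choices = product(*[range(len(e)) for e in self_e])
    return min(choices, key=lambda c: total_energy(c, self_e, pair_e))


# Hypothetical energies: three positions with 2, 2 and 3 rotamers.
self_e = [[0.0, 5.0], [1.0, 0.0], [0.5, 0.0, 4.0]]
pair_e = {
    (0, 1): [[0.0, 0.2], [0.3, 0.1]],
    (0, 2): [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1]],
    (1, 2): [[0.0, 0.1, 0.2], [0.2, 0.0, 0.1]],
}
```

In this toy case the test prunes every position down to a single rotamer, and the survivors agree with the exhaustive search. On realistic problems elimination usually leaves several rotamers per position, and a further search over the reduced set is still required.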
Software
MODELLER is a popular software tool for producing homology models using methodology derived from NMR spectroscopy data processing. SwissModel provides an automated web server for basic homology modeling. Common software tools for protein threading are HHpred, bioinfo.pl, Robetta, and 3D-PSSM. The basic algorithm for threading is described in [2] and is fairly straightforward to implement.
TIP is a knowledgebase of STRUCTFAST[4] models and precomputed similarity relationships between sequences, structures, and binding sites.
A review of popular software for structure prediction, published in 2006, can be found in [5].
Several distributed computing projects concerning protein structure prediction have also been implemented, such as Folding@home, Rosetta@home, the Human Proteome Folding Project, Predictor@home and TANPAKU.
The Foldit
program seeks to investigate the pattern-recognition and puzzle-solving
abilities inherent to the human mind in order to create more successful
computer protein structure prediction software.
Protein-protein complexes
In the case of complexes of two or more proteins, where the structures of the proteins are known or can be predicted with high accuracy, protein-protein docking
methods can be used to predict the structure of the complex.
Information on the effect of mutations at specific sites on the
affinity of the complex helps to understand the complex structure and
to guide docking methods.
References
- [1] Zhang Y, Skolnick J (2005). "The protein structure prediction problem could be solved using the current PDB library". Proc Natl Acad Sci USA 102 (4): 1029–1034. doi:10.1073/pnas.0407152101. PMID 15653774.
- [2] Bowie JU, Luthy R, Eisenberg D (1991). "A method to identify protein sequences that fold into a known three-dimensional structure". Science 253 (5016): 164–170. doi:10.1126/science.1853201. PMID 1853201.
- [3] Voigt CA, Gordon DB, Mayo SL (2000). "Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design". J Mol Biol 299 (3): 789–803. doi:10.1006/jmbi.2000.3758. PMID 10835284.
- [4] Debe DA, Danzer JF, Goddard WA, Poleksic A (2006). "STRUCTFAST: Protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring". Proteins 64: 960–967. doi:10.1002/prot.21049. PMID 16786595.
- [5] Nayeem A, Sitkoff D, Krystek S Jr (2006). "A comparative study of available software for high-accuracy homology modeling: From sequence alignments to structural models". Protein Sci 15: 808–824. doi:10.1110/ps.051892906. PMID 16600967.
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Protein Structure Prediction"