SSpro is a server for protein secondary structure prediction based on an ensemble of 100 1D-RNNs (one-dimensional recurrent neural networks). For a detailed explanation of the methods, see the references. SSpro version 1 went online on 3/13/2000. In one year it handled more than 10,000 queries from 60 domains in at least 50 countries.
From the very beginning, SSpro 3.0 was tested by the independent assessor EVA and consistently exceeded 76% correctly classified residues on structures with no homologues in the PDB, always ranking first among the servers tested. SSpro v4.0 currently achieves an accuracy of 78.7% on the independent evaluation server EVA. SSpro v4.5 directly incorporates the secondary structure of homologous proteins and uses probabilistic methods to improve the SOV score.
Download SSpro4.0 Executable for Linux (Academic Use Only)
SSpro8 is an experimental extension to SSpro. Instead of using three classes (helix, strand and the rest) to assign the secondary structure of a protein, SSpro8 adopts the full DSSP 8-class output classification:
H: alpha-helix
G: 3-10-helix
I: pi-helix
E: extended strand
B: beta-bridge
T: turn
S: bend
C: the rest
For a detailed description of the tests performed on SSpro8, see the references. The overall performance (Q8) of the system currently online (based on PSI-BLAST profiles) is approximately 63%. NOTE: SSpro8 is a completely different system from SSpro. Their results may not match.
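Q8 is the per-residue accuracy over the eight classes: the fraction of residues whose predicted class equals the observed one. The following is a minimal sketch of that calculation; the function name `q_score` is hypothetical and is not part of SSpro8. The same function gives Q3 when both strings use the three-class alphabet.

```python
def q_score(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted class equals the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Toy strings (not real predictions): three of four positions agree.
print(q_score("HHEC", "HHCC"))
```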
ABTMpro is a three-class predictor of transmembrane type. The input is the amino acid sequence and the output is "alpha helical transmembrane", "beta barrel transmembrane" or "non-transmembrane".
CONpro is a server that predicts whether the number of contacts of each residue in a protein is above or below the average for that residue. The prediction of CONpro is based on 1D-RNNs, adopting as input a multiple alignment of homologues generated by PSI-BLAST. The threshold radius at which residues are considered in contact is 12Å. The accuracy of CONpro is 73%. The complete system is an ensemble of 10 1D-RNNs. For a more detailed explanation, see the references.
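To make the quantity being predicted concrete, here is a minimal sketch of how a contact number can be computed from known coordinates. It assumes one representative point per residue (for example the C-α atom) and counts a pair as in contact when the distance is strictly below the radius; both choices are assumptions, and the function name is hypothetical. CONpro itself predicts the above/below-average class from sequence profiles, not from coordinates.

```python
import math


def contact_numbers(coords, radius=12.0):
    """For each residue, count the other residues lying within `radius` angstroms."""
    counts = []
    for i, a in enumerate(coords):
        n = sum(
            1
            for j, b in enumerate(coords)
            if i != j and math.dist(a, b) < radius
        )
        counts.append(n)
    return counts


# Toy coordinates in angstroms (illustrative only).
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_numbers(toy))
```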
ACCpro is a server for the prediction of the relative solvent accessibility of protein residues. The prediction of ACCpro is based on 1D-RNNs, adopting as input a multiple alignment of homologues generated by PSI-BLAST. Each residue in a protein is predicted as buried or exposed, i.e. less or more accessible than a specified threshold. All thresholds between 0% and 95% at steps of 5% are available. For a 25% threshold, the 'hard' case corresponding to practically identical numbers of buried and exposed residues, ACCpro correctly classifies 77.2% of the residues, better than any other system previously described. For a more detailed explanation, see the references.
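The buried/exposed labels at each threshold can be illustrated with a short sketch. It assumes the relative solvent accessibility of each residue is already known as a percentage, and it treats a value exactly equal to the threshold as buried (an assumption). The symbols `e` (exposed) and `-` (buried) follow the example reply shown on this page; the function names are hypothetical. ACCpro predicts these labels from sequence, so this shows only how the classes are defined.

```python
def classify_accessibility(rsa_percent, threshold):
    """Label each residue 'e' (exposed) if its RSA exceeds the threshold, else '-' (buried)."""
    return "".join("e" if value > threshold else "-" for value in rsa_percent)


def all_thresholds(rsa_percent):
    """Labels at every threshold from 0% to 95% in steps of 5%."""
    return {t: classify_accessibility(rsa_percent, t) for t in range(0, 100, 5)}


# Toy RSA values in percent (illustrative only).
toy_rsa = [0, 30, 60]
for threshold, labels in all_thresholds(toy_rsa).items():
    print(f"{threshold}% {labels}")
```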
Download ACCpro Executable for Linux (Academic Use Only)
ACCpro20 predicts the relative solvent accessibility at all thresholds between 0% and 100% at 5% increments.
DOMpro predicts domain locations using a 1D-RNN. DOMpro takes as input the sequence profile, predicted secondary structure, and predicted relative solvent accessibility. The output of the 1D-RNN is a classification for each residue as being in a domain boundary region or not. The domains are then inferred from this output. For a more detailed explanation, see the manuscript in the references.
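One simple way to turn per-residue boundary labels into domain ranges is sketched below. The rule used here, cutting at the midpoint of each contiguous run of boundary residues, is an assumption chosen for illustration and is not necessarily the inference procedure DOMpro uses; the function name is hypothetical. Ranges are 1-based and inclusive, matching the "Domain 1: 1 - 30" style of the example reply.

```python
def infer_domains(is_boundary):
    """Split a chain into (start, end) domains, cutting at the middle of each boundary run."""
    n = len(is_boundary)
    if n == 0:
        return []
    domains = []
    start = 0  # 0-based index of the first residue of the current domain
    i = 0
    while i < n:
        if is_boundary[i]:
            # Extend j to the last residue of this contiguous boundary run.
            j = i
            while j + 1 < n and is_boundary[j + 1]:
                j += 1
            cut = (i + j) // 2  # midpoint of the run; the domain ends here
            if cut < n - 1:  # a cut on the last residue would leave an empty domain
                domains.append((start + 1, cut + 1))
                start = cut + 1
            i = j + 1
        else:
            i += 1
    domains.append((start + 1, n))
    return domains


# No predicted boundary: the whole chain is one domain.
for k, (first, last) in enumerate(infer_domains([False] * 30), start=1):
    print(f"Domain {k}: {first} - {last}")
```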
DISpro uses a 1D-RNN to predict the probability that residues are
disordered. The probabilities are also thresholded at probability 0.5 to
make a hard classification. The input to DISpro is the sequence
profile, predicted secondary structure, and predicted relative solvent
accessibility.
For a more detailed explanation, see the manuscript in references.
Download the dataset used in training DISpro here.
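The hard classification described for DISpro can be sketched in a few lines. The symbols `D` (disordered) and `O` (ordered) are taken from the example reply on this page; treating a probability exactly equal to 0.5 as ordered is an assumption, and the function name is hypothetical.

```python
def disorder_states(probabilities, threshold=0.5):
    """Label each residue 'D' if its disorder probability exceeds the threshold, else 'O'."""
    return "".join("D" if p > threshold else "O" for p in probabilities)


# Disorder probabilities copied from the example reply on this page.
example = [
    0.16, 0.07, 0.04, 0.04, 0.03, 0.02, 0.02, 0.01, 0.02, 0.02,
    0.02, 0.01, 0.01, 0.01, 0.02, 0.02, 0.03, 0.04, 0.06, 0.06,
    0.12, 0.13, 0.20, 0.17, 0.18, 0.20, 0.23, 0.48, 0.55, 0.58,
]
print(disorder_states(example))
```

Applied to the probabilities in the example reply, this gives 28 ordered residues followed by 2 disordered ones, which agrees with the "Predicted Disordered Residues" line of that reply.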
DIpro is a cysteine disulfide bond predictor based on 2D recurrent neural networks, support vector machines, graph matching and regression algorithms. It can predict whether a sequence has disulfide bonds, estimate the number of disulfide bonds, and predict the bonding state of each cysteine and the bonded pairs. It yields the best accuracy on the benchmark dataset Sp39. It can handle any number of disulfide bonds, whereas most methods available so far can only handle fewer than 6 disulfide bonds.
Procedure: The sequence is processed in two steps. In step 1, a support vector machine classifies whether the sequence has disulfide bonds. In step 2, a neural network and a graph algorithm predict the number of bonds and the bonding pattern. For a more detailed explanation, see the references.
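The pairing step can be pictured as a matching problem: given a score for every pair of cysteines and a predicted number of bonds, choose disjoint pairs that maximize the total score. The sketch below does this by exhaustive search, which is only practical for a handful of cysteines; it is a stand-in for illustration, not the graph matching algorithm DIpro uses. The function name, the score dictionary and its values are all hypothetical.

```python
def best_pairing(cysteines, score, n_bonds):
    """Return (total, pairs): the n_bonds disjoint cysteine pairs with the highest summed score.

    `cysteines` is a sorted list of sequence positions; `score[(i, j)]` with i < j
    is the bonding score of that pair.
    """
    best = (float("-inf"), [])

    def search(remaining, chosen, total):
        nonlocal best
        if len(chosen) == n_bonds:
            if total > best[0]:
                best = (total, list(chosen))
            return
        if len(remaining) < 2 * (n_bonds - len(chosen)):
            return  # not enough cysteines left to reach n_bonds pairs
        first, rest = remaining[0], remaining[1:]
        # Option 1: bond `first` to one of the later cysteines.
        for k, partner in enumerate(rest):
            chosen.append((first, partner))
            search(rest[:k] + rest[k + 1:], chosen, total + score[(first, partner)])
            chosen.pop()
        # Option 2: leave `first` unbonded.
        search(rest, chosen, total)

    search(list(cysteines), [], 0.0)
    return best


# Made-up positions and scores, for illustration only.
positions = [3, 10, 25, 40]
scores = {
    (3, 10): 0.1, (3, 25): 0.9, (3, 40): 0.2,
    (10, 25): 0.3, (10, 40): 0.8, (25, 40): 0.1,
}
total, pairs = best_pairing(positions, scores, n_bonds=2)
print(pairs)
```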
CMAPpro is a server for the prediction of maps of contacts between protein residues. The prediction of CMAPpro is based on ensembles of Generalised Recurrent Neural Networks for the translation of matrices. The input of the system consists of two-dimensional profiles extracted from multiple alignments of homologues generated by PSI-BLAST, and of secondary structure and solvent accessibility predictions obtained from SSpro and ACCpro, respectively. Maps at 8Å and 12Å are available, meaning that two amino acids are defined as being in contact if their C-α atoms are closer than 8Å or 12Å, respectively. For a description of the tests performed on CMAPpro, see the references.
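The contact definition above can be written down directly. The sketch below builds a contact map from known C-α coordinates, which is the kind of target map a predictor such as CMAPpro is trained to reproduce from sequence; it is not the prediction method itself. The function name and the toy coordinates are hypothetical.

```python
import math


def contact_map(ca_coords, cutoff=8.0):
    """Boolean matrix: entry [i][j] is True when C-alpha atoms i and j are closer than `cutoff`."""
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) < cutoff for j in range(n)]
        for i in range(n)
    ]


# Toy C-alpha coordinates in angstroms (illustrative only).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.0, 0.0, 0.0)]
for row in contact_map(toy, cutoff=8.0):
    print("".join("1" if c else "0" for c in row))
```

The map is symmetric, and the diagonal is always True because every atom is at distance zero from itself.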
3Dpro is a server that predicts protein tertiary structure. 3Dpro uses predicted structural features and PDB knowledge-based statistical terms in the energy function. The conformational search uses a move set consisting of fragment replacement (using a fragment library built from the PDB) as well as random perturbations to the model. Moves are accepted or rejected based on a simulated annealing method with linear cooling. Multiple models are constructed using random seeds and the model with the lowest energy is selected as the final prediction. 3Dpro is currently a de novo method (structural templates are not used).
The results of 3Dpro's performance at CASP6 can be found here.
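The search strategy described for 3Dpro (random moves, simulated annealing with linear cooling, several random seeds, lowest energy wins) can be sketched on a toy problem. Everything below is illustrative: the energy function, the move, the temperatures and the use of the Metropolis acceptance rule are assumptions, and the function names are hypothetical. 3Dpro's actual energy terms and fragment moves are described in the references, not here.

```python
import math
import random


def anneal(energy, perturb, start, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing with a linear temperature schedule; returns (best_state, best_energy)."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for step in range(steps):
        # Linear cooling: temperature drops in equal decrements from t_start to t_end.
        t = t_start + (t_end - t_start) * step / max(steps - 1, 1)
        candidate = perturb(current, rng)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Metropolis rule (assumed): always accept improvements, and accept
        # uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def best_of_seeds(energy, perturb, start, seeds):
    """Run one annealing trajectory per seed and keep the lowest-energy result."""
    return min(
        (anneal(energy, perturb, start, seed=s) for s in seeds),
        key=lambda result: result[1],
    )


# Toy one-dimensional "conformation" with its minimum at x = 3.
toy_energy = lambda x: (x - 3.0) ** 2
toy_move = lambda x, rng: x + rng.uniform(-0.5, 0.5)
model, model_energy = best_of_seeds(toy_energy, toy_move, 0.0, seeds=[1, 2, 3])
print(model, model_energy)
```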
Your email address, where the prediction will be delivered. NOTE: Check that you typed your address correctly. Approximately 5% of the queries handled by SSpro 1.0 didn't receive an answer because the address was mistyped.
An optional name for your query. We strongly suggest that you use one, especially if sending more than one query. The order in which you send your queries may not correspond to the order in which you receive the answers.
The sequence of amino acids:
Replies are sent by email. SSpro, SSpro8, ACCpro, ACCpro20, DOMpro, DISpro, DIpro and CONpro replies come as text, embedded in the body of the email. Here is an example prediction:
Name: short
JOB ID: 98453
Amino Acids:
MQIFVKTLTGKTITLEVEPSDTIENVKAKI
Predicted Secondary Structure:
CEEEEEEECCCEEEEEECCCCCHHHHHCCC
Predicted Secondary Structure (8 Class):
CEEEEEEEESEEEEEEECCCSHHHHEECCC
Predicted Relative Solvent Accessiblity (at 25% exposed threshold):
ee---ee-eeee-e-e-eeeee-ee-eeee
Predicted Relative Solvent Accessiblity (All Thresholds):
0% ee---e--e-ee-e-e-e-eee-ee-eeee
5% ee---e--e-ee-e-e-e-eee-ee-eeee
10% ee---e--e-ee-e-e-e-eee-ee-eeee
15% ee---e--e-ee-e-e-e-eee-ee-eeee
20% ee---e--e-ee-e-e-e-eee-ee-eeee
25% ee---e--e-ee-e-e-e-eee-ee-eeee
30% ee---e--e-ee-e-e-e-eee-ee-eeee
35% ee---e--e-ee-e-e-e-eee-ee-eeee
40% ee------e--e-e-e---eee-ee-eeee
45% ee---------------------e--eeee
50% e----------------------e--eeee
55% e-------------------------e-ee
60% e---------------------------ee
65% e---------------------------ee
70% e---------------------------ee
75% e---------------------------ee
80% e---------------------------ee
85% e---------------------------ee
90% e---------------------------ee
95% ------------------------------
Predicted Contact Number:
------------------------------
Predicted Disordered Residues:
OOOOOOOOOOOOOOOOOOOOOOOOOOOODD
Predicted Disorder Probability:
0.16 0.07 0.04 0.04 0.03 0.02 0.02 0.01 0.02 0.02 0.02 0.01 0.01 0.01 0.02 0.02 0.03 0.04 0.06 0.06 0.12 0.13 0.20 0.17 0.18 0.20 0.23 0.48 0.55 0.58
Predicted Domains:
Domain 1: 1 - 30
Predicted Disulfide Bonds:
Input sequence has LESS THAN TWO cysteins and therefore cannot form disulfide bonds.
Predicted Contact Maps:
SEE ATTACHMENTS
The predictions have the following meaning:
Note: Since CMAPpro and 3Dpro predictions are computationally intensive, only proteins of at most 400 amino acids will be accepted if CMAPpro or 3Dpro predictions are selected.
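For anyone processing many text replies, the layout of the example reply suggests a simple parser. The sketch below assumes the format seen in that one example: "key: value" lines at the top, then section headers that end with a colon, each followed by one or more content lines. The real replies may differ, and the function name is hypothetical.

```python
def parse_reply(text):
    """Split a text reply into header fields and named sections (a list of lines each)."""
    fields = {}
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A line ending with a colon opens a new section.
            current = line[:-1]
            sections[current] = []
        elif current is None and ":" in line:
            # Before the first section, "key: value" lines are header fields.
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
        elif current is not None:
            sections[current].append(line)
    return fields, sections


# A shortened reply using values from the example on this page.
sample = """Name: short
JOB ID: 98453
Amino Acids:
MQIFVKTLTGKTITLEVEPSDTIENVKAKI
Predicted Secondary Structure:
CEEEEEEECCCEEEEEECCCCCHHHHHCCC
Predicted Domains:
Domain 1: 1 - 30
"""
fields, sections = parse_reply(sample)
print(fields["JOB ID"], sections["Predicted Secondary Structure"][0])
```

A useful sanity check after parsing is that each per-residue string has the same length as the amino acid sequence.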
For a general overview see:
J. Cheng, A. Randall, M. Sweredoski, P. Baldi, SCRATCH: a Protein Structure and Structural Feature Prediction Server, Nucleic Acids Research, Special Issue on Web servers, in press, 2005.
P. Baldi and G. Pollastri, "The Principled Design of Large-Scale Recursive
Neural Network Architectures-DAG-RNNs and the Protein Structure Prediction
Problem", Journal of Machine Learning Research, 4, 575-602, (2003).
Download PDF.
P.Baldi, G.Pollastri, "Machine Learning Structural and Functional Proteomics",
IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.
Download PDF.
For an explanation of the methods used in SSpro and SSpro8 see:
G.Pollastri, D.Przybylski, B.Rost, P.Baldi, "Improving the Prediction of Protein Secondary Structure
in Three and Eight Classes Using Recurrent Neural Networks and Profiles", Proteins, 47, 228-235, 2002.
Download PDF, Abstract and HTML (Proteins web site).
Or:
P.Baldi, S.Brunak, P.Frasconi, G.Pollastri, and G.Soda, "Exploiting the Past and the Future in Protein Secondary
Structure Prediction", Bioinformatics, 15, 937-946, (1999).
Download PDF, HTML (Bioinformatics web site).
Or (quick abstract):
G.Pollastri, P.Baldi, "SSpro, a web server for protein secondary structure prediction based on recurrent neural networks",
Proceedings of CASP2000, Asilomar, CA.
HTML version, and gzipped postscript.
A more detailed description of 1D-RNNs (formally called
bidirectional recurrent neural networks, or BRNNs) can be found here:
Baldi,P., Brunak,S., Frasconi,P., Pollastri,G., and Soda,G., "Bidirectional Dynamics for Protein
Secondary Structure Prediction", in Sequence Learning: Paradigms, Algorithms, and Applications, R. Sun and L. Giles Editors, Springer Verlag, (2000).
Download PDF, Abstract (Book web site).
For an explanation of the methods used in ACCpro and CONpro see:
P. Baldi and G. Pollastri. "The Principled Design of Large-Scale Recursive
Neural Network Architectures—DAG-RNNs and the Protein Structure Prediction
Problem", Journal of Machine Learning Research, 4, 575-602, 2003.
Download PDF, Abstract and HTML (JMLR web site).
G.Pollastri, P.Baldi, P.Fariselli, R.Casadio, "Prediction of Coordination Number and
Relative Solvent Accessibility in Proteins", Proteins, 47, 142-153, 2002.
Download PDF, Abstract and HTML (Proteins web site).
Or:
Pollastri,G., Baldi,P., Fariselli,P., Casadio,R., "Improved Prediction of the Number of Residue
Contacts in Proteins by Recurrent Neural Networks", Bioinformatics, 17 Suppl 1, S234-S242 (2001).
Download PDF, HTML abstract (Bioinformatics web site).
For an explanation of the methods used in DOMpro see:
J. Cheng, M. Sweredoski, P. Baldi, "DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks",
submitted, 2005.
Download PDF.
For an explanation of the methods used in DISpro see:
J. Cheng, M. Sweredoski, P. Baldi, "Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data",
Data Mining and Knowledge Discovery, in press, 2005.
Download PDF.
For an explanation of the methods used in DIpro see:
P.Baldi, J. Cheng, A. Vullo, "Large-Scale Prediction of Disulphide Bond Connectivity",
Advances in Neural Information Processing Systems (NIPS 2004) 17, L. Saul, Y. Weiss, and L. Bottou editors, MIT Press, pp. 97-104, Cambridge, MA, 2005.
Download PDF.
For an explanation of the methods used in CMAPpro see:
G.Pollastri, P.Baldi, "Prediction of Contact Maps by Recurrent Neural Network Architectures and
Hidden Context Propagation from All Four Cardinal Corners", Bioinformatics, 18 Suppl 1, S62-S70 (2002).
Download PDF, HTML abstract (Bioinformatics web site).
And:
P.Baldi, G.Pollastri, "Machine Learning Structural and Functional Proteomics",
IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.
Download PDF.