SCRATCH: A Quick Description

Methods

SSpro v4.5

SSpro is a server for protein secondary structure prediction based on an ensemble of 100 1D-RNNs (one-dimensional recurrent neural networks). For a detailed explanation of the methods, see the references. SSpro version 1 went online on March 13, 2000. Within one year it handled more than 10,000 queries from 60 domains in at least 50 countries.
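This page does not say how the outputs of the 100 networks are combined. A common approach, assumed here, is to average each member's per-residue class probabilities and take the most probable class. The sketch below does exactly that; the input layout (`member_probs[m][i][c]`) and the name `ensemble_predict` are hypothetical, not SSpro's code.

```python
from typing import List, Sequence

CLASSES = "HEC"  # helix, strand, the rest

def ensemble_predict(member_probs: Sequence[Sequence[Sequence[float]]]) -> str:
    """Average class probabilities over ensemble members, then pick the
    highest-scoring class at each residue.

    member_probs[m][i][c] is member m's probability of CLASSES[c] at residue i.
    """
    n_members = len(member_probs)
    n_residues = len(member_probs[0])
    labels: List[str] = []
    for i in range(n_residues):
        averaged = [sum(member[i][c] for member in member_probs) / n_members
                    for c in range(len(CLASSES))]
        best = max(range(len(CLASSES)), key=lambda c: averaged[c])
        labels.append(CLASSES[best])
    return "".join(labels)

# Two toy members disagree on the first residue; the average favours strand (E).
member_a = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
member_b = [[0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
print(ensemble_predict([member_a, member_b]))  # EC
```

Averaging smooths out the idiosyncrasies of individual networks: alone, `member_a` would call the first residue a helix.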

From the very beginning, SSpro 3.0 was tested by the independent assessor EVA, where it consistently classified more than 76% of residues correctly on structures with no homologues in the PDB, always ranking first among the servers tested. SSpro v4.0 currently achieves an accuracy of 78.7% on the independent evaluation server EVA. SSpro v4.5 adds the direct incorporation of the secondary structure of homologous proteins, as well as probabilistic methods to improve the SOV (segment overlap) score.
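A figure such as "76% correctly classified residues" is the fraction of residues whose predicted class matches the observed one (usually called Q3 for three classes, Q8 for eight). The minimal per-chain computation below is a sketch; the name `q_score` is ours, and how EVA averages scores over many proteins is not described here.

```python
def q_score(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted class equals the observed class.

    Gives Q3 for 3-class strings and Q8 for 8-class strings.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)

# Hypothetical observed assignment for a 10-residue fragment.
print(q_score("CEEEEEEECC", "CEEEEEECCC"))  # 0.9
```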

Download SSpro4.0 Executable for Linux (Academic Use Only)

SSpro8

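SSpro8 is an experimental extension to SSpro. Instead of using three classes (helix, strand and the rest) to assign the secondary structure of a protein, SSpro8 adopts the full DSSP 8-class output classification:

H: alpha helix
G: 3-10 helix
I: pi helix
E: extended strand
B: isolated beta bridge
T: turn
S: bend
C: the rest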

For a detailed description of the tests performed on SSpro8, see the references. The overall performance (Q8) of the system currently online (based on PSI-BLAST profiles) is approximately 63%. NOTE: SSpro8 is a completely different system from SSpro. Their results may not match.
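To compare SSpro8 output with three-class predictions, the eight DSSP classes are often collapsed into three. The mapping below (H, G, I to helix; E, B to strand; everything else to the rest) is one widely used reduction and is an assumption here; this page does not state which mapping, if any, SCRATCH uses. Applied to the sample reply, it illustrates why SSpro and SSpro8 results need not agree.

```python
# One common DSSP 8-to-3 reduction (assumed; other mappings exist).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # helices
    "E": "E", "B": "E",            # strand and isolated bridge
    "T": "C", "S": "C", "C": "C",  # turn, bend, the rest
}

def reduce_to_three(ss8: str) -> str:
    """Collapse an 8-class string to helix/strand/rest; unknown symbols map to C."""
    return "".join(EIGHT_TO_THREE.get(symbol, "C") for symbol in ss8)

ss8 = "CEEEEEEEESEEEEEEECCCSHHHHEECCC"    # SSpro8 line from the sample reply
sspro = "CEEEEEEECCCEEEEEECCCCCHHHHHCCC"  # SSpro line from the sample reply
reduced = reduce_to_three(ss8)
print(reduced)
print(sum(a == b for a, b in zip(reduced, sspro)), "of", len(sspro), "positions agree")
```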

ABTMpro

ABTMpro is a three-class predictor of transmembrane type. The input is the amino acid sequence and the output is "alpha helical transmembrane", "beta barrel transmembrane" or "non-transmembrane".

CONpro

CONpro is a server that predicts whether the number of contacts of each residue in a protein is above or below the average for that residue. CONpro's predictions are based on 1D-RNNs that take as input a multiple alignment of homologues generated by PSI-BLAST. Residues are considered in contact if they lie within a threshold radius of 12Å. The accuracy of CONpro is 73%. The complete system is an ensemble of 10 1D-RNNs. For a more detailed explanation, see the references.
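As an illustration of the quantity CONpro predicts, the sketch below counts, for each residue of a known structure, the residues within 12Å and compares that count with an average for the residue's amino acid type. Measuring distances between C-α atoms and taking the average per amino acid type are assumptions; the averages passed in (`type_average`) are hypothetical placeholders, not CONpro's values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

def contact_numbers(coords: Sequence[Coord], radius: float = 12.0) -> List[int]:
    """For each residue, count the other residues whose C-alpha atom is closer
    than `radius` angstroms."""
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < radius:
                counts[i] += 1
                counts[j] += 1
    return counts

def above_average(sequence: str, counts: Sequence[int],
                  type_average: Dict[str, float]) -> List[bool]:
    """True where a residue has more contacts than the average for its type."""
    return [count > type_average[aa] for aa, count in zip(sequence, counts)]

coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
counts = contact_numbers(coords)
print(counts)  # [2, 2, 3, 1]
print(above_average("AAGG", counts, {"A": 1.5, "G": 2.0}))  # [True, True, True, False]
```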

ACCpro

ACCpro is a server for predicting the relative solvent accessibility of protein residues. ACCpro's predictions are based on 1D-RNNs that take as input a multiple alignment of homologues generated by PSI-BLAST. Each residue in a protein is predicted as buried or exposed, i.e. less or more accessible than a specified threshold. All thresholds between 0% and 95% in steps of 5% are available. At the 25% threshold, the 'hard' case in which buried and exposed residues occur in practically equal numbers, ACCpro correctly classifies 77.2% of the residues, better than any other system previously described. For a more detailed explanation, see the references.
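Turning relative solvent accessibility values into the buried/exposed string shown in the sample reply (`e` for exposed, `-` for buried) can be sketched as follows. The RSA values are made up, and treating a value exactly at the threshold as buried is an assumption.

```python
from typing import Sequence

def exposure_string(rsa_percent: Sequence[float], threshold: float = 25.0) -> str:
    """'e' where relative solvent accessibility exceeds the threshold, '-' otherwise."""
    return "".join("e" if value > threshold else "-" for value in rsa_percent)

rsa = [62.0, 4.5, 25.0, 31.0, 0.0, 48.0]  # hypothetical RSA values, in percent
labels = exposure_string(rsa)
print(labels)                            # e--e-e
print(labels.count("e") / len(labels))   # 0.5
```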

Download ACCpro Executable for Linux (Academic Use Only)

ACCpro20

ACCpro20 predicts the relative solvent accessibility at all thresholds between 0% and 95% in 5% increments.
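The ACCpro20 section of a reply lists one row per threshold. A small parser, assuming the row layout shown in the sample reply (a percentage, whitespace, then one symbol per residue), can recover for each residue the highest threshold at which it is still called exposed. The function names are hypothetical.

```python
from typing import Dict, List, Optional

def parse_threshold_rows(block: str) -> Dict[int, str]:
    """Parse rows such as '25% ee---e--e' into {25: 'ee---e--e'}."""
    rows: Dict[int, str] = {}
    for line in block.strip().splitlines():
        label, states = line.split()
        rows[int(label.rstrip("%"))] = states
    return rows

def highest_exposed_threshold(rows: Dict[int, str]) -> List[Optional[int]]:
    """Per residue: the largest threshold with an 'e', or None if never exposed."""
    length = len(next(iter(rows.values())))
    result: List[Optional[int]] = []
    for i in range(length):
        exposed_at = [t for t, states in rows.items() if states[i] == "e"]
        result.append(max(exposed_at) if exposed_at else None)
    return result

block = """
0%  ee-e
5%  e--e
10% ---e
"""
print(highest_exposed_threshold(parse_threshold_rows(block)))  # [5, 0, None, 10]
```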

DOMpro

DOMpro predicts domain locations using a 1D-RNN. DOMpro takes as input the sequence profile, predicted secondary structure, and predicted relative solvent accessibility. The 1D-RNN classifies each residue as being in a domain boundary region or not, and the domains are then inferred from this output. For a more detailed explanation, see the manuscript in the references.
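This page does not describe how domains are inferred from the per-residue boundary classification. One simple rule, assumed here purely for illustration, cuts the chain at the middle of each internal run of boundary residues and ignores boundary runs that touch either terminus. Ranges are reported 1-based, like "Domain 1: 1 - 30" in the sample reply.

```python
from typing import List, Tuple

def infer_domains(is_boundary: List[bool]) -> List[Tuple[int, int]]:
    """Cut the chain at the middle of each internal run of boundary residues.

    Returns 1-based, inclusive (start, end) ranges.
    """
    n = len(is_boundary)
    if n == 0:
        return []
    cuts: List[int] = []  # 0-based index where each new domain starts
    i = 0
    while i < n:
        if not is_boundary[i]:
            i += 1
            continue
        j = i
        while j < n and is_boundary[j]:
            j += 1
        if i > 0 and j < n:  # skip runs that touch either terminus
            cuts.append((i + j) // 2)
        i = j
    starts = [0] + cuts
    ends = cuts + [n]
    return [(start + 1, end) for start, end in zip(starts, ends)]

labels = [False] * 10
labels[4] = labels[5] = True  # hypothetical boundary region at residues 5-6
for number, (start, end) in enumerate(infer_domains(labels), 1):
    print(f"Domain {number}: {start} - {end}")
```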

DISpro

DISpro uses a 1D-RNN to predict the probability that each residue is disordered. The probabilities are also thresholded at 0.5 to make a hard classification. The input to DISpro is the sequence profile, predicted secondary structure, and predicted relative solvent accessibility. For a more detailed explanation, see the manuscript in the references.
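Applying the 0.5 cutoff to the disorder probabilities in the sample reply reproduces its `O`/`D` line (O for ordered, D for disordered). Whether a probability of exactly 0.5 counts as disordered is not stated; the sketch assumes it does not.

```python
from typing import Sequence

def disorder_string(probabilities: Sequence[float], cutoff: float = 0.5) -> str:
    """'D' (disordered) where the probability exceeds the cutoff, else 'O' (ordered)."""
    return "".join("D" if p > cutoff else "O" for p in probabilities)

# Disorder probabilities copied from the sample reply.
sample = ("0.16 0.07 0.04 0.04 0.03 0.02 0.02 0.01 0.02 0.02 0.02 0.01 0.01 0.01 0.02 "
          "0.02 0.03 0.04 0.06 0.06 0.12 0.13 0.20 0.17 0.18 0.20 0.23 0.48 0.55 0.58")
probs = [float(x) for x in sample.split()]
print(disorder_string(probs))  # 28 x O followed by DD
```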
Download the dataset used in training DISpro here.

DIpro

DIpro is a cysteine disulfide bond predictor based on 2D recurrent neural networks, support vector machines, graph matching and regression algorithms. It predicts whether the sequence has disulfide bonds, estimates the number of disulfide bonds, and predicts the bonding state of each cysteine and the bonded pairs. It yields the best accuracy on the benchmark dataset Sp39. It can handle any number of disulfide bonds, whereas most methods available so far can handle only fewer than 6 disulfide bonds.

Procedure: the sequence is processed in two steps. In step 1, a support vector machine classifies whether or not the sequence has disulfide bonds. In step 2, a neural network and a graph algorithm predict the number of bonds and the bonding pattern. For a more detailed explanation, see the references.
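The graph step can be pictured as choosing the set of disjoint cysteine pairs with the highest total bonding score, given the predicted number of bonds. The sketch below uses exhaustive search, practical only for a handful of cysteines; it is not DIpro's implementation (a scalable version would use a maximum-weight matching algorithm), and the pair scores, which in DIpro would come from the neural network, are invented.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

def best_pairing(cysteines: List[int], score: Dict[Pair, float],
                 n_bonds: int) -> Tuple[float, List[Pair]]:
    """Exhaustively find n_bonds disjoint cysteine pairs with maximal total score.

    `score` is keyed by (i, j) with i < j. Runs in exponential time, so this is
    only suitable for a handful of cysteines.
    """
    best: Tuple[float, List[Pair]] = (float("-inf"), [])

    def search(remaining: List[int], chosen: List[Pair], total: float) -> None:
        nonlocal best
        if len(chosen) == n_bonds:
            if total > best[0]:
                best = (total, list(chosen))
            return
        if len(remaining) < 2 * (n_bonds - len(chosen)):
            return  # too few cysteines left to form the remaining bonds
        first, rest = remaining[0], remaining[1:]
        for k, partner in enumerate(rest):  # bond `first` to a later cysteine
            chosen.append((first, partner))
            search(rest[:k] + rest[k + 1:], chosen, total + score[(first, partner)])
            chosen.pop()
        search(rest, chosen, total)  # or leave `first` unbonded

    search(sorted(cysteines), [], 0.0)
    return best

# Hypothetical bonding scores for cysteines at positions 3, 10, 20 and 31.
scores = {(3, 10): 0.9, (3, 20): 0.1, (3, 31): 0.2,
          (10, 20): 0.3, (10, 31): 0.1, (20, 31): 0.8}
total, pairs = best_pairing([3, 10, 20, 31], scores, n_bonds=2)
print(pairs)  # [(3, 10), (20, 31)]
```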

CMAPpro

CMAPpro is a server for predicting maps of contacts between protein residues. CMAPpro's predictions are based on ensembles of Generalised Recurrent Neural Networks for the translation of matrices. The input to the system consists of two-dimensional profiles extracted from multiple alignments of homologues generated by PSI-BLAST, together with secondary structure and solvent accessibility predictions obtained from SSpro and ACCpro, respectively. Maps at 8Å and 12Å are available, meaning that two amino acids are defined as being in contact if their C-α atoms are closer than 8Å or 12Å, respectively. For a description of the tests performed on CMAPpro, see the references.
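For a known structure, the 8Å and 12Å maps that CMAPpro predicts can be computed directly from C-α coordinates, as in the sketch below. Marking the diagonal (a residue with itself) as non-contact is an assumption, and the coordinates are toy values.

```python
import math
from typing import List, Sequence, Tuple

def contact_map(ca_coords: Sequence[Tuple[float, float, float]],
                cutoff: float) -> List[List[int]]:
    """Entry [i][j] is 1 if the C-alpha atoms of residues i and j (i != j)
    are closer than `cutoff` angstroms, else 0."""
    n = len(ca_coords)
    return [[1 if i != j and math.dist(ca_coords[i], ca_coords[j]) < cutoff else 0
             for j in range(n)]
            for i in range(n)]

ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.0, 0.0, 0.0)]  # toy coordinates
for cutoff in (8.0, 12.0):
    print(cutoff, contact_map(ca, cutoff))
```

The 12Å map is always a superset of the 8Å map, since any pair closer than 8Å is also closer than 12Å.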

3Dpro

3Dpro is a server that predicts protein tertiary structure. Its energy function uses predicted structural features and knowledge-based statistical terms derived from the PDB. The conformational search uses a move set consisting of fragment replacement (using a fragment library built from the PDB) as well as random perturbations of the model. Moves are accepted or rejected by simulated annealing with linear cooling. Multiple models are constructed using different random seeds, and the model with the lowest energy is selected as the final prediction. 3Dpro is currently a de novo method (structural templates are not used).
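The search loop described above can be sketched generically: random perturbation moves, Metropolis acceptance under a temperature that falls linearly towards zero, one run per seed, and the lowest-energy result kept. The energy function, step size and schedule parameters below are toy assumptions; 3Dpro's knowledge-based energy and fragment-replacement moves are not reproduced.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Energy = Callable[[List[float]], float]

def anneal(energy: Energy, start: Sequence[float], seed: int,
           steps: int = 2000, t_start: float = 1.0,
           step_size: float = 0.5) -> Tuple[float, List[float]]:
    """Simulated annealing with linear cooling and Metropolis acceptance."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    for k in range(steps):
        temperature = t_start * (1 - k / steps)  # falls linearly towards 0
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)  # random perturbation move
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = candidate, e_candidate
    return e_current, current

def best_of_seeds(energy: Energy, start: Sequence[float],
                  seeds: Sequence[int]) -> Tuple[float, List[float]]:
    """Run one annealing trajectory per seed and keep the lowest-energy model."""
    return min((anneal(energy, start, seed) for seed in seeds), key=lambda run: run[0])

def toy_energy(x: List[float]) -> float:
    """Hypothetical stand-in for a real energy function: minimum at all ones."""
    return sum((xi - 1.0) ** 2 for xi in x)

e_best, model = best_of_seeds(toy_energy, [5.0, 5.0, -3.0], seeds=[1, 2, 3])
print(round(e_best, 3), [round(v, 2) for v in model])
```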

The results of 3Dpro's performance at CASP6 can be found here.

Input formats

Email

Your email address, where the prediction will be delivered. NOTE: Check that you typed your address correctly. Approximately 5% of the queries handled by SSpro 1.0 received no answer because the address was mistyped.

Query name

An optional name for your query. We strongly suggest that you use one, especially if sending more than one query. The order in which you send your queries may not correspond to the order in which you receive the answers.

Input sequence

The sequence of amino acids:

Output format

Replies are sent by email. SSpro, SSpro8, ACCpro, ACCpro20, DOMpro, DISpro, DIpro and CONpro replies come as text, embedded in the body of the email. Here is an example prediction:

Name: short
JOB ID: 98453

Amino Acids:
MQIFVKTLTGKTITLEVEPSDTIENVKAKI

Predicted Secondary Structure:
CEEEEEEECCCEEEEEECCCCCHHHHHCCC

Predicted Secondary Structure (8 Class):
CEEEEEEEESEEEEEEECCCSHHHHEECCC

Predicted Relative Solvent Accessiblity (at 25% exposed threshold):
ee---ee-eeee-e-e-eeeee-ee-eeee

Predicted Relative Solvent Accessiblity (All Thresholds):
0%  ee---e--e-ee-e-e-e-eee-ee-eeee
5%  ee---e--e-ee-e-e-e-eee-ee-eeee
10% ee---e--e-ee-e-e-e-eee-ee-eeee
15% ee---e--e-ee-e-e-e-eee-ee-eeee
20% ee---e--e-ee-e-e-e-eee-ee-eeee
25% ee---e--e-ee-e-e-e-eee-ee-eeee
30% ee---e--e-ee-e-e-e-eee-ee-eeee
35% ee---e--e-ee-e-e-e-eee-ee-eeee
40% ee------e--e-e-e---eee-ee-eeee
45% ee---------------------e--eeee
50% e----------------------e--eeee
55% e-------------------------e-ee
60% e---------------------------ee
65% e---------------------------ee
70% e---------------------------ee
75% e---------------------------ee
80% e---------------------------ee
85% e---------------------------ee
90% e---------------------------ee
95% ------------------------------

Predicted Contact Number:
------------------------------

Predicted Disordered Residues:
OOOOOOOOOOOOOOOOOOOOOOOOOOOODD

Predicted Disorder Probability:
0.16 0.07 0.04 0.04 0.03 0.02 0.02 0.01 0.02 0.02 0.02 0.01 0.01 0.01 0.02 0.02 0.03 0.04 0.06 0.06 0.12 0.13 0.20 0.17 0.18 0.20 0.23 0.48 0.55 0.58

Predicted Domains:
Domain 1: 1 - 30

Predicted Disulfide Bonds:
Input sequence has LESS THAN TWO cysteins and therefore cannot form disulfide bonds.

Predicted Contact Maps:
SEE ATTACHMENTS
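Because the text replies use a simple layout (a title line ending in a colon, its content, then a blank line), they can be parsed mechanically. The sketch below assumes that layout, inferred only from the sample above; section titles are kept verbatim, including the server's own spelling, and `parse_reply` is a hypothetical name.

```python
import re
from typing import Dict

def parse_reply(body: str) -> Dict[str, str]:
    """Split a plain-text reply into {section title: section content}."""
    sections: Dict[str, str] = {}
    for block in re.split(r"\n\s*\n", body.strip()):
        lines = [line.rstrip() for line in block.strip().splitlines()]
        if not lines:
            continue
        if lines[0].endswith(":"):
            # Section title on its own line, content on the following lines.
            sections[lines[0][:-1]] = "\n".join(lines[1:])
        else:
            # Single-line fields such as "Name: short" or "JOB ID: 98453".
            for line in lines:
                key, _, value = line.partition(":")
                sections[key.strip()] = value.strip()
    return sections

reply = """Name: short
JOB ID: 98453

Amino Acids:
MQIFVKTLTGKTITLEVEPSDTIENVKAKI

Predicted Secondary Structure:
CEEEEEEECCCEEEEEECCCCCHHHHHCCC

Predicted Domains:
Domain 1: 1 - 30
"""
sections = parse_reply(reply)
print(sections["JOB ID"], sections["Predicted Secondary Structure"])
```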

The predictions have the following meaning:

Note: Since CMAPpro and 3Dpro predictions are computationally intensive, only proteins of at most 400 amino acids are accepted if CMAPpro or 3Dpro predictions are selected.
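A client that submits queries automatically might check this length limit before sending. The helper below is hypothetical; stripping whitespace and upper-casing the sequence are assumptions about acceptable input, not documented server behaviour.

```python
MAX_CMAP_3D_LENGTH = 400  # limit stated on this page for CMAPpro and 3Dpro

def prepare_sequence(sequence: str, wants_cmap_or_3d: bool) -> str:
    """Remove whitespace, upper-case, and enforce the CMAPpro/3Dpro length limit."""
    cleaned = "".join(sequence.split()).upper()
    if wants_cmap_or_3d and len(cleaned) > MAX_CMAP_3D_LENGTH:
        raise ValueError(
            f"sequence has {len(cleaned)} residues; CMAPpro and 3Dpro accept "
            f"at most {MAX_CMAP_3D_LENGTH}")
    return cleaned

print(prepare_sequence("mqifvktltg\nktitleveps", wants_cmap_or_3d=True))  # MQIFVKTLTGKTITLEVEPS
```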


References

For a general overview see:

J. Cheng, A. Randall, M. Sweredoski, P. Baldi, "SCRATCH: a Protein Structure and Structural Feature Prediction Server", Nucleic Acids Research, Special Issue on Web Servers, in press, 2005.

P. Baldi and G. Pollastri, "The Principled Design of Large-Scale Recursive Neural Network Architectures--DAG-RNNs and the Protein Structure Prediction Problem", Journal of Machine Learning Research, 4, 575-602, 2003.

P. Baldi, G. Pollastri, "Machine Learning Structural and Functional Proteomics", IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.

For an explanation of the methods used in SSpro and SSpro8 see:

G. Pollastri, D. Przybylski, B. Rost, P. Baldi, "Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles", Proteins, 47, 228-235, 2002.

Or:
P. Baldi, S. Brunak, P. Frasconi, G. Pollastri, and G. Soda, "Exploiting the Past and the Future in Protein Secondary Structure Prediction", Bioinformatics, 15, 937-946, 1999.

Or (quick abstract):
G. Pollastri, P. Baldi, "SSpro, a web server for protein secondary structure prediction based on recurrent neural networks", Proceedings of CASP2000, Asilomar, CA.

A more detailed description of 1D-RNNs (also known as bidirectional recurrent neural networks, or BRNNs) can be found in:
P. Baldi, S. Brunak, P. Frasconi, G. Pollastri, and G. Soda, "Bidirectional Dynamics for Protein Secondary Structure Prediction", in Sequence Learning: Paradigms, Algorithms, and Applications, R. Sun and L. Giles, Editors, Springer Verlag, 2000.

For an explanation of the methods used in ACCpro and CONpro see:

P. Baldi and G. Pollastri, "The Principled Design of Large-Scale Recursive Neural Network Architectures--DAG-RNNs and the Protein Structure Prediction Problem", Journal of Machine Learning Research, 4, 575-602, 2003.

G. Pollastri, P. Baldi, P. Fariselli, R. Casadio, "Prediction of Coordination Number and Relative Solvent Accessibility in Proteins", Proteins, 47, 142-153, 2002.

Or:
G. Pollastri, P. Baldi, P. Fariselli, R. Casadio, "Improved Prediction of the Number of Residue Contacts in Proteins by Recurrent Neural Networks", Bioinformatics, 17 Suppl 1, S234-S242, 2001.

For an explanation of the methods used in DOMpro see:

J. Cheng, M. Sweredoski, P. Baldi, "DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks", submitted, 2005.

For an explanation of the methods used in DISpro see:

J. Cheng, M. Sweredoski, P. Baldi, "Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data", Data Mining and Knowledge Discovery, in press, 2005.

For an explanation of the methods used in DIpro see:

P. Baldi, J. Cheng, A. Vullo, "Large-Scale Prediction of Disulphide Bond Connectivity", Advances in Neural Information Processing Systems 17 (NIPS 2004), L. Saul, Y. Weiss, and L. Bottou, Editors, MIT Press, Cambridge, MA, pp. 97-104, 2005.

For an explanation of the methods used in CMAPpro see:

G. Pollastri, P. Baldi, "Prediction of Contact Maps by Recurrent Neural Network Architectures and Hidden Context Propagation from All Four Cardinal Corners", Bioinformatics, 18 Suppl 1, S62-S70, 2002.

And:
P. Baldi, G. Pollastri, "Machine Learning Structural and Functional Proteomics", IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.
