SCRATCH: A Quick Description

Methods

SSpro

SSpro is a server for protein secondary structure prediction based on protein evolutionary information (sequence homology) and the secondary structure of homologous proteins (structure homology). For a detailed explanation of the methods, please refer to the references listed at the bottom of this page. SSpro currently achieves an accuracy exceeding 84% correctly classified residues on proteins with no homologs in the PDB, and between 87.8% and 98.7% correctly classified residues on proteins for which homologs can be found in the PDB, ranking at the top of the tested prediction servers.
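The "correctly classified residues" figure quoted above is a per-residue accuracy. The following is a minimal sketch of how such a percentage can be computed from a predicted and an observed secondary structure string. The function name and the observed string are illustrative assumptions, not part of SSpro.

```python
def percent_correct(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted class equals the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not predicted:
        raise ValueError("strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


# Hypothetical 10-residue example: one residue (index 7) is misclassified.
predicted = "CEEEEEEECC"
observed = "CEEEEEECCC"
print(percent_correct(predicted, observed))  # 90.0
```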


Download SSpro 6.0 (free for academic use)

SSpro8

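SSpro8 is an extension of SSpro. Instead of using three classes (helix, strand and the rest) to assign the secondary structure of a protein, SSpro8 adopts the full DSSP 8-class output classification:

H: alpha-helix
G: 3-10 helix
I: pi-helix
E: extended strand
B: beta-bridge
T: turn
S: bend
C: the rest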

For a detailed description of the tests performed on SSpro8, see the references. The overall performance of the system currently online is approximately 72% correctly classified residues on proteins with no homologs in the PDB, and from 78% to 98% when homologs can be found in the PDB. NOTE: SSpro8 is a completely different system from SSpro. Their results may not match.
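To compare an 8-class prediction with a 3-class one, the eight DSSP letters (H, G, I, E, B, T, S, C) are often collapsed into helix, strand and the rest. The sketch below assumes one common reduction (H, G, I to H; E, B to E; everything else to C). This mapping is an assumption for illustration, not a description of how SSpro is trained. Because SSpro and SSpro8 are separate systems, the reduced string will not necessarily equal the SSpro output.

```python
# Assumed reduction from the 8 DSSP classes to 3 classes (illustrative only).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C",   # turn, bend, and the rest
}


def reduce_to_three_classes(ss8: str) -> str:
    """Collapse an 8-class secondary structure string into 3 classes."""
    return "".join(DSSP8_TO_3[letter] for letter in ss8)


# 8-class string taken from the example reply in the Output format section.
ss8 = "CEEEEEEEESEEEEEEECCCSHHHHEECCC"
print(reduce_to_three_classes(ss8))
```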


Download SSpro8 6.0 (free for academic use)

ABTMpro

ABTMpro is a server that predicts whether a given protein sequence is a transmembrane protein. If it is, ABTMpro further predicts the probabilities of the protein being an alpha-helical transmembrane protein or a beta-barrel transmembrane protein. The prediction framework consists of a Support Vector Machine, which uses features such as amino acid composition and properties, reduced alphabet composition, predicted secondary structure, and evolutionary information. The overall accuracy of ABTMpro is upwards of 97%. It achieves MCC values of 0.93 and 0.94 on smaller data sets, and MCC values of 0.85 and 0.63 on much larger tests, for alpha-helical and beta-barrel transmembrane proteins respectively.
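The MCC (Matthews correlation coefficient) values quoted above summarize a binary confusion matrix in a single number between -1 and 1. A minimal sketch of the standard formula follows. The counts in the example are hypothetical and are not ABTMpro results.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: return 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a transmembrane / non-transmembrane classifier.
print(matthews_corrcoef(tp=90, tn=95, fp=5, fn=10))
```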

CONpro

CONpro is a server that predicts whether the number of contacts of each residue in a protein is above or below the average for that residue. The prediction of CONpro is based on 1D-RNNs, adopting as input a multiple alignment of homologues generated by PSI-BLAST. The threshold radius at which residues are considered in contact is 12Å. The accuracy of CONpro is 73%. The complete system is an ensemble of 10 1D-RNNs. For a more detailed explanation, see the references.
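The quantity CONpro predicts can be illustrated on a known structure. The sketch below counts, for each residue, the other residues within the 12Å radius and compares the count with a per-residue-type average. It assumes one representative atom per residue (for example C-alpha) and a lookup table of averages. Both are assumptions, as are the function names and the "+" symbol for "above average".

```python
import math


def contact_numbers(coords, radius=12.0):
    """For each residue, count the other residues closer than `radius` angstroms."""
    counts = []
    for i, a in enumerate(coords):
        count = sum(
            1 for j, b in enumerate(coords)
            if i != j and math.dist(a, b) < radius
        )
        counts.append(count)
    return counts


def above_or_below_average(sequence, counts, average_by_residue):
    """Return '+' where the contact number exceeds the residue-type average, else '-'."""
    return "".join(
        "+" if count > average_by_residue[aa] else "-"
        for aa, count in zip(sequence, counts)
    )


# Toy structure: five residues on a straight line, 5 angstroms apart.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0),
          (15.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
counts = contact_numbers(coords)
print(counts)  # [2, 3, 4, 3, 2]
# Hypothetical average contact number for alanine.
print(above_or_below_average("AAAAA", counts, {"A": 2.5}))  # -+++-
```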

ACCpro

ACCpro is a server for the prediction of the relative solvent accessibility of protein residues. The prediction of ACCpro is based on 1D-RNNs, adopting as input a multiple alignment of homologs generated by PSI-BLAST. Each residue in a protein is predicted as buried or exposed, i.e. more or less accessible than a specified threshold. For the 25% threshold, the 'hard' case corresponding to practically identical numbers of buried and exposed residues, ACCpro correctly classifies ~81% of the residues and up to 90% when homologs exist in the PDB, better than any other system previously described. For a more detailed explanation, see the references.


Download ACCpro 6.0 (free for academic use)

ACCpro20

ACCpro20 predicts the relative solvent accessibility at all thresholds between 0% and 95% at 5% increments. It is a 20-class variant of the ACCpro predictor. Performance of the system currently online is at the same level as the ACCpro predictor in the hard case (25% accessibility threshold), and higher for the other thresholds.
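The relationship between a per-residue accessibility value and the 20 threshold rows can be sketched as follows. The code assumes that a residue is marked exposed ("e") when its relative accessibility is strictly greater than the threshold and buried ("-") otherwise. The comparison rule, the function name and the input values are assumptions for illustration.

```python
def accessibility_rows(rsa_percent, thresholds=range(0, 100, 5)):
    """Map each threshold (0%, 5%, ..., 95%) to a buried/exposed string."""
    rows = {}
    for threshold in thresholds:
        rows[threshold] = "".join(
            "e" if value > threshold else "-" for value in rsa_percent
        )
    return rows


# Hypothetical relative solvent accessibility values (percent) for 4 residues.
rsa = [62.0, 3.5, 27.0, 48.0]
for threshold, row in accessibility_rows(rsa).items():
    print(f"{threshold:>3}%  {row}")
```

With this rule every row is at most as exposed as the row above it, which matches the shape of the 20-class block in the example reply.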


Download ACCpro20 6.0 (free for academic use)

DOMpro

DOMpro predicts domain locations using a 1D-RNN. DOMpro takes as input the sequence profile, predicted secondary structure, and predicted relative solvent accessibility. The output of the 1D-RNN is a classification for each residue as being in a domain boundary region or not. The domains are then inferred from this output. For a more detailed explanation, see the manuscript in the references.
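The description above does not say how domains are inferred from the per-residue boundary classification. As a rough sketch under a simple assumption, each contiguous run of boundary residues can be cut at its midpoint, with the segments between cuts reported as domains. This rule and the function name are hypothetical and are not DOMpro's actual post-processing.

```python
def infer_domains(boundary_flags):
    """Split a chain into 1-based (start, end) domains at boundary-run midpoints."""
    n = len(boundary_flags)
    cuts = []
    i = 0
    while i < n:
        if boundary_flags[i]:
            j = i
            # Extend j to the end of this run of boundary residues.
            while j + 1 < n and boundary_flags[j + 1]:
                j += 1
            cuts.append((i + j) // 2)  # 0-based midpoint of the run
            i = j + 1
        else:
            i += 1

    domains = []
    start = 0
    for cut in cuts:
        domains.append((start + 1, cut + 1))
        start = cut + 1
    if start < n:
        domains.append((start + 1, n))
    return domains


# No boundary residues: the whole 30-residue chain is one domain.
print(infer_domains([False] * 30))  # [(1, 30)]
# A boundary region covering residues 5-7 of a 10-residue chain.
flags = [False] * 4 + [True] * 3 + [False] * 3
print(infer_domains(flags))  # [(1, 6), (7, 10)]
```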

Download DOMpro 1.0 executable for Linux (free for academic use) and the datasets used to train DOMpro.

DISpro

DISpro uses a 1D-RNN to predict the probability that residues are disordered. The probabilities are also thresholded at probability 0.5 to make a hard classification. The input to DISpro is the sequence profile, predicted secondary structure, and predicted relative solvent accessibility. For a more detailed explanation, see the manuscript in the references.
Download the dataset used in training DISpro here.
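The thresholding step can be reproduced from the example reply in the Output format section. The sketch below turns the listed disorder probabilities into the O/D string, assuming that a probability of 0.5 or more counts as disordered. Whether the boundary value itself is treated as disordered is an assumption.

```python
def disorder_states(probabilities, threshold=0.5):
    """Label each residue 'D' (disordered) or 'O' (ordered) by thresholding."""
    return "".join("D" if p >= threshold else "O" for p in probabilities)


# Disorder probabilities copied from the example reply on this page.
probabilities = [
    0.16, 0.07, 0.04, 0.04, 0.03, 0.02, 0.02, 0.01, 0.02, 0.02,
    0.02, 0.01, 0.01, 0.01, 0.02, 0.02, 0.03, 0.04, 0.06, 0.06,
    0.12, 0.13, 0.20, 0.17, 0.18, 0.20, 0.23, 0.48, 0.55, 0.58,
]
print(disorder_states(probabilities))  # OOOOOOOOOOOOOOOOOOOOOOOOOOOODD
```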

Download DISpro 1.0 executable for Linux (free for academic use)

DIpro

DIpro is a cysteine disulfide bond predictor based on 2D recurrent neural networks, support vector machines, graph matching and regression algorithms. It can predict whether the sequence has disulfide bonds, estimate the number of disulfide bonds, and predict the bonding state of each cysteine and the bonded pairs. It yields the best accuracy on the benchmark dataset Sp39. It can handle any number of disulfide bonds, whereas most methods available so far can only handle fewer than 6 disulfide bonds.

Procedure: The sequence is processed in two steps. In step 1, a support vector machine classifies whether the sequence has disulfide bonds. In step 2, a neural network and a graph algorithm predict the number of bonds and the bond pattern. For a more detailed explanation, see the references.
Download DIpro 2.0 software (free for academic use)
Download the dataset used in training DIpro here.
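The pairing part of step 2 can be pictured as choosing disjoint cysteine pairs that maximize the total pairwise score. The sketch below uses a plain exhaustive search as a stand-in for the weighted graph matching named in the references. It is only practical for a handful of cysteines and is not DIpro's algorithm. The scores, positions and function name are hypothetical.

```python
def best_pairing(cysteines, pair_score, n_bonds):
    """Exhaustively find `n_bonds` disjoint pairs with the highest total score.

    `pair_score` maps (i, j) with i < j to a bonding score.
    Returns (total_score, list_of_pairs).
    """
    best = (float("-inf"), [])

    def search(remaining, chosen, total):
        nonlocal best
        if len(chosen) == n_bonds:
            if total > best[0]:
                best = (total, list(chosen))
            return
        if len(remaining) < 2 * (n_bonds - len(chosen)):
            return  # not enough cysteines left to reach n_bonds
        first, rest = remaining[0], remaining[1:]
        # Option 1: bond `first` to one of the later cysteines.
        for k, partner in enumerate(rest):
            chosen.append((first, partner))
            search(rest[:k] + rest[k + 1:], chosen,
                   total + pair_score[(first, partner)])
            chosen.pop()
        # Option 2: leave `first` unbonded.
        search(rest, chosen, total)

    search(sorted(cysteines), [], 0.0)
    return best


# Hypothetical cysteine positions and pairwise bonding scores.
scores = {(3, 11): 0.2, (3, 27): 0.9, (3, 40): 0.1,
          (11, 27): 0.3, (11, 40): 0.8, (27, 40): 0.4}
total, pairs = best_pairing([3, 11, 27, 40], scores, n_bonds=2)
print(pairs)  # [(3, 27), (11, 40)]
```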

CMAPpro

CMAPpro is a server for the prediction of maps of contacts between protein residues. The prediction of CMAPpro is based on ensembles of Deep Neural Networks, which take into account the spatial dependencies of contact occurrences in local neighborhoods. The input of the system consists of two-dimensional profiles extracted from multiple alignments of homologues generated by PSI-BLAST, secondary structure and solvent accessibility predictions obtained respectively from SSpro and ACCpro, and predicted coarse contacts and orientations between secondary structure elements using two-dimensional Recurrent Neural Networks. Maps at 8Å are available, meaning that two amino acids are defined as being in contact if their C-β atoms (C-α for glycines) are closer than 8Å. For a description of the tests performed on CMAPpro, see the references.

SVMcon

SVMcon predicts medium- to long-range residue-residue contacts using Support Vector Machines. The contact predictions are in the CASP format (residue index 1, residue index 2, 0, 8, contact probability). The contact distance threshold is 8Å. The sequence separation between two residues is at least 6 residues. For more information, see the references.
[Download SVMcon 1.0]
[Download SVMcon Training Set]
[Download SVMcon Test Set]
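The CASP-style line layout described above (residue index 1, residue index 2, 0, 8, contact probability) can be sketched as follows. The code keeps only pairs at least 6 residues apart and writes one line per contact. The ordering by decreasing probability, the three-decimal formatting and the function name are assumptions, not documented SVMcon behavior.

```python
def format_contacts(contacts, min_separation=6, distance_threshold=8):
    """Format (i, j, probability) tuples as 'i j 0 8 probability' lines."""
    lines = []
    # Assumed ordering: most confident contacts first.
    for i, j, probability in sorted(contacts, key=lambda c: -c[2]):
        if abs(i - j) < min_separation:
            continue  # skip short-range pairs
        low, high = min(i, j), max(i, j)
        lines.append(f"{low} {high} 0 {distance_threshold} {probability:.3f}")
    return lines


# Hypothetical predicted contacts: (residue index 1, residue index 2, probability).
contacts = [(1, 10, 0.82), (4, 7, 0.95), (12, 30, 0.40), (25, 5, 0.61)]
for line in format_contacts(contacts):
    print(line)
# 1 10 0 8 0.820
# 5 25 0 8 0.610
# 12 30 0 8 0.400
```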

For commercial license, please contact: igb-license [at] ics [.] uci [.]edu

3Dpro

3Dpro is a server that predicts protein tertiary structure. 3Dpro uses predicted structural features and PDB knowledge-based statistical terms in the energy function. The conformational search uses a move set consisting of fragment replacement (using a fragment library built from the PDB) as well as random perturbations to the model. Moves are selected or rejected based on a simulated annealing method with linear cooling. Multiple models are constructed using random seeds and the model with the lowest energy is selected as the final prediction. 3Dpro is currently a de novo method (structural templates are not used).
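The search strategy described above can be illustrated on a toy problem. The sketch below runs simulated annealing with a linear cooling schedule from several random seeds and keeps the lowest-energy result. It assumes a Metropolis acceptance rule, a one-dimensional toy energy and a random-perturbation move only. None of these are taken from 3Dpro, whose energy function and fragment moves are not reproduced here.

```python
import math
import random


def anneal(energy, perturb, start, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing with a temperature that decreases linearly per step."""
    rng = random.Random(seed)
    state, current_energy = start, energy(start)
    for step in range(steps):
        # Linear cooling from t_start down to t_end.
        temperature = t_start + (t_end - t_start) * step / max(steps - 1, 1)
        candidate = perturb(state, rng)
        candidate_energy = energy(candidate)
        delta = candidate_energy - current_energy
        # Metropolis rule (assumed): always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, current_energy = candidate, candidate_energy
    return state, current_energy


def toy_energy(x):
    """Toy energy with a single minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_perturb(x, rng):
    """Random perturbation move: shift x by a small random amount."""
    return x + rng.uniform(-0.5, 0.5)


# Build several models from different seeds and keep the lowest-energy one.
runs = [anneal(toy_energy, toy_perturb, start=10.0, seed=s) for s in range(5)]
best_state, best_energy = min(runs, key=lambda run: run[1])
print(best_state, best_energy)
```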

The results of 3Dpro's performance at CASP6 can be found here.

SOLpro

SOLpro predicts the propensity of a protein to be soluble upon overexpression in E. coli using a two-stage SVM architecture based on multiple representations of the primary sequence. Each classifier of the first layer takes as input a distinct set of features describing the sequence. A final SVM classifier summarizes the resulting predictions and predicts if the protein is soluble or not as well as the corresponding probability.

Download SOLpro (free for academic, non commercial, use).

ANTIGENpro

ANTIGENpro is a sequence-based, alignment-free and pathogen-independent predictor of protein antigenicity. The predictions are made by a two-stage architecture based on multiple representations of the primary sequence and five machine learning algorithms. A final SVM classifier summarizes the resulting predictions and predicts whether the protein is likely to be antigenic, together with the corresponding probability. ANTIGENpro is the first predictor of whole-protein antigenicity trained using reactivity data obtained by protein microarray analysis for five pathogens.

VIRALpro

VIRALpro is a predictor capable of identifying capsid and tail protein sequences using support vector machines (SVM) with an accuracy estimated to be between 90% and 97%. Predictions are based on the protein amino acid composition, on the protein predicted secondary structure, as predicted by SSpro, and on a boosted linear combination of HMM e-values obtained from 3,380 HMMs built from multiple sequence alignments of specific fragments - called contact fragments - of both capsid and tail sequences.

Download VIRALpro 1.0 (free for academic use)

Input formats

Email

Your email address, the place where the prediction will be delivered. NOTE: Check that you typed your address correctly. Approximately 5% of the queries handled by SSpro 1.0 didn't receive an answer because of incorrect typing.

Query name

An optional name for your query. We strongly suggest that you use one, especially if sending more than one query. The order in which you send your queries may not correspond to the order in which you receive the answers.

Input sequence

The sequence of amino acids:

Output format

Replies are sent by email. SSpro, SSpro8, ACCpro, ACCpro20, DOMpro, DISpro, DIpro, CONpro, SOLpro, ANTIGENpro, and VIRALpro replies come as text, embedded in the body of the email. Here is an example prediction:

Name: short

Amino Acids:
MQIFVKTLTGKTITLEVEPSDTIENVKAKI

Predicted Secondary Structure:
CEEEEEEECCCEEEEEECCCCCHHHHHCCC

Predicted Secondary Structure (8 Class):
CEEEEEEEESEEEEEEECCCSHHHHEECCC

Predicted Relative Solvent Accessiblity (at 25% exposed threshold):
ee---ee-eeee-e-e-eeeee-ee-eeee

Predicted Relative Solvent Accessiblity (20 Class):
0%   eeeeeeeeeeeeeeeeeeeeeeeeeeeeee
5%   eeeeeeeeeeeeeeeeeeeeeeeeeeeeee
10%  eeeeeeeeeeeeeeeeeeeeeeeeeeeeee
15%  eee--eeeeeee-e-eeeeeeeeeeeeeee
20%  eee--ee-eeee-e-eeeeeeeeee-eeee
25%  eee--ee-eeee-e-e-eeeeeeee-eeee
30%  ee---ee-eeee-e-e-eeeee-ee-eeee
35%  ee---ee-eeee-e-e-eeeee-ee-eeee
40%  ee---ee-eeee-e-e-eeeee-ee-eeee
45%  ee---e--eee----e---ee--ee-eeee
50%  ee--------e--------e---ee-eeee
55%  e----------------------e---eee
60%  e--------------------------eee
65%  e---------------------------ee
70%  -----------------------------e
75%  ----------------------------e
80%  -----------------------------e
85%  -----------------------------e
90%  ------------------------------
95%  ------------------------------
100% ------------------------------

Predicted Contact Number:
------------------------------

Predicted Disordered Residues:
OOOOOOOOOOOOOOOOOOOOOOOOOOOODD

Predicted Disorder Probability:
0.16 0.07 0.04 0.04 0.03 0.02 0.02 0.01 0.02 0.02 0.02 0.01 0.01 0.01 0.02 0.02 0.03 0.04 0.06 0.06 0.12 0.13 0.20 0.17 0.18 0.20 0.23 0.48 0.55 0.58

Predicted Domains:
Domain 1: 1 - 30

Predicted Disulfide Bonds:
Input sequence has LESS THAN TWO cysteins and therefore cannot form disulfide bonds.

Predicted Contact Maps:
SEE ATTACHMENTS

Predicted Solubility upon Overexpression:
SOLUBLE with probability 0.901803

Predicted Capsid/Tail Sequence:
Capsid Sequence : YES (distance = 0.266294)
Tail Sequence   : NO  (distance = -0.344029)

The predictions have the following meaning:

Note: Since CMAPpro and 3Dpro predictions are computationally intensive only proteins of length at most 400 amino acids will be accepted if CMAPpro or 3Dpro predictions are selected.
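For scripted use, the text reply shown in the example above can be split into labelled sections. The following is a minimal sketch that assumes the layout seen in that example: a "Name: value" line first, then section headings ending in a colon, each followed by one or more content lines. The function name is hypothetical and real replies may contain variations this parser does not handle.

```python
def parse_reply(text):
    """Split a text reply into {section heading: [content lines]}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A heading such as "Predicted Secondary Structure:".
            current = line[:-1]
            sections[current] = []
        elif current is None and ":" in line:
            # A leading "key: value" line such as "Name: short".
            key, value = line.split(":", 1)
            sections[key.strip()] = [value.strip()]
        elif current is not None:
            sections[current].append(line)
    return sections


# Excerpt of the example reply shown on this page.
reply = """Name: short

Amino Acids:
MQIFVKTLTGKTITLEVEPSDTIENVKAKI

Predicted Secondary Structure:
CEEEEEEECCCEEEEEECCCCCHHHHHCCC

Predicted Domains:
Domain 1: 1 - 30
"""
sections = parse_reply(reply)
print(sections["Name"][0])
print(sections["Predicted Secondary Structure"][0])
```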

References

For the server and SSpro/ACCpro 4.0 software package, please refer to:

J. Cheng, A. Randall, M. Sweredoski, P. Baldi, SCRATCH: a Protein Structure and Structural Feature Prediction Server, Nucleic Acids Research, vol. 33 (web server issue), w72-76, 2005. [PDF] [PDF at NAR website] [Download SSpro/ACCpro 4.0]


For the previous version of the SSpro/ACCpro 5.2 software package, please refer to:

C.N. Magnan & P. Baldi, "SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity", Bioinformatics, 30 (18), 2592-2597, 2014.
Download PDF Abstract & HTML (Bioinformatics Website)

For an explanation of the methods used in SSpro and SSpro8 see:

G. Pollastri, D. Przybylski, B. Rost, P. Baldi, "Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles", Proteins, 47, 228-235, 2002.
Download PDF, Abstract and HTML (Proteins web site).

For an explanation of the methods used in ACCpro and CONpro see:

G. Pollastri, P. Baldi, P. Fariselli, R. Casadio, "Prediction of Coordination Number and Relative Solvent Accessibility in Proteins", Proteins, 47, 142-153, 2002.
Download PDF, Abstract and HTML (Proteins web site)

For an explanation of the methods used in DOMpro see:

J. Cheng, M. Sweredoski, P. Baldi, "DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks", Data Mining and Knowledge Discovery, vol. 13, no. 1, pp. 1-10, 2006.
Download PDF

For an explanation of the methods used in DISpro see:

J. Cheng, M. Sweredoski, P. Baldi, "Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data", Data Mining and Knowledge Discovery, vol. 11, no. 3, pp. 213-222, 2005.
Download PDF, PDF at DAMI web site

For an explanation of the methods used in DIpro see:
J. Cheng, H. Saigo, P. Baldi, "Large-Scale Prediction of Disulphide Bridges Using Kernel Methods, Two-Dimensional Recursive Neural Networks, and Weighted Graph Matching", Proteins, vol. 62, no. 3, pp. 617-629, 2006. [PDF][PDF at Proteins website] [Download DIpro 2.0]
Or
P. Baldi, J. Cheng, A. Vullo, "Large-Scale Prediction of Disulphide Bond Connectivity", Advances in Neural Information Processing Systems (NIPS 2004) 17, L. Saul, Y. Weiss, and L. Bottou editors, MIT Press, pp. 97-104, Cambridge, MA, 2005.
Download PDF

For an explanation of the methods used in CMAPpro see:

P. Di Lena, K. Nagata, P. Baldi, "Deep Architectures for Protein Contact Map Prediction", Bioinformatics, 2012. In press
Download PDF, HTML abstract (Bioinformatics web site)

P. Di Lena, K. Nagata, P. Baldi, "Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction", Neural Information Processing Systems (NIPS), 2012. Accepted for presentation.

G. Pollastri, P. Baldi, "Prediction of Contact Maps by Recurrent Neural Network Architectures and Hidden Context Propagation from All Four Cardinal Corners", Bioinformatics, 18 Suppl 1, S62-S70, 2002.
Download PDF, HTML abstract (Bioinformatics web site).

For an explanation of the methods used in SVMcon see:

J. Cheng and P. Baldi. "Improved Residue Contact Prediction Using Support Vector Machines and a Large Feature Set." BMC Bioinformatics. 8:113, 2007.
[Download SVMcon 1.0]

For an explanation of the methods used in COBEpro see:

Michael J. Sweredoski and Pierre Baldi. "COBEpro: a novel system for predicting continuous B-cell epitopes." Protein Engineering Design and Selection 2008; doi: 10.1093/protein/gzn075
Download PDF, HTML (Bioinformatics website)