Methodology
- DOMpro is a web server to predict protein domain boundaries using 1D-Recursive Neural Networks and statistical methods. To utilize the evolutionary information encoded in homologous proteins, profiles generated from the NR database by PSI-BLAST are used as inputs. The predicted secondary structure and solvent accessibility are also fed into the neural networks to improve predictions.
- The protein domains are predicted in two steps. Step 1: Neural networks classify residues into two categories: boundary residues or non-boundary residues. Step 2: Proteins are cut into domains according to domain boundary predictions in step 1.
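As a rough illustration of Step 2, the sketch below turns per-residue boundary scores into domain ranges. It is not the DOMpro implementation: the 0.5 threshold, the choice of cutting at the center of each boundary region, the ranking of regions by peak score, and the function names `boundary_regions` and `cut_into_domains` are all assumptions made for this example.

```python
from typing import List, Tuple


def boundary_regions(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) runs of residues scoring >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1))
    return regions


def cut_into_domains(scores: List[float], threshold: float = 0.5,
                     max_domains: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (first, last) residue ranges, one per domain.

    Assumption: each boundary region yields one cut at its center, and only
    the (max_domains - 1) regions with the highest peak score are kept.
    """
    if not scores:
        return []
    regions = boundary_regions(scores, threshold)
    regions.sort(key=lambda r: max(scores[r[0]:r[1] + 1]), reverse=True)
    kept = sorted(regions[:max(max_domains - 1, 0)])
    # A cut index is the last residue (0-based) of the domain on its left.
    cuts = [(a + b) // 2 for a, b in kept]
    # Drop a cut at the final residue, which would leave an empty domain.
    cuts = [c for c in cuts if c < len(scores) - 1]
    domains = []
    first = 0
    for c in cuts:
        domains.append((first + 1, c + 1))
        first = c + 1
    domains.append((first + 1, len(scores)))
    return domains
```

With `max_domains=3` the function keeps at most two cuts, which mirrors the three-domain limit of the web server described under Limitation.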
Limitation
- Currently, the web server can cut a protein into at most three domains based on the domain boundary predictions. This is usually not enough for very long proteins (>600 residues). To work around this limit, you can download the DOMpro software, make domain boundary predictions yourself, and then manually infer domains from those predictions.
- This server is an ab initio domain prediction server. We are working on adding homology modeling to improve domain predictions and will incorporate it into the server in the future. Currently, for proteins with significant similarity to proteins in the PDB, you can use the NCBI Conserved Domain Database search tool instead.
Input Format
- Target name (optional): used to identify your query.
- Email address: where the prediction result is sent.
- Sequence: the raw amino acid sequence as plain text; white space is ignored.
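If you want to check a sequence before submitting it, a minimal sketch such as the following removes white space the way the form is described as doing. The upper-casing and the check against the 20 standard amino acid letters are assumptions added for convenience, not documented server behavior, and the function names are hypothetical.

```python
import re

# The 20 standard amino acid one-letter codes (assumed alphabet for the check).
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove all white space (spaces, tabs, newlines) and upper-case the result."""
    return re.sub(r"\s+", "", raw).upper()


def nonstandard_residues(seq: str) -> list:
    """Return the sorted set of characters that are not standard residue codes."""
    return sorted(set(seq) - STANDARD_RESIDUES)


if __name__ == "__main__":
    seq = clean_sequence("HLEGS IGILL\nKKHEI VFDGC")
    print(seq, len(seq), nonstandard_residues(seq))
```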
Output Format
- Domain Boundary predictions are presented in CASP format. Here is an example with descriptions:
PFRMAT DP
TARGET Query-Name
AUTHOR baldi-group-server
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
1 H 1 0.90 #Format: residue index, residue name, predicted domain index, confidence score of prediction.
2 L 1 0.90 #In this example, two predicted domains: Residues 1-9 are in domain 1, others in domain 2.
3 E 1 0.90
4 G 1 0.90
5 S 1 0.90
6 I 1 0.90
7 G 1 0.60
8 I 1 0.60
9 L 1 0.80
10 L 2 0.80
11 K 2 0.90
12 K 2 0.90
13 H 2 0.90
14 E 2 0.90
15 I 2 0.90
16 V 2 0.75
17 F 2 0.60
18 D 2 0.90
19 G 2 0.90
20 C 2 0.90
END
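To post-process a result, the residue lines between MODEL and END can be read into per-domain residue ranges. The sketch below assumes the layout shown in the example above (four white-space-separated fields per residue, with optional trailing `#` comments); the function names `parse_dp` and `domain_ranges` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, int, float]  # residue index, residue name, domain index, confidence


def parse_dp(text: str) -> List[Record]:
    """Parse the residue lines between MODEL and END of a DP-format prediction."""
    records = []
    in_model = False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("MODEL"):
            in_model = True
            continue
        if line == "END":
            break
        if in_model:
            idx, res, dom, conf = line.split()
            records.append((int(idx), res, int(dom), float(conf)))
    return records


def domain_ranges(records: List[Record]) -> Dict[int, List[Tuple[int, int]]]:
    """Group residues into inclusive (first, last) segments per domain index."""
    ranges: Dict[int, List[Tuple[int, int]]] = {}
    for idx, _res, dom, _conf in records:
        segments = ranges.setdefault(dom, [])
        if segments and segments[-1][1] == idx - 1:
            segments[-1] = (segments[-1][0], idx)  # extend the current segment
        else:
            segments.append((idx, idx))  # start a new segment
    return ranges
```

Applied to the example above, `domain_ranges(parse_dp(text))` would map domain 1 to residues 1-9 and domain 2 to residues 10-20. A domain made of several discontinuous segments is returned as a list with more than one range.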
Performance
- DOMpro's performance in CAFASP4 (evaluated by the CAFASP team): DOMpro is ranked first among the ab initio domain predictors (Armadillo, Biozon, Dompred-DPS, Globplot, Mateo, DOMpro), although the performance of the top three (DOMpro, Globplot, Dompred-DPS) is close.
- On our own testing dataset, DOMpro correctly predicted the number of domains for 69% of the combined dataset of single-domain and multi-domain proteins. 79% of the single-domain proteins were correctly predicted as having no boundaries, and the number of domains was correctly predicted for 43% of the multi-domain proteins.
Reference
- J. Cheng, M. Sweredoski, P. Baldi. DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks. Data Mining and Knowledge Discovery, vol. 13, no. 1, pp. 1-10, 2006.
Download