Frequently Asked Questions

What is Robetta?

Robetta is a protein structure prediction server developed by the Baker lab at the University of Washington. At its core is the Rosetta macromolecular modeling suite developed by the Rosetta Commons, a multi-institutional collaborative research and software development group. Robetta's primary service is to predict the 3-dimensional structure of a protein given its amino acid sequence.

Five options are provided for structure prediction: (1) a deep learning-based method, RoseTTAFold, (2) a deep learning-based method, TrRosetta, (3) Rosetta Comparative Modeling (RosettaCM), (4) Rosetta Ab Initio (RosettaAB), and (5) a fully automated pipeline that first predicts domains as independent folding units, models each unit with (3) or (4), and then assembles them into full-chain models.

For the RosettaCM protocol, 4 independent methods are used to detect templates and generate sequence alignments:

RaptorX (Xu lab)
Morten Källberg, Haipeng Wang, Sheng Wang, Jian Peng, Zhiyong Wang, Hui Lu & Jinbo Xu. Template-based protein structure modeling using the RaptorX web server. Nature Protocols 7, 1511-1522, 2012.
HHpred (Söding lab)
Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960. doi:10.1093/bioinformatics/bti125.
Sparks-X (Zhou lab)
Yuedong Yang, Eshel Faraggi, Huiying Zhao, Yaoqi Zhou. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of the query and corresponding native properties of templates. Bioinformatics 27:2076-82(2011)
Map align (Ovchinnikov lab)
S Ovchinnikov et al. Protein Structure Determination using Metagenome sequence data. (2017) Science. 355(6322):294–8.

Users have the option to forgo these methods by providing their own template(s) and alignment(s), and to optionally modify the alignment(s), add custom constraints, and more through an interactive interface on the submit page.

Computing resources for this service are provided by the Baker lab through a local cluster, by HHMI's Janelia Research Campus through a satellite GPU cluster, and by volunteers from the distributed computing project Rosetta@home.

How accurate is Robetta?

Robetta is continually evaluated through CAMEO (Continuous Automated Model EvaluatiOn), which evaluates up to 20 pre-released Protein Data Bank (PDB) targets each week. Over a 1-year period covering more than 900 CAMEO pre-released PDB targets, Robetta averaged an lDDT (Local Distance Difference Test) score of around 69. lDDT is a superposition-independent score that evaluates interatomic distances, with values ranging from 0 (bad) to 100 (good).
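To illustrate why lDDT does not depend on superposition, here is a minimal sketch of a simplified, Cα-only variant on the 0 to 100 scale. It assumes the commonly published lDDT settings (a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2, and 4 Å). The function name ca_lddt and the coordinate-list input are hypothetical, and this is not the scoring code used by CAMEO or Robetta, which consider all atoms.

```python
import math


def ca_lddt(model, reference, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified Calpha-only lDDT on a 0-100 scale (illustrative sketch).

    model and reference are equal-length lists of (x, y, z) tuples,
    one per residue, listed in the same residue order.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same length")

    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only pairs that are close in the reference count
            d_mod = math.dist(model[i], model[j])
            diff = abs(d_ref - d_mod)
            # Each pair is checked against every tolerance threshold.
            for t in thresholds:
                total += 1
                if diff <= t:
                    preserved += 1
    return 100.0 * preserved / total if total else 0.0
```

Because only distances within each structure are compared, translating or rotating the model leaves the score unchanged.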

Robetta's accuracy mainly depends on whether similar sequences (homologs) exist in available sequence databases (UniProt and Uniclust) and the PDB. A predicted confidence value which takes this into account and was found to correlate with the actual GDT to native is provided for comparative modeling domains and described in the RosettaCM publication's Supplementary Information. For ab initio domains, a predicted confidence value is provided that corresponds to the average pairwise TM-score of the top 10 Rosetta scoring models and is described in the Supplementary Materials for this publication.
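As a rough illustration of the ab initio confidence value, the sketch below averages a pairwise similarity score over the top-ranked models. The names mean_pairwise_score and score are hypothetical; score stands in for a TM-score routine, which is not implemented here, and models are assumed to be already sorted from best to worst Rosetta score. This is not Robetta's own code.

```python
from itertools import combinations


def mean_pairwise_score(ranked_models, score, top_n=10):
    """Average score(a, b) over all unordered pairs of the top_n models.

    ranked_models: models already sorted best-first (assumption).
    score: callable returning a similarity such as a TM-score (assumption).
    """
    top = list(ranked_models)[:top_n]
    pairs = list(combinations(top, 2))
    if not pairs:
        raise ValueError("at least two models are needed")
    return sum(score(a, b) for a, b in pairs) / len(pairs)
```

A high average means the best-scoring models agree with each other, which is the convergence signal the confidence value is meant to capture.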

Per-residue local error estimates are provided in a plot and in the B-factor column of the model coordinates as predicted distances (in Å) between the Cα positions of the model and the native structure. You may color models by the error estimate and download coordinates filtered by estimated error, with cutoffs ranging from less than 1 Å to 5 Å. The error estimates are continually evaluated through CAMEO and typically average a model confidence score of 0.85 and a model confidence lDDT score of 0.82. The latter is superposition independent. The error estimates are based on structural variation within model clusters and therefore are not calculated if only 1 model is selected for sampling.
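Since the error estimate is stored in the B-factor column, a downloaded model can also be filtered locally. The following is a minimal sketch that assumes standard fixed-column PDB ATOM records, where the B-factor occupies columns 61 to 66. The function name filter_by_error and the default cutoff are hypothetical.

```python
def filter_by_error(pdb_lines, max_error=2.0):
    """Return ATOM records whose B-factor (estimated error, in Å) is <= max_error.

    pdb_lines: iterable of text lines from a PDB-format model file.
    Non-ATOM lines and records with an unreadable B-factor are dropped.
    """
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        try:
            # PDB columns 61-66 (1-based) hold the temperature factor field.
            error = float(line[60:66])
        except ValueError:
            continue
        if error <= max_error:
            kept.append(line)
    return kept
```

For example, reading a model with open(...).readlines() and passing the lines with max_error=1.0 would keep only the atoms predicted to lie within 1 Å of the native structure.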

How do I submit a job to Robetta?

To submit to Robetta, you first need to register. Once you have an account and log in, you can access the submit page. Through an interactive interface on the submit page, you can paste or upload your protein's amino acid sequence, upload and modify templates and alignments for comparative modeling, add custom distance constraints, and more.

How long does a job take to run?

It typically takes one to a few days to complete a job, but many factors may affect the run time, such as the length of your sequence, the number of predicted domains, and the number of jobs that are already queued and active. If you use TrRosetta or provide your own template(s) and alignment(s) for comparative modeling, it may take less than one day. For very large, multi-domain predictions, it may take over a week, and manual intervention is required as described below.

To help prevent exceptionally long queues from occurring, users are only allowed to model one domain at a time. If you choose to "Predict domains", you must manually select the domain you want to model after the domain boundaries have been predicted.

What do the results look like?

An example is available to view here.

How do I get my results?

Results become available throughout the modeling pipeline and can be accessed from the My queue page by clicking on the Job ID of interest. A gzipped tar file of the raw results and inputs, which may include PDB templates, alignments, fragment files, constraints, commands, models, and more, is available for each completed domain from the "Download Results Archive" link at the top of the domain results page.

My queue -> Job id -> Domain id -> Download Results Archive
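Once downloaded, the gzipped tar archive can be inspected with standard tools. Here is a minimal sketch using Python's tarfile module. The function name list_archive is hypothetical, and the archive's file name and internal layout are not specified here, so the path is whatever name your download was saved under.

```python
import tarfile


def list_archive(path):
    """Return the sorted names of regular files inside a gzipped tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Listing the contents first is a cheap way to see which templates, alignments, and models were included before extracting anything.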

Additionally, models are emailed to your registered email address when they become available.

What is an Error job status?

In rare cases where there is an unrecoverable error, usually due to corrupted user input data, jobs will be marked with an Error status. If you would like details about the error or think that it may be due to a bug in our pipeline, please contact us.

Can I use my own template for comparative modeling?

Yes, Robetta provides an interactive interface on the submit page where you can upload your own PDB template coordinates and an optional alignment. The alignment should be global, in FASTA format, and placed before the PDB template coordinates as shown in this example. You can load multiple templates, modify the alignments, add constraints, and more before submitting.

Can Robetta model multi-chain complexes?

Yes, Robetta models homo-oligomeric complexes if it detects symmetry from a template's biological unit when using the comparative modeling (CM) method. More information is provided in this publication.

Robetta can also model hetero-oligomeric complexes, but only with the RoseTTAFold and Comparative Modeling (CM) options; docking and ab initio methods are not currently used. To submit a complex, place a forward slash "/" between chain sequences when providing your sequence on the submit page.

For RoseTTAFold, a multiple sequence alignment of chain sequences paired for optimal co-evolution signal must be provided in A3M format. Alignments should NOT include a forward slash between chain sequences. For more information, please visit https://github.com/RosettaCommons/RoseTTAFold/tree/main/example/complex_modeling.

For CM, template(s) and alignment(s) may be provided by the user, and custom inter-chain constraints may be applied. Unlike RoseTTAFold alignments, CM alignments should include a forward slash "/" between chain sequences. Please note that comparative modeling of hetero-oligomeric complexes has not been thoroughly tested and benchmarked.
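The slash-separated sequence format for complexes can be assembled programmatically. The sketch below joins chain sequences with a forward slash after basic cleanup. The function name join_chains is hypothetical, and restricting input to the 20 standard amino acid letters is an assumption made for illustration, not a documented Robetta rule.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def join_chains(chains):
    """Join chain sequences into one string separated by '/'.

    Whitespace is removed and letters are upper-cased before joining.
    """
    cleaned = []
    for seq in chains:
        seq = "".join(seq.split()).upper()
        if not seq:
            raise ValueError("empty chain sequence")
        unexpected = set(seq) - STANDARD_RESIDUES
        if unexpected:
            raise ValueError("unexpected characters: %s" % sorted(unexpected))
        cleaned.append(seq)
    if len(cleaned) < 2:
        raise ValueError("a complex needs at least two chains")
    return "/".join(cleaned)
```

Because "/" is not in the allowed residue set, a sequence that already contains a slash is rejected rather than silently producing an extra chain break.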

Can Robetta model protein-ligand complexes?

Currently, Robetta cannot model protein-ligand complexes. However, protein-ligand complexes can be loaded as templates on the submit page, and custom constraints can easily be applied to the positions that interact with the ligand to constrain the binding site. Ligands are omitted upon submission.

What happened to the Fragment Library, Alanine Scanning, and DNA Interface Residue Scanning servers?

These servers are available at http://old.robetta.org.

How do I reference Robetta?

RoseTTAFold

Minkyung Baek, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, Qian Cong, Lisa N. Kinch, R. Dustin Schaeffer, Claudia Millán, Hahnbeom Park, Carson Adams, Caleb R. Glassman, Andy DeGiovanni, Jose H. Pereira, Andria V. Rodrigues, Alberdina A. van Dijk, Ana C. Ebrecht, Diederik J. Opperman, Theo Sagmeister, Christoph Buhlheller, Tea Pavkov-Keller, Manoj K Rathinaswamy, Udit Dalwadi, Calvin K Yip, John E Burke, K. Christopher Garcia, Nick V. Grishin, Paul D. Adams, Randy J. Read, David Baker. Accurate prediction of protein structures and interactions using a 3-track network. Science 10.1126/science.abj8754 (2021); doi: https://doi.org/10.1126/science.abj8754.

TrRosetta and DeepAccNet

Jianyi Yang, Ivan Anishchenko, Hahnbeom Park, Zhenling Peng, Sergey Ovchinnikov, and David Baker. Improved protein structure prediction using predicted interresidue orientations. (2020) PNAS 117 (3) 1496-1503.
Naozumi Hiranuma, Hahnbeom Park, Ivan Anishchanka, Minkyung Baek, David Baker. Improved protein structure refinement guided by deep learning based accuracy estimation. (2020) bioRxiv preprint. https://doi.org/10.1101/2020.07.17.209643.

Rosetta Comparative Modeling and Ab Initio Modeling

Yifan Song, Frank DiMaio, Ray Yu-Ruei Wang, David Kim, Chris Miles, TJ Brunette, James Thompson and David Baker. High resolution comparative modeling with RosettaCM. Structure. 2013 Oct 8;21(10):1735-42.
Srivatsan Raman, Robert Vernon, James Thompson, Michael Tyka, Ruslan Sadreyev, Jimin Pei, David Kim, Elizabeth Kellogg, Frank DiMaio, Oliver Lange, Lisa Kinch, Will Sheffler, Bong-Hyun Kim, Rhiju Das, Nick V. Grishin, and David Baker. Structure prediction for CASP8 with all-atom refinement using Rosetta. (2009) Proteins 77 Suppl 9:89-99.

Alignment Methods (Generously provided and supported by the following groups)

RaptorX (Xu lab)
Morten Källberg, Haipeng Wang, Sheng Wang, Jian Peng, Zhiyong Wang, Hui Lu & Jinbo Xu. Template-based protein structure modeling using the RaptorX web server. Nature Protocols 7, 1511-1522, 2012.
HHpred (Söding lab)
Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960. doi:10.1093/bioinformatics/bti125.
Sparks-X (Zhou lab)
Yuedong Yang, Eshel Faraggi, Huiying Zhao, Yaoqi Zhou. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of the query and corresponding native properties of templates. Bioinformatics 27:2076-82(2011)
Map align (Ovchinnikov lab)
S Ovchinnikov et al. Protein Structure Determination using Metagenome sequence data. (2017) Science. 355(6322):294–8.
NW-align (Zhang lab)