This site contains data and figures from the publication "Large-scale design and refinement of stable proteins using sequence-only models".
Download data from paper
The experimental stability file contains protein primary sequences, measured stability scores, predicted stability scores, and a range of metadata.
The UniProt proteins file contains the set of short natural proteins used to train the Evaluator Model (EM) in the paper.
Success of EM Predictions
Figure 2 (B), part 1: Success of Evaluator Model (EM) predictions on a library of new designs. The EM was used to predict the stability of 45,840 new protein sequences that the model had not seen before.
Empirically Stable Designs
Figure 2 (B), part 2: Fraction of designs that were empirically stable (stability score > 1.0) as a function of the model's a priori stability predictions (dotted grey line: stability threshold for predicted stability).
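The quantity plotted in this panel can be recomputed from the experimental stability file by binning designs on their predicted score and counting how many in each bin exceed the empirical threshold. The following is a minimal sketch, not the authors' analysis code: the function name, the bin edges, and the toy score lists are assumptions made for illustration, and only the threshold of 1.0 comes from the caption.

```python
from bisect import bisect_right


def fraction_stable_by_bin(predicted, observed, bin_edges, threshold=1.0):
    """Fraction of designs with observed score > threshold, per predicted-score bin.

    Bins are half-open intervals [lo, hi) defined by consecutive bin_edges.
    Designs whose predicted score falls outside the edges are ignored.
    Empty bins are omitted from the result.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    stable = [0] * n_bins
    for p, o in zip(predicted, observed):
        i = bisect_right(bin_edges, p) - 1  # index of the bin containing p
        if 0 <= i < n_bins:
            counts[i] += 1
            if o > threshold:
                stable[i] += 1
    return {
        (bin_edges[i], bin_edges[i + 1]): stable[i] / counts[i]
        for i in range(n_bins)
        if counts[i] > 0
    }


# Toy values for illustration only; these are NOT data from the paper.
toy_predicted = [0.2, 0.4, 0.8, 1.2, 1.4, 1.6]
toy_observed = [0.1, 1.3, 0.9, 1.1, 0.7, 1.5]
toy_fractions = fraction_stable_by_bin(toy_predicted, toy_observed, [0.0, 1.0, 2.0])
```

To apply this to the real data, the two lists would be read from the predicted and measured stability columns of the downloaded file; the column names are not given on this page, so check the file header first.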
Stability Scores: Predicted versus Observed
Figure 2 (C): Predicted versus observed stability scores for the library of new designs.
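One common way to summarize a predicted-versus-observed comparison like this is the Pearson correlation coefficient. The sketch below is an illustrative assumption: this page does not state which agreement metric the paper reports, and the function name and toy values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only; these are NOT data from the paper.
r_perfect = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_inverse = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

With the real file, `xs` and `ys` would be the predicted and measured stability scores for the same designs, in the same order.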