This site contains data and figures from the publication "Large-scale design and refinement of stable proteins using sequence-only models".

About this page

Selected Figures

Download data from paper

The experimental stability file contains protein primary sequences, measured stability scores, predicted stability scores, and a range of metadata.

The UniProt proteins file contains the set of short natural proteins used to train the Evaluator Model (EM) in the paper.

Success of EM Predictions

Figure 2 (B), part 1: Success of Evaluator Model (EM) predictions on a library of new designs. The EM was used to predict the stability of 45,840 new protein sequences that the model had not seen before.

Empirically Stable Designs

Figure 2 (B), part 2: Fraction of designs that were empirically stable (stability score > 1.0) as a function of the model's a priori stability predictions. The dotted grey line marks the stability threshold for predicted stability.
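
As a rough illustration of the quantity plotted in this panel, the sketch below groups designs into bins of predicted stability and computes the fraction in each bin whose observed stability score exceeds 1.0. The function name, the bin width of 0.5, and the toy values are assumptions made for illustration only; they are not taken from the paper or its data files.

```python
import math
from collections import defaultdict

# From the caption: a design counts as empirically stable if its score is > 1.0.
STABLE_THRESHOLD = 1.0


def fraction_stable_by_bin(predicted, observed, bin_width=0.5):
    """Map each predicted-stability bin (lower edge) to the fraction of
    designs in that bin whose observed score exceeds STABLE_THRESHOLD.

    bin_width is a hypothetical choice, not the binning used in the paper.
    """
    counts = defaultdict(lambda: [0, 0])  # lower edge -> [n_stable, n_total]
    for pred, obs in zip(predicted, observed):
        edge = math.floor(pred / bin_width) * bin_width
        counts[edge][1] += 1
        if obs > STABLE_THRESHOLD:
            counts[edge][0] += 1
    return {edge: stable / total for edge, (stable, total) in sorted(counts.items())}


# Made-up toy values, NOT data from the publication.
predicted = [0.2, 0.4, 0.7, 0.9, 1.2, 1.4, 1.6, 1.8]
observed = [0.1, 0.3, 1.2, 0.5, 1.5, 0.9, 1.3, 2.0]

fractions = fraction_stable_by_bin(predicted, observed)
for edge, frac in fractions.items():
    print(f"predicted in [{edge:.1f}, {edge + 0.5:.1f}): {frac:.2f} stable")
```

With real data, the two input lists would come from the predicted and measured stability columns of the experimental stability file.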

Stability Scores: Predicted versus Observed

Figure 2 (C): Predicted versus observed stability scores for the library of new designs.
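
One common way to summarize a predicted-versus-observed scatter plot is a Pearson correlation coefficient. The sketch below shows that calculation in plain Python; it is offered as a generic illustration, not necessarily the statistic reported in the paper, and the function name and toy values are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values, NOT data from the publication.
predicted_scores = [0.2, 0.7, 1.2, 1.6, 1.8]
observed_scores = [0.1, 1.2, 1.5, 1.3, 2.0]

r = pearson_r(predicted_scores, observed_scores)
print(f"Pearson r = {r:.3f}")
```

The function raises a ZeroDivisionError if either sequence is constant, so real inputs should be checked for variation first.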