Folding Playground

September 12, 2023

Introduction

Ever wondered if ESMFold or OmegaFold outperforms AlphaFold2 for your protein of interest? At Sphinx, we believe that every scientist should have the tools to answer exactly those types of questions. And we’ve been building tools to do just that. Today we’re excited to announce the launch of our Folding Playground.

A comparison of folding models on hemoglobin subunit epsilon

Want to get started without reading the rest of the post? Try it yourself!

Background

We’ll start with some basics, so if you’re a folding pro — feel free to skip ahead.

Predicting a protein's 3D structure from its amino acid sequence has been a long-standing challenge in biology. While numerous computational methods had attempted to solve this problem over the decades, it wasn’t until the publication of AlphaFold that a breakthrough was achieved. Its successor, AlphaFold2, reached unprecedented — near experimental — accuracy in the 2020 Critical Assessment of Structure Prediction (CASP) competition. After this achievement, numerous other models were proposed, each with their own tradeoffs.

One of AlphaFold’s crucial advances was using Multiple Sequence Alignment (MSA) to integrate evolutionary information into its predictions. AlphaFold's use of MSA allows it to pull in data from similar proteins, enriching the model with context that mimics natural evolutionary processes. This additional layer of information tends to produce highly accurate results but can be computationally expensive. In contrast, models such as ESMFold rely solely on the amino acid sequence of the target protein for their predictions. This makes ESMFold faster to run, but it might sacrifice some of the intricate details that models leveraging MSAs can capture. We’ll explore some of these tradeoffs later.
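To build intuition for the evolutionary signal an MSA carries, here is a toy Python sketch (not any model's actual pipeline) that scores per-column conservation in a small alignment. Strongly conserved columns hint at structurally or functionally constrained residues; the alignment and scoring scheme here are invented purely for illustration.

```python
import math
from collections import Counter

def column_conservation(msa):
    """For each column of an aligned set of sequences, return a
    conservation score in [0, 1]: 1.0 means every sequence agrees,
    lower values mean the column is more variable."""
    scores = []
    for i in range(len(msa[0])):
        column = [seq[i] for seq in msa]
        counts = Counter(column)
        total = len(column)
        # Shannon entropy of the residue distribution in this column
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        max_entropy = math.log2(total)  # upper bound for this many sequences
        scores.append(1.0 - entropy / max_entropy if max_entropy > 0 else 1.0)
    return scores

# Toy alignment: columns 0 and 3 are fully conserved, columns 1 and 2 vary
msa = ["MKVL", "MKIL", "MRIL"]
print(column_conservation(msa))
```

MSA-based models consume far richer features than this, of course, but the core idea is the same: patterns of variation across homologs carry information that a single sequence does not.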

An additional advance introduced by AlphaFold was the creation of pLDDT (predicted Local Distance Difference Test), a metric that quantifies the model’s confidence level in its predictions. pLDDT is a measure of how closely the predicted inter-atomic distances match the experimentally observed distances for each residue in a protein. Scores range from 0 to 100, with higher scores indicating higher confidence in the predictions. While pLDDT was originally intended to provide scientists with a means to assess the quality of a model’s predictions, some papers have started to use it as a way to select which proteins should be tested.
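AlphaFold-style tools conventionally write each residue's pLDDT into the B-factor column of the output PDB file, so the scores are easy to recover with a few lines of Python. The ATOM records below are fabricated for illustration:

```python
def plddt_from_pdb(pdb_text):
    """Extract per-residue pLDDT from a PDB file where the model stores
    its confidence in the B-factor column (as AlphaFold-style tools do).
    Returns {residue_number: pLDDT}, keeping one value per residue."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            res_num = int(line[22:26])     # residue sequence number
            b_factor = float(line[60:66])  # B-factor column holds pLDDT
            scores.setdefault(res_num, b_factor)
    return scores

# Two fabricated ATOM records for a single residue
sample = (
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00 91.50           N\n"
    "ATOM      2  CA  MET A   1      11.804   7.400  -6.200  1.00 91.50           C\n"
)
print(plddt_from_pdb(sample))  # {1: 91.5}
```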

The problem

As we started working with more scientists focused on designing proteins and antibodies, it became evident that there was no easy way to determine which folding model to use for a specific research problem. Scientists often resorted to using whichever model was recommended in the most recent paper they’d read, but this approach didn’t ensure optimal outcomes for their specific problem. Despite a growing body of research comparing various folding models on different classes of proteins, there was no easy-to-use tool for comparing models.

Interactive tools are critical to helping scientists compare model outputs — and when it comes to protein structure, nothing replaces looking at the 3D rendering. Yet while there are plenty of options to run an individual folding model (such as the fantastic ColabFold), we couldn’t find a good solution that allowed scientists to easily run multiple models.

More importantly, most existing tools don’t support easy comparisons between models. It is important for scientists to have both the metrics output by the models (such as pLDDT) as well as an easy way to look at the predicted protein structures directly.

There was no way to easily, reproducibly, and quickly run many different folding models against a protein of interest and determine which model works best for you… until now.

Deploying ML effectively in bio extends beyond just protein folding. If you’re interested in interactive interfaces for model results and making data-driven decisions about which one works best for your specific problem — let’s chat.

Our Solution

At Sphinx, this problem was a perfect fit for our core mission of helping scientists make better decisions, faster. That’s why we're publicly releasing the Folding Playground, a user-friendly platform tailored for scientists at all levels of computational expertise. The Folding Playground is an interactive environment where you can easily compare the performance of state-of-the-art folding algorithms — like AlphaFold2, ESMFold, OmegaFold, and more — on your protein sequence of interest. We allow scientists to run multiple models at once without ever writing a single line of code (or you can reach out to us for API access).

The interface is simple: add your amino acid sequence of interest, select which models you’d like to compare, then click run! Once the structures have been predicted, we’ll give you an interactive interface to compare predictions. You can color each structure by pLDDT to see the confidence of the model at each residue.

Folding Playground view of Adalimumab

That’s not all — since this is integrated into the greater Sphinx platform, you can easily launch a no-code analysis notebook to investigate the metrics output by each model. This lets you quickly build visualizations to understand each model’s overall confidence — and whether it matches your intuition from the playground.
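As a sketch of the kind of summary such a notebook might compute, here is a minimal Python example. The per-residue pLDDT values are invented for illustration, and the 70 cutoff is the conventional threshold below which AlphaFold-style predictions are considered low confidence:

```python
# Hypothetical per-residue pLDDTs for two models (values invented for illustration)
plddts = {
    "alphafold2": [92.1, 95.3, 88.7, 64.2, 58.9, 90.5],
    "esmfold":    [85.0, 88.2, 71.4, 55.1, 49.8, 83.3],
}

def summarize(scores, low_cutoff=70.0):
    """Mean pLDDT and the fraction of residues below the conventional
    'low confidence' cutoff of 70."""
    mean = sum(scores) / len(scores)
    frac_low = sum(s < low_cutoff for s in scores) / len(scores)
    return round(mean, 1), round(frac_low, 2)

for model, scores in plddts.items():
    mean, frac_low = summarize(scores)
    print(f"{model}: mean pLDDT {mean}, {frac_low:.0%} of residues below 70")
```

The fraction of low-confidence residues often tells a different story than the mean alone — a model can be confident on average while being unsure about exactly the loop you care about.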

A comparison of pLDDTs across models for CAMP

Case Studies

To demonstrate our platform, we used the Folding Playground to fold proteins from a range of classes.

We ran AlphaFold, ESMFold, OmegaFold, and OpenFold on each protein. The results are below:

From left to right: Adalimumab, CAMP, Hemoglobin, TMB2_16_1, Top7
A comparison of pLDDTs across different models and proteins

OmegaFold and ESMFold — both models without MSAs — were consistently less confident in their predictions than AlphaFold and OpenFold. OpenFold and AlphaFold were more confident and mostly in agreement — except for a few key loops in the transmembrane protein. So while ESMFold and OmegaFold are much faster to run, they might not be “good enough” for your use case.
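One simple way to quantify where two predictions disagree (like the loops above) is per-residue Cα deviation. The Python sketch below uses invented coordinates and assumes the two structures are already superposed; a real comparison would first align them (e.g., with the Kabsch algorithm):

```python
import math

def per_residue_deviation(coords_a, coords_b):
    """Euclidean distance (in Å) between matching Cα atoms of two
    predictions. Assumes the structures are already superposed."""
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]

# Invented Cα coordinates: residue 3 plays the role of a disagreeing loop residue
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.1, 0.0, 0.0), (3.9, 0.1, 0.0), (9.5, 2.0, 0.0)]

deviations = per_residue_deviation(model_1, model_2)
flagged = [i + 1 for i, d in enumerate(deviations) if d > 1.0]
print(flagged)  # residue numbers deviating by more than 1 Å
```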

We deliberately selected a wide variety of protein classes, but we encourage you to do your own exploration.

Curious how each model performs on your protein and ready to get folding? You can sign up for free here: https://app.sphinxbio.com/signup. Or reach out if you’d like to discuss how to integrate your own models, metrics, and experimental data.

- Nicholas

P.S. Want to help democratize ML for scientists? We’re hiring.
