Repurposing Protein Folding Models for Generation with Latent Diffusion

By AI Maestro May 10, 2026 2 min read

From Structure Prediction to Real-World Drug Design

The 2024 Nobel Prize in Chemistry, awarded in part for the development of AlphaFold2, highlights the transformative role of AI in biology. What comes next after protein folding?

Generating “useful” proteins

PLAID, a multimodal generative model, learns from the latent space of protein folding models to generate protein sequence and structure simultaneously. It accepts compositional function and organism prompts, and it can be trained on sequence databases, which are far larger than structure databases.

Limitations of existing models

  • All-atom generation: Many current models output only backbone atoms. Placing the sidechains requires knowing the sequence, so generating every atom means generating sequence and structure together, which is a multimodal problem. PLAID addresses this by generating both at once.
  • Organism specificity: Proteins intended for human use generally need to be humanized so that the immune system does not reject them. PLAID addresses this by accepting organism prompts, so generation can be steered toward human proteins.
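  • Control specification: Real drug design involves complex constraints, such as solubility or ease of transport, that a useful generative model should eventually let designers specify. PLAID's compositional function and organism prompts are a first step toward this kind of control.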
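
Prompt conditioning of this kind is commonly implemented in diffusion models with classifier-free guidance. The sketch below assumes that technique; `toy_denoiser`, `drop_prompts`, `guided_eps`, the guidance weight `w`, and the label strings are hypothetical illustrations, not PLAID's actual API. It shows why dropping each prompt independently during training makes function and organism prompts composable at sampling time.

```python
import random

NULL = None  # hypothetical "no prompt" token used for unconditional predictions


def toy_denoiser(x, t, function_label, organism_label):
    """Hypothetical stand-in for a learned noise-prediction network.

    Each prompt shifts the prediction by a fixed amount so the guidance
    arithmetic is easy to follow; a real network is learned from data.
    """
    shift = 0.0
    if function_label is not NULL:
        shift += 0.5
    if organism_label is not NULL:
        shift += 0.25
    return [0.1 * xi + shift for xi in x]


def drop_prompts(function_label, organism_label, p_drop=0.1, rng=random):
    """Training-time dropout: each prompt is dropped independently, so a single
    network learns function-only, organism-only, joint, and unconditional
    predictions. This is what lets prompts be combined freely at sampling time."""
    f = NULL if rng.random() < p_drop else function_label
    o = NULL if rng.random() < p_drop else organism_label
    return f, o


def guided_eps(x, t, function_label, organism_label, w=2.0):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    eps_cond = toy_denoiser(x, t, function_label, organism_label)
    eps_uncond = toy_denoiser(x, t, NULL, NULL)
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Prompt with both a function and an organism (labels are illustrative only).
print(guided_eps([1.0, 2.0], t=10, function_label="catalytic activity", organism_label="human"))
```

With `w = 1` guidance reduces to the plain conditional prediction; larger `w` pushes samples harder toward the prompts, usually at some cost in diversity.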

Training on sequence-only data

A key aspect of PLAID is that its generative model can be trained solely on sequence databases. Sequence data is far more abundant and cheaper to obtain than structural data, so this lets PLAID learn from much larger datasets.
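
To make the sequence-only point concrete, here is a minimal sketch of one latent diffusion training step whose only input is an amino-acid string: a frozen encoder maps the sequence to per-residue latents, noise is added, and a denoiser is scored on predicting that noise. `frozen_encoder`, the cosine noise schedule, and the latent width are assumptions for illustration; in PLAID the latent comes from a protein folding model.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frozen_encoder(sequence, dim=4):
    """Hypothetical stand-in for a frozen pretrained encoder that maps a
    sequence to per-residue latents. It is never updated during training."""
    return [
        [math.sin((AMINO_ACIDS.index(aa) + 1) * (k + 1)) for k in range(dim)]
        for aa in sequence
    ]


def alpha_bar(t, T=1000, s=0.008):
    """Cosine noise schedule (Nichol & Dhariwal, 2021); assumed here."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def diffusion_loss(sequence, denoiser, T=1000, rng=random):
    """One noise-prediction training step. No structure is needed, only a sequence."""
    x0 = frozen_encoder(sequence)
    t = rng.randint(1, T)
    ab = alpha_bar(t, T)
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    # Forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    xt = [
        [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(row_x, row_e)]
        for row_x, row_e in zip(x0, eps)
    ]
    pred = denoiser(xt, t)
    n = sum(len(row) for row in eps)
    return sum(
        (p - e) ** 2 for row_p, row_e in zip(pred, eps) for p, e in zip(row_p, row_e)
    ) / n


def zero_noise(xt, t):
    # Baseline denoiser that always predicts zero noise.
    return [[0.0] * len(row) for row in xt]


# With a zero-noise predictor the loss is the mean squared noise, about 1 on average.
print(diffusion_loss("MKTAYIAK", zero_noise, rng=random.Random(0)))
```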

How does it work?

  1. PLAID trains a diffusion model over the latent space of a protein folding model (ESMFold). At inference, it samples a latent and decodes the structure using ESMFold's frozen weights.
  2. Because these pretrained weights already encode structural understanding, decoding through them yields all-atom structures, even though the generative model itself never sees structural training data.
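
The inference path in these steps can be sketched as DDPM ancestral sampling in latent space followed by a frozen decoder. This is a toy under stated assumptions: the linear noise schedule, `zero_denoiser` (standing in for the trained diffusion network), and `frozen_decode` (standing in for ESMFold's frozen weights plus a sequence decoder) are all hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def linear_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (Ho et al., 2020); the values are illustrative."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def sample_latent(denoiser, length, dim, T=50, rng=random):
    """DDPM ancestral sampling of a (length x dim) latent, starting from pure noise."""
    betas, alphas, alpha_bars = linear_schedule(T)
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]
    for t in reversed(range(T)):
        eps = denoiser(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x = [
            [(xi - coef * ei) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
             for xi, ei in zip(row, erow)]
            for row, erow in zip(x, eps)
        ]
    return x


def frozen_decode(latent):
    """Hypothetical stand-in for the frozen folding-model weights (plus a
    sequence decoder). A real folding model returns all-atom coordinates;
    this toy returns one residue letter and one 3D point per latent row."""
    sequence = "".join(
        AMINO_ACIDS[int(abs(row[0]) * 1000) % len(AMINO_ACIDS)] for row in latent
    )
    coords = [tuple(row[:3]) for row in latent]
    return sequence, coords


def zero_denoiser(x, t):
    # Placeholder for the trained latent diffusion network.
    return [[0.0] * len(row) for row in x]


latent = sample_latent(zero_denoiser, length=8, dim=4, rng=random.Random(0))
seq, coords = frozen_decode(latent)
print(seq, coords[0])
```

Only the denoiser is learned in this picture; the decoder stays frozen, which is what lets the generator inherit the folding model's structural knowledge.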

Compressing the latent space of protein folding models

To make this approach more scalable, we propose CHEAP (Compressed Hourglass Embedding Adaptations of Proteins), a model that learns a compressed representation of the folding model's latent space, which jointly encodes protein sequence and structure.
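
As a rough picture of what compressing this latent means, the sketch below shortens the length axis by mean pooling and narrows the channel axis with a fixed orthonormal projection. This is a linear stand-in we swap in for illustration only: CHEAP learns its compressor and decoder, and the shapes, `shorten` factor, and helper names here are hypothetical.

```python
import math
import random


def orthonormal_rows(d, D, rng=random):
    """d orthonormal rows in R^D via Gram-Schmidt on random Gaussian vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(D)]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:
            rows.append([a / norm for a in v])
    return rows


def compress(x, W, shorten):
    """Mean-pool along the length axis, then project channels D -> d with W."""
    assert len(x) % shorten == 0, "length must be divisible by the shortening factor"
    out = []
    for i in range(0, len(x), shorten):
        pooled = [sum(col) / shorten for col in zip(*x[i:i + shorten])]
        out.append([sum(w * p for w, p in zip(w_row, pooled)) for w_row in W])
    return out


def decompress(z, W, shorten):
    """Project channels back d -> D with W^T and repeat each position `shorten` times."""
    D = len(W[0])
    out = []
    for zj in z:
        row = [sum(W[k][c] * zj[k] for k in range(len(W))) for c in range(D)]
        out.extend(list(row) for _ in range(shorten))
    return out


def compression_ratio(length, D, shorten, d):
    """Fraction of floats kept: (length / shorten * d) / (length * D)."""
    return (length // shorten * d) / (length * D)


rng = random.Random(0)
W = orthonormal_rows(d=4, D=16, rng=rng)
x = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
z = compress(x, W, shorten=2)        # 8 x 16 latent -> 4 x 4
x_hat = decompress(z, W, shorten=2)  # back to 8 x 16, with information lost
print(len(z), len(z[0]), compression_ratio(8, 16, 2, 4))  # prints: 4 4 0.125
```

A diffusion model trained on `z` works with 8x fewer numbers per protein in this toy; a learned compressor aims to keep that saving while losing far less information than a fixed linear map.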

If you’ve found our papers useful, consider citing:

@article{lu2024generating,
title={Generating All-Atom Protein Structure from Sequence-Only Training Data},
author={Lu, Amy X and Yan, Wilson and Robinson, Sarah A and Yang, Kevin K and Gligorijevic, Vladimir and Cho, Kyunghyun and Abbeel, Pieter and Frey, Nathan},
journal={bioRxiv},
pages={2024--12},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
@article{lu2024tokenized,
title={Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure},
author={Lu, Amy X and Yan, Wilson and Yang, Kevin K and Gligorijevic, Vladimir and Cho, Kyunghyun and Abbeel, Pieter and Bonneau, Richard and Frey, Nathan},
journal={bioRxiv},
pages={2024--08},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}

You can also check out our preprints (PLAID, CHEAP) and codebases (PLAID, CHEAP).

What’s Next?

We can adapt this method to perform multimodal generation for any pair of modalities where a predictor maps an abundant modality to a less abundant one. As sequence-to-structure predictors like AlphaFold3 tackle increasingly complex systems, such as complexes involving nucleic acids and small-molecule ligands, it’s easy to imagine using the same method for more sophisticated applications.

Some bonus protein generation fun!

Here are some additional examples of PLAID-generated proteins:

  • Function-prompted generations with PLAID
  • Unconditional generation with PLAID
  • Comparing samples between PLAID and all-atom baselines

Acknowledgements

Thanks to Nathan Frey for detailed feedback on this article, and co-authors across BAIR, Genentech, Microsoft Research, and New York University: Wilson Yan, Sarah A. Robinson, Simon Kelow, Kevin K. Yang, Vladimir Gligorijevic, Kyunghyun Cho, Richard Bonneau, Pieter Abbeel, and Nathan C. Frey.
