
🧬 LLM-Guided De Novo Protein Design Environment

De novo protein binder design is one of the hardest problems in bioengineering: you're tasked with inventing an amino acid sequence that folds into a 3D structure that binds to a given target protein. This environment lets Large Language Models (LLMs) tackle that problem using reinforcement learning (RL) — not by predicting sequences blindly, but by learning to use the right tools in the right order to produce functioning binders.


🤖 Why LLM-based RL Instead of Classic RL?

Classic RL works well for Atari, but it breaks down for de novo protein binder design. Why?

  • Simulation is slow. Each step (AlphaFold, RFdiffusion, ProteinMPNN) can take minutes, so you don't get to run millions of episodes as in classic RL.
  • State/action spaces are vast and weird. Proteins are not 2D boards or pixel arrays. Designing them involves sequences, structures, config files, hotspots, and domain hacks.
  • Heuristics and intuition matter. LLMs are pretrained on a world model: language, code, protein sequences, scientific papers. They come in with baked-in priors that help them reason, even under sparse rewards.

Classic RL policy networks would need to learn everything from scratch, which is infeasible at these sample costs.


🧪 The Protein Design Pipeline

Each episode consists of an LLM navigating a 4-step design pipeline, using state-of-the-art tools as function calls:

Step 1: Target Sequence → Structure (AlphaFold)

  • Input: Target protein sequence
  • Output: 3D .pdb file (structure)
  • Reward: Format validity

Step 2: Target Structure → Binder Backbone (RFdiffusion)

  • Input: .pdb file of target
  • Output: .pdb backbone of potential binder
  • Reward: Format validity

Step 3: Backbone → Full Binder Sequence (ProteinMPNN)

  • Input: Binder backbone
  • Output: .fasta file with the full binder sequence (side-chain identities)
  • Reward: Format validity

Step 4: Evaluate Binding (AlphaFold-Multimer)

  • Input: Target + binder sequences
  • Output: Complex structure prediction
  • Reward:
    • Format OK
    • No steric clashes
    • Bonus: Contact interface, binding affinity metrics (Not yet implemented)
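The four steps above can be sketched as a straight-line sequence of tool calls. The function bodies below are illustrative stand-ins, not the environment's actual tool interface (the real definitions live in tool_definitions.py and tool_executor.py):

```python
# Sketch of the 4-step design pipeline. Each function here is a stand-in
# (an assumption) for the real tool call it is named after.

def fold_structure(sequence: str) -> str:
    """Stand-in for AlphaFold: target sequence -> PDB structure text."""
    return f"PDB structure for {len(sequence)}-residue target"

def generate_backbone(target_pdb: str) -> str:
    """Stand-in for RFdiffusion: target structure -> binder backbone PDB."""
    return "binder backbone PDB"

def design_sequence(backbone_pdb: str) -> str:
    """Stand-in for ProteinMPNN: backbone -> full binder sequence (FASTA)."""
    return ">binder\nMKVL..."

def predict_complex(target_seq: str, binder_fasta: str) -> str:
    """Stand-in for AlphaFold-Multimer: both sequences -> complex PDB."""
    return "predicted complex PDB"

def run_pipeline(target_seq: str) -> dict:
    target_pdb = fold_structure(target_seq)            # Step 1
    backbone = generate_backbone(target_pdb)           # Step 2
    binder = design_sequence(backbone)                 # Step 3
    complex_pdb = predict_complex(target_seq, binder)  # Step 4
    return {"binder_fasta": binder, "complex_pdb": complex_pdb}
```

The LLM does not call these in a fixed loop; it chooses which tool to invoke at each turn, and the format checks at each step gate the reward below.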

🏆 Reward Function

The reward is cumulative:

  • +0.2: Successfully generate output in correct format at each step
  • +0.0 to +1.0: Structural reward for complex validity, smoothly interpolated on AlphaFold-Multimer confidence
  • +1: High predicted binding affinity (Not yet implemented)

Sparse, but real. LLMs must plan tool use, not just spam actions.
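A minimal sketch of the cumulative reward, assuming four boolean per-step format checks and a multimer confidence score in [0, 1]; the weights follow the list above, and the affinity bonus is omitted since it is not yet implemented:

```python
# Cumulative episode reward: +0.2 per correctly formatted step, plus a
# structural reward in [0, 1] interpolated on multimer confidence.
# The signature is an illustrative assumption, not the environment's API.

def episode_reward(steps_valid: list, multimer_confidence=None) -> float:
    reward = 0.2 * sum(steps_valid)  # format validity at each step
    if all(steps_valid) and multimer_confidence is not None:
        # Structural reward only applies once the full pipeline completed
        reward += max(0.0, min(1.0, multimer_confidence))
    return reward
```

A fully successful episode with 0.8 multimer confidence would score 4 × 0.2 + 0.8 = 1.6, while failing at step 2 caps the episode at 0.2.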


🔧 Setup

Requires access to NVIDIA's hosted NIM APIs. Set your API key (see .env.example):

NVIDIA_NIM_API_KEY="YOUR_API_KEY"
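In Python, the key can be read from the environment at startup; the bearer-token header shown is the standard convention for NVIDIA's hosted endpoints, assumed here rather than taken from this repo's code:

```python
import os

# Read the API key set above, falling back to a placeholder for illustration.
api_key = os.environ.get("NVIDIA_NIM_API_KEY", "YOUR_API_KEY")

# Standard bearer-token header convention (assumption, not this repo's code).
headers = {"Authorization": f"Bearer {api_key}"}
```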