mirror of https://github.com/NousResearch/atropos.git synced 2026-04-19 12:57:58 +00:00

Shannon Sands ec2b6f093d linting		2025-05-27 12:29:10 +10:00

🧬 LLM-Guided De Novo Protein Design Environment

De novo protein binder design is one of the hardest problems in bioengineering: you're tasked with inventing an amino acid sequence that folds into a 3D structure that binds to a given target protein. This environment lets Large Language Models (LLMs) tackle that problem using reinforcement learning (RL) — not by predicting sequences blindly, but by learning to use the right tools in the right order to produce functioning binders.

🤖 Why LLM-based RL Instead of Classic RL?

Classic RL works well for Atari, but it is a poor fit for de novo protein binder design. Why?

- Simulation is slow. Each step (AlphaFold, RFdiffusion, ProteinMPNN) can take minutes, so you don’t get to run millions of episodes like in classic RL.
- State/action spaces are vast and weird. Proteins are not 2D boards or pixel arrays. Designing them involves sequences, structures, config files, hotspots, and domain hacks.
- Heuristics and intuition matter. LLMs are pretrained on language, code, protein sequences, and scientific papers, which gives them a rough world model. They come in with baked-in priors that help them reason, even under sparse rewards.

Classic RL policy networks? They’d need to learn everything from scratch, which is impractical when every episode costs minutes of compute.

🧪 The Protein Design Pipeline

Each episode consists of an LLM navigating a 4-step design pipeline, using state-of-the-art tools as function calls:

Step 1: Target Sequence → Structure (`AlphaFold`)

Input: Target protein sequence
Output: 3D .pdb file (structure)
Reward: Format validity
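
"Format validity" is not defined further in this README. One minimal way to check it, assuming a valid output only needs at least one ATOM record with parseable coordinates, might look like the sketch below. The function name `looks_like_pdb` and the `min_atoms` threshold are hypothetical and not taken from the environment's code.

```python
def looks_like_pdb(text: str, min_atoms: int = 1) -> bool:
    """Return True if `text` has at least `min_atoms` parseable ATOM records."""
    atoms = 0
    for line in text.splitlines():
        if not line.startswith("ATOM"):
            continue  # ignore HEADER, REMARK, TER, END, and other records
        try:
            # Fixed-width PDB columns (1-indexed): x = 31-38, y = 39-46, z = 47-54.
            float(line[30:38])
            float(line[38:46])
            float(line[46:54])
        except ValueError:
            return False  # truncated or garbled coordinate fields
        atoms += 1
    return atoms >= min_atoms
```

A stricter check could also validate residue names and chain IDs; this sketch only guards against empty or truncated files.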

Step 2: Target Structure → Binder Backbone (`RFdiffusion`)

Input: .pdb file of target
Output: .pdb backbone of potential binder
Reward: Format validity

Step 3: Backbone → Full Binder Sequence (`ProteinMPNN`)

Input: Binder backbone
Output: .fasta with the full binder sequence (residue identities, which determine the side chains)
Reward: Format validity
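
For the sequence step, a format check could parse the FASTA text and reject anything that is not a plain amino acid sequence. The sketch below assumes only the 20 standard one-letter codes are acceptable; the real tool output may use extra symbols (for example chain separators), so treat `VALID_RESIDUES` and `parse_fasta` as illustrative names rather than the environment's actual logic.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only

def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}; raise ValueError if malformed."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            if not header or header in records:
                raise ValueError(f"empty or duplicate header: {line!r}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data before the first header")
        else:
            seq = line.upper()
            bad = set(seq) - VALID_RESIDUES
            if bad:
                raise ValueError(f"invalid residue codes: {sorted(bad)}")
            records[header] += seq  # sequences may wrap across lines
    if not records or any(not seq for seq in records.values()):
        raise ValueError("no sequences found")
    return records
```

Raising on malformed input, rather than returning a partial result, keeps the reward decision binary: either the step produced a usable sequence or it did not.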

Step 4: Evaluate Binding (`AlphaFold-Multimer`)

Input: Target + binder sequences
Output: Complex structure prediction
Reward:
- Format OK
- No steric clashes
- Bonus: Contact interface, binding affinity metrics (Not yet implemented)
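
The README does not say how steric clashes are detected. A common simple approach, shown here only as a sketch, is to count atom pairs from different chains that sit closer than some distance cutoff. The 2.0 Å default, the `Atom` tuple layout, and the function name are all assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical atom representation: (chain_id, x, y, z), coordinates in angstroms.
Atom = tuple

def count_interchain_clashes(atoms: list, cutoff: float = 2.0) -> int:
    """Count atom pairs from different chains closer than `cutoff` angstroms."""
    by_chain = {}
    for chain, x, y, z in atoms:
        by_chain.setdefault(chain, []).append((x, y, z))
    chains = sorted(by_chain)
    clashes = 0
    for i, a in enumerate(chains):
        for b in chains[i + 1:]:  # visit each pair of chains once
            for p, q in product(by_chain[a], by_chain[b]):
                if math.dist(p, q) < cutoff:
                    clashes += 1
    return clashes
```

This is an all-pairs scan, which is fine for a small binder and target but would need a spatial index for large complexes.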

🏆 Reward Function

The reward is cumulative:

+0.2: Successfully generate output in correct format at each step
+0.0 to +1.0: Structural reward for complex validity, smoothly interpolated from AlphaFold-Multimer confidence
+1: High predicted binding affinity (Not yet implemented)
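
As a rough sketch of how these terms could add up: the code below gives +0.2 per correctly formatted step and maps a confidence score onto 0.0 to 1.0. The README only says the structural term is "smoothly interpolated", so the linear ramp and the `low`/`high` thresholds here are assumptions, and the binding affinity bonus is left out because it is not yet implemented.

```python
from typing import Optional

FORMAT_REWARD = 0.2  # per step, as listed above

def structural_reward(confidence: float, low: float = 0.5, high: float = 0.9) -> float:
    """Linearly map `confidence` from [low, high] onto [0.0, 1.0], clamped at both ends."""
    if high <= low:
        raise ValueError("high must exceed low")
    return min(1.0, max(0.0, (confidence - low) / (high - low)))

def episode_reward(format_ok: list, confidence: Optional[float]) -> float:
    """Sum per-step format rewards, plus the structural term if step 4 yielded a score."""
    total = FORMAT_REWARD * sum(bool(ok) for ok in format_ok)
    if confidence is not None:
        total += structural_reward(confidence)
    return total
```

Under these assumptions, an episode with four valid steps and a confidence at or above `high` earns about 1.8, while one that stalls after two valid steps earns about 0.4.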

Sparse, but real. LLMs must plan tool use, not just spam actions.
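
Because each step consumes the previous step's output, an out-of-order tool call cannot succeed. One way to picture that constraint is a small state tracker that only accepts the next tool in the sequence. The tool names and the `EpisodeState` class are hypothetical, and the real environment may handle out-of-order calls differently.

```python
from typing import Optional

# Hypothetical tool identifiers, in pipeline order.
PIPELINE = ("alphafold", "rfdiffusion", "proteinmpnn", "alphafold_multimer")

class EpisodeState:
    """Tracks how many pipeline steps have completed in the current episode."""

    def __init__(self) -> None:
        self.completed = 0

    def next_tool(self) -> Optional[str]:
        """Return the tool expected next, or None once the pipeline is done."""
        if self.completed < len(PIPELINE):
            return PIPELINE[self.completed]
        return None

    def try_call(self, tool: str) -> bool:
        """Accept the call only if `tool` is the next step; otherwise leave state unchanged."""
        if tool != self.next_tool():
            return False
        self.completed += 1
        return True
```

A rejected call earns nothing in this sketch, which is what makes spamming tools a losing strategy.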

🔧 Setup

Access to hosted NVIDIA APIs is required. Set your API key, for example in a .env file based on .env.example:

NVIDIA_NIM_API_KEY="YOUR_API_KEY"
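
How the environment reads this key is not shown here. A minimal sketch of a fail-fast lookup, assuming the key is exposed as a process environment variable, could look like this (`get_nim_api_key` is a hypothetical helper, not part of the repository):

```python
import os
from typing import Mapping, Optional

def get_nim_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return NVIDIA_NIM_API_KEY from `env` (defaults to os.environ) or raise."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_NIM_API_KEY", "").strip()
    # Treat the placeholder from the example above as "not configured".
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError("NVIDIA_NIM_API_KEY is not set; export it or add it to your .env file")
    return key
```

Checking at startup avoids discovering a missing key only after the first slow tool call fails.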
