mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-29 17:35:07 +00:00
[env]: add initial ProteinBinderEnv
Co-authored-by: based-tachikoma <based.tachikoma@gmail.com>
parent c189fc3351
commit 4d9bec44c6
4 changed files with 1245 additions and 0 deletions
3
environments/hack0/protein_design_env/.env.example
Normal file
@@ -0,0 +1,3 @@
# We use NVIDIA NIM to access hosted models through its API

NVIDIA_NIM_API_KEY="YOUR_API_KEY"
64
environments/hack0/protein_design_env/README.md
Normal file
@@ -0,0 +1,64 @@
# 🧬 LLM-Guided De Novo Protein Design Environment

**De novo protein binder design** is one of the hardest problems in bioengineering: you're tasked with inventing an amino acid sequence that folds into a 3D structure that binds to a given target protein. This environment lets **Large Language Models (LLMs)** tackle that problem using reinforcement learning (RL): not by predicting sequences blindly, but by *learning to use the right tools in the right order* to produce functioning binders.

---
## 🤖 Why LLM-based RL Instead of Classic RL?

Classic RL works well for Atari, but it is a poor fit for de novo protein binder design. Why?

- **Simulation is slow.** Each step (AlphaFold, RFdiffusion, ProteinMPNN) can take minutes. You don't get to run millions of episodes like in classic RL.
- **State/action spaces are vast and weird.** Proteins are not 2D boards or pixel arrays. Designing them involves sequences, structures, config files, hotspots, and domain hacks.
- **Heuristics and intuition matter.** LLMs pick up a *world model* from pretraining on language, code, protein sequences, and scientific papers. They come in with baked-in priors that help them reason, even under sparse rewards.

**Classic RL policy networks?** They'd need to learn everything from scratch, which is infeasible when every step takes minutes.

---
## 🧪 The Protein Design Pipeline

Each episode consists of an LLM navigating a 4-step design pipeline, using state-of-the-art tools as function calls:

### Step 1: Target Sequence → Structure (`AlphaFold`)
- **Input:** Target protein sequence
- **Output:** 3D `.pdb` file (structure)
- **Reward:** Format validity

### Step 2: Target Structure → Binder Backbone (`RFdiffusion`)
- **Input:** `.pdb` file of target
- **Output:** `.pdb` backbone of potential binder
- **Reward:** Format validity

### Step 3: Backbone → Full Binder Sequence (`ProteinMPNN`)
- **Input:** Binder backbone
- **Output:** `.fasta` file with the binder's amino acid sequence (which determines the side chains)
- **Reward:** Format validity

### Step 4: Evaluate Binding (`AlphaFold-Multimer`)
- **Input:** Target + binder sequences
- **Output:** Complex structure prediction
- **Reward:**
  - Format OK
  - No steric clashes
  - **Bonus:** Contact interface, binding affinity metrics (not yet implemented)

---
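The four steps above run in a fixed order, with each tool consuming the previous tool's output. The following is a minimal sketch of that ordering constraint, not the environment's actual code: the step names in `PIPELINE`, the `Episode` class, and `stub_tool` are all hypothetical, and the stubs stand in for the real hosted tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical step names, one per pipeline step described above.
PIPELINE: List[str] = [
    "predict_target_structure",  # Step 1: AlphaFold
    "design_binder_backbone",    # Step 2: RFdiffusion
    "design_binder_sequence",    # Step 3: ProteinMPNN
    "evaluate_complex",          # Step 4: AlphaFold-Multimer
]


@dataclass
class Episode:
    """Tracks which steps are done and what each step produced."""

    target_sequence: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The target sequence is the only input available before Step 1.
        self.artifacts["target_sequence"] = self.target_sequence

    def next_step(self) -> Optional[str]:
        """Return the name of the step that must run next, or None when done."""
        if len(self.completed) < len(PIPELINE):
            return PIPELINE[len(self.completed)]
        return None

    def call(self, name: str, tool: Callable[[Dict[str, str]], str]) -> bool:
        """Run `tool` only if `name` is the expected next step."""
        if name != self.next_step():
            return False  # out-of-order call: nothing is executed
        self.artifacts[name] = tool(self.artifacts)
        self.completed.append(name)
        return True


def stub_tool(label: str) -> Callable[[Dict[str, str]], str]:
    """Stand-in for a real tool call; returns a placeholder string."""
    return lambda artifacts: f"<{label} output>"


episode = Episode(target_sequence="MKTAYIAKQR")
# Calling Step 4 first is rejected because Steps 1-3 have not run yet.
skipped = episode.call("evaluate_complex", stub_tool("evaluate_complex"))
for step in PIPELINE:
    episode.call(step, stub_tool(step))
```

Rejecting out-of-order calls is one way to make the model plan its tool use; the real environment may handle this differently.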
## 🏆 Reward Function

The reward is cumulative:
- **+0.2:** For each step that produces output in the correct format
- **+0.0 to +1.0:** Structural reward for complex validity, smoothly interpolated from the AlphaFold2-Multimer confidence
- **+1:** High predicted binding affinity (not yet implemented)

Sparse, but real. LLMs must *plan* tool use, not just spam actions.

---
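As a rough illustration of how the cumulative reward listed above might add up, here is a small sketch. The function name `episode_reward` and its parameters are hypothetical. It also assumes the structural term is the complex confidence clamped to the range 0 to 1, which is only one possible reading of "smoothly interpolated". The binding-affinity bonus is left out because it is not yet implemented.

```python
from typing import Optional, Sequence


def episode_reward(
    format_ok_per_step: Sequence[bool],
    complex_confidence: Optional[float] = None,
    format_bonus: float = 0.2,
) -> float:
    """Sum the per-step format bonus and the structural reward.

    format_ok_per_step: one flag per pipeline step, True if the output parsed.
    complex_confidence: assumed to be a 0-1 confidence for the predicted
        complex; None if the final evaluation never ran.
    """
    # +0.2 for every step whose output was in the correct format.
    reward = format_bonus * sum(1 for ok in format_ok_per_step if ok)

    # Structural reward in [0, 1]; a linear clamp is assumed here.
    if complex_confidence is not None:
        reward += min(1.0, max(0.0, complex_confidence))

    return reward


full_run = episode_reward([True, True, True, True], complex_confidence=0.5)
partial_run = episode_reward([True, True, False, False])
```

Under these assumptions an episode with four valid steps can earn at most 1.8, while an episode that stalls after two valid steps earns 0.4.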
## 🔧 Setup

Access to the hosted NVIDIA APIs requires an API key, set in a `.env` file:

```env
NVIDIA_NIM_API_KEY="YOUR_API_KEY"
```
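A minimal sketch of how the key could be read at startup, using only the standard library. The helpers `read_env_file` and `get_nim_api_key` are hypothetical and not taken from `protein_env.py`; the actual environment may rely on a library such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_nim_api_key(env_path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("NVIDIA_NIM_API_KEY")
    if not key and Path(env_path).is_file():
        key = read_env_file(env_path).get("NVIDIA_NIM_API_KEY")
    if not key:
        raise RuntimeError("NVIDIA_NIM_API_KEY is not set")
    return key
```

Failing early with a clear error avoids spending retries on tool calls that can never authenticate.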
@@ -0,0 +1,33 @@
# NVIDIA NIM Environment Default Configuration for BinderBench

# Debug Mode - set to true to use mock data instead of actual API calls
debug_protein_design_calls: false

# Retry settings for failed steps
max_retries_per_internal_step: 100  # Increased to allow many retries for tool calls

# API Settings
# nim_api_key is loaded from .env file using NVIDIA_NIM_API_KEY
nim_api_base_url: "https://health.api.nvidia.com/v1"
api_timeout: 600
polling_interval: 10

# Protein Design Settings
output_dir: "environments/hack0/protein_design_env/outputs"

# WandB tracking settings
use_wandb: true
wandb_name: "binderbench"
wandb_project: "atropos"  # Will default to this if not specified
include_messages: true  # Include messages in WandB logs

# Dataset configuration
dataset_name: "ronig/protein_binding_sequences"
target_col: "receptor"
binder_col: "peptide"

# Scoring weights for final complex quality
metric_weights:
  plddt: 0.3
  ptm: 0.3
  iptm: 0.4
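To illustrate how the `metric_weights` block above could feed a single complex-quality score, here is a hedged sketch. The function `complex_quality` is hypothetical, and the sketch assumes pLDDT is reported on a 0 to 100 scale while pTM and ipTM are already in the range 0 to 1. The actual scoring code lives in `protein_env.py`, whose diff is not shown here, and may differ.

```python
from typing import Dict

# Weights copied from the metric_weights block in the config above.
METRIC_WEIGHTS: Dict[str, float] = {"plddt": 0.3, "ptm": 0.3, "iptm": 0.4}


def complex_quality(
    metrics: Dict[str, float],
    weights: Dict[str, float] = METRIC_WEIGHTS,
) -> float:
    """Weighted average of confidence metrics, clamped to [0, 1]."""
    # Assumption: pLDDT is on a 0-100 scale; pTM and ipTM are on a 0-1 scale.
    normalized = {
        "plddt": metrics["plddt"] / 100.0,
        "ptm": metrics["ptm"],
        "iptm": metrics["iptm"],
    }
    total_weight = sum(weights.values())
    score = sum(weights[name] * normalized[name] for name in weights) / total_weight
    return min(1.0, max(0.0, score))


example_score = complex_quality({"plddt": 80.0, "ptm": 0.5, "iptm": 0.5})
```

Dividing by the total weight keeps the score in the 0 to 1 range even if the configured weights do not sum exactly to one.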
1145
environments/hack0/protein_design_env/protein_env.py
Normal file
File diff suppressed because it is too large