[env]: add initial ProteinBinderEnv

Co-authored-by: based-tachikoma <based.tachikoma@gmail.com>
hallerite 2025-05-18 14:38:29 -07:00
parent c189fc3351
commit 4d9bec44c6
No known key found for this signature in database
GPG key ID: 6D9E61BC2A1A58E1
4 changed files with 1245 additions and 0 deletions


@ -0,0 +1,3 @@
# We use NVIDIA NIM to access hosted models on the API
NVIDIA_NIM_API_KEY="YOUR_API_KEY"


@ -0,0 +1,64 @@
# 🧬 LLM-Guided De Novo Protein Design Environment
**De novo protein binder design** is one of the hardest problems in bioengineering: you're tasked with inventing an amino acid sequence that folds into a 3D structure that binds to a given target protein. This environment lets **Large Language Models (LLMs)** tackle that problem using reinforcement learning (RL) — not by predicting sequences blindly, but by *learning to use the right tools in the right order* to produce functioning binders.
---
## 🤖 Why LLM-based RL Instead of Classic RL?
Classic RL works well for Atari-style games, but it breaks down for de novo protein binder design. Why?
- **Simulation is slow.** Each step (AlphaFold, RFdiffusion, ProteinMPNN) can take minutes. You don't get to run millions of episodes as in classic RL.
- **State/action spaces are vast and weird.** Proteins are not 2D boards or pixel arrays. Designing them involves sequences, structures, config files, hotspots, and domain hacks.
- **Heuristics and intuition matter.** LLMs are pretrained on a *world model*: language, code, protein sequences, scientific papers. They come in with baked-in priors that help them reason, even under sparse rewards.
**Classic RL policy networks?** They'd need to learn everything from scratch, which is intractable at these simulation costs.
---
## 🧪 The Protein Design Pipeline
Each episode consists of an LLM navigating a 4-step design pipeline, using state-of-the-art tools as function calls:
### Step 1: Target Sequence → Structure (`AlphaFold`)
- **Input:** Target protein sequence
- **Output:** 3D `.pdb` file (structure)
- **Reward:** Format validity
### Step 2: Target Structure → Binder Backbone (`RFdiffusion`)
- **Input:** `.pdb` file of target
- **Output:** `.pdb` backbone of potential binder
- **Reward:** Format validity
### Step 3: Backbone → Full Binder Sequence (`ProteinMPNN`)
- **Input:** Binder backbone
- **Output:** `.fasta` file with the full amino acid sequence (side-chain identities assigned)
- **Reward:** Format validity
### Step 4: Evaluate Binding (`AlphaFold-Multimer`)
- **Input:** Target + binder sequences
- **Output:** Complex structure prediction
- **Reward:**
- Format OK
- No steric clashes
- **Bonus:** Contact interface, binding affinity metrics (Not yet implemented)
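The four steps above can be sketched as a chain of function calls. The helper functions below are mocks with hypothetical names; in the real environment each one issues an NVIDIA NIM API request to the corresponding tool:

```python
# Sketch of the 4-step design pipeline. Each function is a mock stand-in
# for a hosted tool call (AlphaFold, RFdiffusion, ProteinMPNN,
# AlphaFold-Multimer); names and return values are illustrative only.

def fold_target(sequence: str) -> str:
    """Step 1: target sequence -> .pdb structure (AlphaFold)."""
    return f"target_{len(sequence)}.pdb"

def diffuse_backbone(target_pdb: str) -> str:
    """Step 2: target structure -> binder backbone (RFdiffusion)."""
    return target_pdb.replace("target", "backbone")

def design_sequence(backbone_pdb: str) -> str:
    """Step 3: backbone -> full binder sequence (ProteinMPNN)."""
    return backbone_pdb.replace(".pdb", ".fasta")

def predict_complex(target_seq: str, binder_fasta: str) -> str:
    """Step 4: target + binder -> complex prediction (AlphaFold-Multimer)."""
    return f"complex_{binder_fasta}"

def run_pipeline(target_sequence: str) -> str:
    """Run all four steps in order and return the complex artifact name."""
    target_pdb = fold_target(target_sequence)
    backbone_pdb = diffuse_backbone(target_pdb)
    binder_fasta = design_sequence(backbone_pdb)
    return predict_complex(target_sequence, binder_fasta)
```

The LLM's job is to issue these calls in the right order with valid arguments; each well-formed output earns the per-step format reward.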
---
## 🏆 Reward Function
The reward is cumulative:
- **+0.2:** for each step whose output is in the correct format
- **+0.0 to +1.0:** structural reward for complex validity, smoothly interpolated from AlphaFold2-Multimer confidence
- **+1.0:** high predicted binding affinity (not yet implemented)
Sparse, but real. LLMs must *plan* tool use, not just spam actions.
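A minimal sketch of this cumulative scheme, assuming +0.2 per well-formatted step and a linear clamp of the multimer confidence into [0, 1] (the exact interpolation used by the environment may differ):

```python
from typing import Optional

def episode_reward(steps_ok: list, multimer_confidence: Optional[float]) -> float:
    """Cumulative reward sketch: +0.2 per correctly formatted step, plus a
    structural term in [0, 1] derived from AlphaFold-Multimer confidence.
    The linear clamp below is an illustrative assumption."""
    reward = 0.2 * sum(bool(ok) for ok in steps_ok)
    # The structural term only applies once the full pipeline produced a complex.
    if all(steps_ok) and multimer_confidence is not None:
        reward += max(0.0, min(1.0, multimer_confidence))
    return reward
```

Note how a run that stalls at step 2 still earns partial format credit, while the large structural term is gated behind completing all four steps.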
---
## 🔧 Setup
You need access to NVIDIA's hosted NIM APIs; put your key in a `.env` file:
```env
NVIDIA_NIM_API_KEY="YOUR_API_KEY"
```
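At runtime the key is read from the process environment (the config notes it is loaded from `.env`); a minimal sketch of that lookup, with the bearer-token header pattern shown as a common convention rather than the environment's exact request code:

```python
import os

# Read the key set in the shell or loaded from .env by your tooling.
api_key = os.environ.get("NVIDIA_NIM_API_KEY", "")

# Typical bearer-token header for hosted APIs; shown as a convention here.
headers = {"Authorization": f"Bearer {api_key}"} if api_key else None
if headers is None:
    print("Set NVIDIA_NIM_API_KEY before calling the hosted NIM endpoints")
```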


@ -0,0 +1,33 @@
# NVIDIA NIM Environment Default Configuration for BinderBench
# Debug Mode - set to true to use mock data instead of actual API calls
debug_protein_design_calls: false
# Retry settings for failed steps
max_retries_per_internal_step: 100 # Increased to allow many retries for tool calls
# API Settings
# nim_api_key is loaded from .env file using NVIDIA_NIM_API_KEY
nim_api_base_url: "https://health.api.nvidia.com/v1"
api_timeout: 600
polling_interval: 10
# Protein Design Settings
output_dir: "environments/hack0/protein_design_env/outputs"
# WandB tracking settings
use_wandb: true
wandb_name: "binderbench"
wandb_project: "atropos" # Will default to this if not specified
include_messages: true # Include messages in WandB logs
# Dataset configuration
dataset_name: "ronig/protein_binding_sequences"
target_col: "receptor"
binder_col: "peptide"
# Scoring weights for final complex quality
metric_weights:
plddt: 0.3
ptm: 0.3
iptm: 0.4
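The weighted complex-quality score implied by `metric_weights` can be computed as below. The normalization is my assumption: pLDDT is typically reported on a 0-100 scale while pTM/ipTM are already in [0, 1], so pLDDT is rescaled before mixing.

```python
# Weighted complex-quality score from the config's metric_weights.
# Assumes pLDDT on 0-100 and pTM/ipTM on 0-1; the rescaling of pLDDT
# is an assumption, not confirmed by the config itself.
METRIC_WEIGHTS = {"plddt": 0.3, "ptm": 0.3, "iptm": 0.4}

def complex_score(plddt: float, ptm: float, iptm: float) -> float:
    normalized = {"plddt": plddt / 100.0, "ptm": ptm, "iptm": iptm}
    return sum(METRIC_WEIGHTS[k] * normalized[k] for k in METRIC_WEIGHTS)
```

With these weights, ipTM (the interface confidence) dominates, which matches the environment's focus on binding rather than monomer quality.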

File diff suppressed because it is too large