Mirror of https://github.com/NousResearch/atropos.git (synced 2026-04-19 12:57:58 +00:00)

Added new env info

Commit fd63c76a5c (parent 8d0a326488): 2 changed files with 164 additions and 2 deletions.
@@ -52,10 +52,12 @@ The framework supports collecting, distributing and evaluating LLM trajectories

| Environment Type | Examples | Purpose |
|---------------------------|--------------------------------------------|----------------------------------------------------|
| 📚 Dataset environments | GSM8K, MMLU, Custom HF Datasets | Evaluate and improve LLM performance on static data |
| 🎮 Online environments | Blackjack, Taxi, Text-based games | Train LLMs through interactive game-based learning |
| 🤖 RLAIF and RLHF | LLM Judge/Reward Models | Fine-tune LLMs using human or AI feedback for alignment |
| 🔄 Multi-Turn RL | deepresearch, internal tool calling | Train LLMs on complex multi-step interactions |
| 💻 Code Execution | MBPP, HumanEval (via `coding_server.py`) | Train LLMs to generate and execute code |
| 🖼️ Multimodal | OCR-VQA, CLEVR (via `multimodal_dpo/`) | Train LLMs on tasks involving vision and language |

</div>