mirror of https://github.com/NousResearch/atropos.git synced 2026-04-19 12:57:58 +00:00


1-Minute Demo Video

Watch the demo: https://www.loom.com/share/44c793c47e7d45eaaf02bac7c168a10d?sid=4ff3d95f-701f-4d11-be3f-aa89f8fa2f0d

Environment Design & Motivation

NousWhiteHouse is a reinforcement learning (RL) project focused on improving agent tool calls using the Model Context Protocol (MCP). The goal is to enable agents to dynamically discover and invoke tools more effectively, leveraging MCP for context-aware decision-making.

After replicating RESTGPT, we noticed that LLMs struggled to find the right tools to call, for example when asked to find Gims songs on Spotify. Rather than manually matching multiple APIs, we were inspired by the recent advent of MCP to double down on tool calling.

Our dataset uses a format like:

    {
      "user_prompt_text": "What is the current stock price of AAPL?",
      "expected_mcp_call": {
        "tool_name": "getStockPrice",
        "arguments": { "tickerSymbol": "AAPL" }
      }
    }

The model's returned tool call is compared with the expected_mcp_call.
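The comparison described above can be sketched as follows. This is a minimal illustration, not the environment's actual scoring code: the helper names `parse_tool_call` and `score_tool_call` are hypothetical, and it assumes the model replies with a bare JSON object that uses the same `tool_name` and `arguments` keys as the dataset, with a simple exact-match reward.

```python
import json
from typing import Any, Optional


def parse_tool_call(model_output: str) -> Optional[dict[str, Any]]:
    """Parse a model response, assumed here to be a bare JSON object."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    # Anything other than a JSON object cannot be a tool call.
    return call if isinstance(call, dict) else None


def score_tool_call(model_output: str, expected: dict[str, Any]) -> float:
    """Return 1.0 if tool name and arguments match exactly, else 0.0."""
    call = parse_tool_call(model_output)
    if call is None:
        return 0.0
    same_tool = call.get("tool_name") == expected["tool_name"]
    same_args = call.get("arguments") == expected["arguments"]
    return 1.0 if same_tool and same_args else 0.0


# Example record in the dataset format shown in this README.
record = {
    "user_prompt_text": "What is the current stock price of AAPL?",
    "expected_mcp_call": {
        "tool_name": "getStockPrice",
        "arguments": {"tickerSymbol": "AAPL"},
    },
}

good = '{"tool_name": "getStockPrice", "arguments": {"tickerSymbol": "AAPL"}}'
bad = '{"tool_name": "getStockPrice", "arguments": {"tickerSymbol": "MSFT"}}'

print(score_tool_call(good, record["expected_mcp_call"]))  # 1.0
print(score_tool_call(bad, record["expected_mcp_call"]))   # 0.0
```

Exact matching is the simplest choice; a real reward function might instead give partial credit for the right tool with wrong arguments, or extract the call from surrounding text before parsing.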

Main task: help LLMs use MCP tools.

Why this environment is interesting or useful for RL research: it is intended to train models toward faster, more accurate tool calling and more seamless integration of tools with LLMs.

Framework: we used the Single Tool Environment as the basis for the MCP environment.

Challenge: finding existing large datasets of MCP calls was extremely difficult.

Estimate

🧪 Zero-Training Test Results

Results of running the example trainer on the gsm8k server via Lambda:

W&B Link: https://api.wandb.ai/links/l-a-t-hacken-tu-eindhoven/nqjy1v4b