readme updates for tool calling

dmahan93 2026-03-03 12:22:10 -06:00
parent 8f21bb57ed
commit c8eb63f33d


@ -8,6 +8,91 @@ For automatic token and logprob tracking, see the [ManagedServer Guide](MANAGED_
> **Note:** OpenAI endpoints do not support token IDs/logprobs required for ManagedServer. Set `ATROPOS_ALLOW_DUMMY_MANAGED_SERVER=1` to use a placeholder implementation for testing/evaluation. See [OpenAI Endpoint Limitations](MANAGED_SERVER.md#openai-endpoint-limitations) for details.
## Tool Call Support
ManagedServer supports OpenAI-style tool calling via vLLM's tool parsers. Pass `tool_parser` at init:
```python
server_manager = ServerManager(
    configs=[APIServerConfig(...)],
    tool_parser="hermes",  # or llama3_json, mistral, deepseek_v3, qwen3_coder, etc.
)

async with server_manager.managed_server(tokenizer=tokenizer) as managed:
    result = await managed.chat_completion(
        messages=[{"role": "user", "content": "What's the weather?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                },
            },
        }],
        tool_choice="auto",  # "auto", "none", or "required"
    )
    # Structured tool_calls in the response
    if result.choices[0].message.tool_calls:
        print(result.choices[0].message.tool_calls)
    # Nodes still carry the raw text with <tool_call> tags for training
    nodes = managed.get_state()["nodes"]
```
Requires `vllm` to be installed. Without it, tool parsing is disabled with a warning; everything else still works.
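A typical follow-up to a tool-call response is to run the requested tool locally and append the result as a `tool` message before the next `chat_completion` call. A minimal sketch (the `TOOL_HANDLERS` table and `tool_result_message` helper are illustrative, not part of ManagedServer, and the sketch assumes OpenAI-style dict-shaped tool calls):

```python
import json

# Illustrative local tool implementations; these names are examples only.
TOOL_HANDLERS = {
    "get_weather": lambda args: {"city": args["city"], "forecast": "sunny"},
}

def tool_result_message(tool_call: dict) -> dict:
    """Run one OpenAI-style tool call locally and build the `tool`-role
    message to append to the conversation before the next turn."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOL_HANDLERS[name](args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

The returned message follows the OpenAI convention of echoing the call's `id` as `tool_call_id`, which is how the model matches results to requests on the next turn.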
## OpenAI Proxy
Exposes ManagedServer as an OpenAI-compatible HTTP API for external tools (CLIs, GUIs, microservices).
### Standalone
```bash
python -m atroposlib.envs.server_handling.managed_server_proxy \
    --config servers.json --port 9100
```
`servers.json`:
```json
{
  "model_name": "Qwen/Qwen3-4B",
  "servers": [
    {"base_url": "http://gpu1:8000/v1", "server_type": "vllm"},
    {"base_url": "http://gpu2:8000/v1", "server_type": "vllm"}
  ]
}
```
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| POST | `/sessions/create` | Create session. Optional `base_url` to pin to a server, `tool_parser` name. |
| POST | `/{uuid}/v1/chat/completions` | OpenAI chat completions (with tools support). |
| POST | `/{uuid}/v1/chat/completions/render` | Preview rendered prompt without generating. |
| GET | `/{uuid}/nodes` | Get tracked tokens/logprobs/masks for training. |
| DELETE | `/{uuid}` | Cleanup session. |
| GET | `/sessions` | List active sessions. |
| GET | `/servers` | List backend servers. |
| POST | `/setup` | Push server config (used by ServerManager). |
| GET | `/v1/models` | List models. |
| GET | `/health` | Health check. |
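External tools can drive these endpoints with a plain HTTP client. A minimal standard-library sketch (the proxy address, the request bodies, and the `uuid` field in the create response are assumptions based on the table above, not confirmed API details):

```python
import json
import urllib.request

PROXY = "http://localhost:9100"  # assumed proxy address

def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def chat_url(session_id: str) -> str:
    """Per-session chat completions path, following the table above."""
    return f"{PROXY}/{session_id}/v1/chat/completions"

# Lifecycle sketch (requires a running proxy, so not executed here):
# session = post_json(f"{PROXY}/sessions/create", {"tool_parser": "hermes"})
# reply = post_json(chat_url(session["uuid"]),
#                   {"messages": [{"role": "user", "content": "hi"}]})
```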
### Via ServerManager
```python
server_manager = ServerManager(
    configs=[APIServerConfig(...)],
    proxy_url="http://localhost:9100",  # auto-enables proxy mode
    tool_parser="hermes",
)

# managed_server() now routes through the proxy
async with server_manager.managed_server(tokenizer=tokenizer) as managed:
    result = await managed.chat_completion(messages=[...], tools=[...])
    url = managed.get_url()  # "http://localhost:9100/{uuid}/v1" — hand to external apps
    nodes = await managed.fetch_state()  # get tokens/logprobs
```
Alternatively, set the `ATROPOS_PROXY_URL=http://localhost:9100` environment variable instead of passing `proxy_url`.
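The environment-variable form amounts to setting the variable before constructing ServerManager. A sketch, assuming the variable is read at construction time:

```python
import os

# Same effect as passing proxy_url="http://localhost:9100" explicitly,
# assuming ServerManager reads ATROPOS_PROXY_URL when it is constructed.
os.environ["ATROPOS_PROXY_URL"] = "http://localhost:9100"
```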
## Reasoning Model Support
The `ReasoningConfig` class enables support for reasoning/thinking models across different providers.