Add README for server handling module and refine ReasoningConfig logic

- Introduced a new README.md file detailing the server handling module, including support for reasoning models, provider differences, effort level mappings, and usage examples.
- Cleaned up the ReasoningConfig class by removing unnecessary comments and clarifying logic related to reasoning injection and provider-specific requirements.
This commit is contained in:
teknium 2026-01-15 07:21:53 +00:00
parent 0e187d7869
commit b2d17a44d2
2 changed files with 80 additions and 32 deletions

@@ -0,0 +1,70 @@
# Server Handling
This module provides server abstraction layers for different LLM inference backends.
## Reasoning Model Support
The `ReasoningConfig` class enables support for reasoning/thinking models across different providers.
### Provider Differences
| Feature | OpenAI | OpenRouter / Others |
|---------|--------|---------------------|
| Format | `{"reasoning_effort": "high"}` | `{"reasoning": {"enabled": true, "effort": "high"}}` |
| Effort Levels | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` |
| Max Tokens | Not supported | `{"reasoning": {"max_tokens": 16000}}` |
| Temperature | Must be `1.0` | No restriction |
| Token Param | `max_completion_tokens` | `max_tokens` |
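For illustration, a minimal sketch of the two request shapes described above (the model names and surrounding fields are placeholders, not part of this module):
```python
# OpenAI-style request: flat effort string, temperature pinned to 1.0,
# token limit passed as max_completion_tokens.
openai_payload = {
    "model": "o3-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Prove that 17 is prime."}],
    "reasoning_effort": "high",
    "temperature": 1.0,
    "max_completion_tokens": 4096,
}

# OpenRouter-style request: nested reasoning object with enabled/effort
# (or max_tokens), no temperature restriction, token limit as max_tokens.
openrouter_payload = {
    "model": "anthropic/claude-sonnet-4",  # placeholder model name
    "messages": [{"role": "user", "content": "Prove that 17 is prime."}],
    "reasoning": {"enabled": True, "effort": "high"},
    "max_tokens": 4096,
}
```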
### Effort Level to Token Mapping
When a provider doesn't support effort strings, effort levels map to approximate token budgets (based on a 32k-token base):
| Effort | Tokens | Percentage |
|--------|--------|------------|
| none | 1,024 | Minimum |
| minimal | 3,200 | ~10% |
| low | 6,400 | ~20% |
| medium | 16,000 | ~50% |
| high | 25,600 | ~80% |
| xhigh | 30,400 | ~95% |
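A minimal sketch of this mapping, assuming integer percentages of the 32k base (illustrative only, not the module's actual implementation):
```python
# Percentage of the base token budget per effort level, mirroring the table above.
EFFORT_PERCENT = {"minimal": 10, "low": 20, "medium": 50, "high": 80, "xhigh": 95}

def effort_to_tokens(effort: str, base: int = 32_000) -> int:
    """Approximate reasoning-token budget for an effort level."""
    if effort == "none":
        return 1_024  # floor from the table above
    return base * EFFORT_PERCENT[effort] // 100

assert effort_to_tokens("medium") == 16_000
assert effort_to_tokens("xhigh") == 30_400
```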
### Provider Token Limits
- **OpenRouter**: Caps Anthropic reasoning at 1,024-32,000 tokens ([docs](https://openrouter.ai/docs/guides/best-practices/reasoning-tokens))
- **Native Anthropic**: Supports up to 128k extended thinking tokens
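A hedged sketch of how these limits could be enforced before a request is sent (the provider keys and helper name are illustrative, not part of this module's API):
```python
# Hypothetical per-provider (min, max) reasoning-token ranges from the notes above.
PROVIDER_TOKEN_LIMITS = {
    "openrouter": (1_024, 32_000),   # OpenRouter cap for Anthropic reasoning
    "anthropic": (1_024, 128_000),   # native extended thinking upper bound
}

def clamp_reasoning_tokens(provider: str, requested: int) -> int:
    """Clamp a requested reasoning budget into the provider's allowed range."""
    low, high = PROVIDER_TOKEN_LIMITS[provider]
    return max(low, min(requested, high))

print(clamp_reasoning_tokens("openrouter", 64_000))  # 32000
```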
### Usage
Reasoning is only injected for **chat completions** (not the completions or logprobs APIs).
```python
# Via environment config
config = BaseEnvConfig(
thinking_mode=True,
reasoning_effort="high",
max_reasoning_tokens=16000,
)

# Direct ReasoningConfig
reasoning_config = ReasoningConfig(
enabled=True,
effort="high",
max_tokens=16000,
)
```
### Bypassing Reasoning Injection
Pass `skip_reasoning=True` to any chat completion call:
```python
await server.chat_completion(messages=messages, skip_reasoning=True)
```
### Important Constraints
1. **OpenRouter**: Only accepts ONE of `effort` or `max_tokens`, not both. When both are specified, effort takes priority (see the sketch below).
2. **OpenAI**: All effort levels are passed through directly.
3. **Auto-enable**: Setting `effort` or `max_tokens` automatically enables reasoning mode.
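A minimal sketch of how constraints 1 and 3 might interact when building the OpenRouter reasoning block (the function name is hypothetical, not this module's actual code):
```python
def build_openrouter_reasoning(effort: str | None = None,
                               max_tokens: int | None = None) -> dict | None:
    # Auto-enable: setting either parameter implies reasoning is on.
    if effort is None and max_tokens is None:
        return None
    reasoning: dict = {"enabled": True}
    # Only one of effort / max_tokens is sent; effort wins when both are set.
    if effort is not None:
        reasoning["effort"] = effort
    elif max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    return reasoning

print(build_openrouter_reasoning(effort="high", max_tokens=16_000))
# {'enabled': True, 'effort': 'high'}
```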