mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-19 12:57:58 +00:00
Add README for server handling module and refine ReasoningConfig logic
- Introduced a new README.md file detailing the server handling module, including support for reasoning models, provider differences, effort level mappings, and usage examples.
- Cleaned up the ReasoningConfig class by removing unnecessary comments and clarifying logic related to reasoning injection and provider-specific requirements.
parent: 0e187d7869
commit: b2d17a44d2
2 changed files with 80 additions and 32 deletions
atroposlib/envs/server_handling/README.md (normal file, 70 lines)

@@ -0,0 +1,70 @@
# Server Handling
This module provides server abstraction layers for different LLM inference backends.
## Reasoning Model Support
The `ReasoningConfig` class enables support for reasoning/thinking models across different providers.
### Provider Differences
| Feature | OpenAI | OpenRouter / Others |
|---------|--------|---------------------|
| Format | `{"reasoning_effort": "high"}` | `{"reasoning": {"enabled": true, "effort": "high"}}` |
| Effort Levels | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` | `none`, `minimal`, `low`, `medium`, `high`, `xhigh` |
| Max Tokens | Not supported | `{"reasoning": {"max_tokens": 16000}}` |
| Temperature | Must be `1.0` | No restriction |
| Token Param | `max_completion_tokens` | `max_tokens` |
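
As a concrete illustration of the format split above, a payload builder could branch on provider. This is a hypothetical helper (not part of this module; the function name and token values are illustrative only):

```python
def build_reasoning_payload(provider: str, effort: str = "high") -> dict:
    """Shape the reasoning fields per provider, following the table above."""
    if provider == "openai":
        # OpenAI: flat effort string, temperature pinned to 1.0,
        # and the token limit goes in max_completion_tokens.
        return {
            "reasoning_effort": effort,
            "temperature": 1.0,
            "max_completion_tokens": 4096,
        }
    # OpenRouter and others: settings nested under "reasoning",
    # with the usual max_tokens parameter.
    return {
        "reasoning": {"enabled": True, "effort": effort},
        "max_tokens": 4096,
    }
```

The returned dict would be merged into the request body alongside `model` and `messages`.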
### Effort Level to Token Mapping
When a provider does not support effort strings, effort levels map to approximate token budgets (based on a 32k-token base):
| Effort | Tokens | Percentage |
|--------|--------|------------|
| none | 1,024 | Minimum |
| minimal | 3,200 | ~10% |
| low | 6,400 | ~20% |
| medium | 16,000 | ~50% |
| high | 25,600 | ~80% |
| xhigh | 30,400 | ~95% |
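
The table above reads as a simple lookup. A sketch of that fallback (the actual `ReasoningConfig` internals may differ):

```python
# Approximate token budgets per effort level, assuming a 32k base.
EFFORT_TOKEN_BUDGETS = {
    "none": 1_024,     # minimum
    "minimal": 3_200,  # ~10%
    "low": 6_400,      # ~20%
    "medium": 16_000,  # ~50%
    "high": 25_600,    # ~80%
    "xhigh": 30_400,   # ~95%
}

def effort_to_tokens(effort: str) -> int:
    """Fall back to a token budget when a provider ignores effort strings."""
    return EFFORT_TOKEN_BUDGETS[effort]
```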
### Provider Token Limits
- **OpenRouter**: Caps Anthropic reasoning at 1,024-32,000 tokens ([docs](https://openrouter.ai/docs/guides/best-practices/reasoning-tokens))
- **Native Anthropic**: Supports up to 128k extended thinking tokens
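
A request routed through OpenRouter to an Anthropic model would therefore need its reasoning budget clamped to that range. An illustrative helper (not part of the module):

```python
def clamp_openrouter_anthropic(max_tokens: int) -> int:
    """Clamp a reasoning token budget to OpenRouter's 1,024-32,000 Anthropic range."""
    return max(1024, min(max_tokens, 32000))
```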
### Usage
Reasoning is only injected for **chat completions** (not the completions or logprobs APIs).
```python
# Via environment config
config = BaseEnvConfig(
    thinking_mode=True,
    reasoning_effort="high",
    max_reasoning_tokens=16000,
)

# Direct ReasoningConfig
reasoning_config = ReasoningConfig(
    enabled=True,
    effort="high",
    max_tokens=16000,
)
```
### Bypassing Reasoning Injection
Pass `skip_reasoning=True` to any chat completion call:
```python
await server.chat_completion(messages=messages, skip_reasoning=True)
```
### Important Constraints
1. **OpenRouter**: Accepts only ONE of `effort` or `max_tokens`, not both. When both are specified, `effort` takes priority.
2. **OpenAI**: All effort levels are passed through directly.
3. **Auto-enable**: Setting `effort` or `max_tokens` automatically enables reasoning mode.
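
Putting the constraints together, the OpenRouter-style injection step might be sketched as follows (a hedged sketch, not the module's actual implementation):

```python
def inject_reasoning(kwargs: dict, effort=None, max_tokens=None) -> dict:
    """Sketch of OpenRouter-style reasoning injection honoring constraints 1-3."""
    # Constraint 3: setting either field auto-enables reasoning mode.
    if effort is None and max_tokens is None:
        return kwargs
    reasoning = {"enabled": True}
    if effort is not None:
        # Constraint 1: send only one of effort/max_tokens; effort wins.
        reasoning["effort"] = effort
    else:
        reasoning["max_tokens"] = max_tokens
    return {**kwargs, "reasoning": reasoning}
```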