
Server Handling

This module provides server abstraction layers for different LLM inference backends.

ManagedServer

For automatic token and logprob tracking, see the ManagedServer Guide.

Note: OpenAI endpoints do not return the token IDs and logprobs that ManagedServer requires. Set ATROPOS_ALLOW_DUMMY_MANAGED_SERVER=1 to use a placeholder implementation for testing/evaluation. See OpenAI Endpoint Limitations for details.

Normalized get_logprobs API

ManagedServer and server backends now expose a normalized get_logprobs(...) interface so callers can consume a single schema across backends:

  • prompt_tokens
  • prompt_topk_token_ids
  • prompt_topk_logprobs

For backends that only expose sampled-token logprobs, prompt top-k arrays are synthesized with k=1 for interface compatibility.
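The k=1 synthesis described above can be sketched as follows. This is an illustrative helper, not the exact atropos implementation; the function name is an assumption, while the three field names match the schema listed above.

```python
def synthesize_prompt_topk(prompt_tokens, prompt_logprobs):
    """Illustrative sketch (not the real atropos code): for backends
    that only return the sampled token's logprob at each prompt
    position, wrap each entry in a length-1 list so the result has
    the same top-k shape as backends that return true top-k arrays."""
    return {
        "prompt_tokens": prompt_tokens,
        # Each position's "top-k" is just the observed token, k=1.
        "prompt_topk_token_ids": [[tok] for tok in prompt_tokens],
        "prompt_topk_logprobs": [[lp] for lp in prompt_logprobs],
    }
```

Callers can then iterate over the top-k arrays uniformly without branching on backend capabilities.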

Reasoning Model Support

The ReasoningConfig class enables support for reasoning/thinking models across different providers.

Provider Differences

| Feature | OpenAI | OpenRouter / Others |
| --- | --- | --- |
| Format | `{"reasoning_effort": "high"}` | `{"reasoning": {"enabled": true, "effort": "high"}}` |
| Effort Levels | none, minimal, low, medium, high, xhigh | none, minimal, low, medium, high, xhigh |
| Max Tokens | Not supported | `{"reasoning": {"max_tokens": 16000}}` |
| Temperature | Must be 1.0 | No restriction |
| Token Param | `max_completion_tokens` | `max_tokens` |
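The format differences in the table can be sketched as a small payload builder. The function name and provider strings here are illustrative assumptions; only the payload shapes come from the table above.

```python
from typing import Optional


def build_reasoning_params(provider: str,
                           effort: Optional[str] = None,
                           max_tokens: Optional[int] = None) -> dict:
    """Illustrative sketch of the provider formats above; the real
    ReasoningConfig logic may differ."""
    if provider == "openai":
        # OpenAI uses a flat reasoning_effort field; a reasoning
        # max-tokens budget is not supported.
        return {"reasoning_effort": effort} if effort else {}
    # OpenRouter-style providers nest options under "reasoning" and
    # accept only one of effort / max_tokens (effort takes priority).
    reasoning: dict = {"enabled": True}
    if effort is not None:
        reasoning["effort"] = effort
    elif max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    return {"reasoning": reasoning}
```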

Effort Level to Token Mapping

When a provider does not support effort strings, effort levels map to approximate token budgets (based on a 32k base budget):

| Effort | Tokens | Share of 32k base |
| --- | --- | --- |
| none | 1,024 | Minimum |
| minimal | 3,200 | ~10% |
| low | 6,400 | ~20% |
| medium | 16,000 | ~50% |
| high | 25,600 | ~80% |
| xhigh | 30,400 | ~95% |
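The mapping above can be reproduced as a fraction of the base budget. This is a sketch; the constant and function names are assumptions, and only the resulting numbers come from the table.

```python
from typing import Optional

# Fractions of the base budget per effort level (illustrative names;
# values reproduce the table above).
EFFORT_FRACTIONS: dict = {
    "none": None,   # fixed floor rather than a fraction
    "minimal": 0.10,
    "low": 0.20,
    "medium": 0.50,
    "high": 0.80,
    "xhigh": 0.95,
}


def effort_to_tokens(effort: str, base: int = 32_000, floor: int = 1_024) -> int:
    """Map an effort level to an approximate reasoning token budget."""
    frac: Optional[float] = EFFORT_FRACTIONS[effort]
    if frac is None:
        return floor          # "none" always gets the minimum budget
    return int(base * frac)
```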

Provider Token Limits

  • OpenRouter: Caps Anthropic reasoning at 1,024-32,000 tokens (docs)
  • Native Anthropic: Supports up to 128k extended thinking tokens

Usage

Reasoning parameters are only injected for chat completions (not the completions or logprobs APIs).

```python
# Via environment config
config = BaseEnvConfig(
    thinking_mode=True,
    reasoning_effort="high",
    max_reasoning_tokens=16000,
)

# Direct ReasoningConfig
reasoning_config = ReasoningConfig(
    enabled=True,
    effort="high",
    max_tokens=16000,
)
```

Bypassing Reasoning Injection

Pass skip_reasoning=True to any chat completion call:

```python
await server.chat_completion(messages=messages, skip_reasoning=True)
```

Important Constraints

  1. OpenRouter: Accepts only ONE of effort or max_tokens. When both are specified, effort takes priority.
  2. OpenAI: All effort levels are passed through directly.
  3. Auto-enable: Setting effort or max_tokens automatically enables reasoning mode.
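Constraints 1 and 3 can be sketched together in a simplified config class. This is an assumption-laden illustration (the class name and field handling are not the real ReasoningConfig), showing only the auto-enable and effort-priority behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReasoningConfigSketch:
    """Simplified sketch of the constraints above; the real
    ReasoningConfig may implement them differently."""
    enabled: bool = False
    effort: Optional[str] = None
    max_tokens: Optional[int] = None

    def __post_init__(self):
        # Constraint 3: setting effort or max_tokens auto-enables
        # reasoning mode.
        if self.effort is not None or self.max_tokens is not None:
            self.enabled = True
        # Constraint 1: effort takes priority when both are given,
        # so the token budget is dropped.
        if self.effort is not None and self.max_tokens is not None:
            self.max_tokens = None
```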