dmahan93
58446dbcb1
Merge pull request #204 from NousResearch/multienv-enforce-mins
Multienv with enforced minimum samples in a batch
2025-07-07 08:53:43 -05:00
Dakota
08e14cc745
feat: add minimum batch allocation support for environments
- Add min_batch_allocation parameter to ensure environments contribute minimum proportion to each batch
- Implement grab_batch_with_minimum_allocations function with proper scaling when allocations exceed 100%
- Add mixed-size group buffering to handle variable-sized data submissions
- Update server to use minimum allocation logic when any env has min_batch_allocation set
- Add comprehensive tests for minimum allocation scenarios
- Update documentation in API README and CONFIG.md
- Update example environments to demonstrate the feature
This feature allows critical environments to guarantee they contribute at least a specified proportion (0.0-1.0) to each training batch, ensuring important data sources are always represented during training.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-07 08:50:28 -05:00
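The commit above says grab_batch_with_minimum_allocations applies "proper scaling when allocations exceed 100%". The sketch below shows one way that scaling step could work. It is not the commit's code. The helper names scale_min_allocations and min_samples_per_env are hypothetical, and it assumes two things: over-subscribed minimums are scaled down proportionally, and counts are floored so they never exceed the batch size.

```python
import math


def scale_min_allocations(min_allocs: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: rescale per-env min_batch_allocation values.

    Assumption: if the requested minimums sum to more than 1.0 (100%),
    each is divided by the total so they sum to 1.0 while keeping
    their relative proportions.
    """
    for name, frac in min_allocs.items():
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"{name}: min_batch_allocation must be in [0.0, 1.0]")
    total = sum(min_allocs.values())
    if total <= 1.0:
        return dict(min_allocs)  # already fits in one batch; no scaling
    return {name: frac / total for name, frac in min_allocs.items()}


def min_samples_per_env(min_allocs: dict[str, float], batch_size: int) -> dict[str, int]:
    """Hypothetical helper: turn (scaled) proportions into minimum sample counts.

    Flooring keeps the sum of minimums <= batch_size; any remainder can be
    filled by whichever environments have data available.
    """
    scaled = scale_min_allocations(min_allocs)
    return {name: math.floor(frac * batch_size) for name, frac in scaled.items()}


# Two envs each asking for 60% are over-subscribed (120%), so both scale to 50%.
print(min_samples_per_env({"math": 0.6, "code": 0.6}, 64))
```

Proportional scaling is only one reasonable policy. A real implementation might instead prioritize some environments or round up, at the cost of occasionally overfilling a batch.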
pre-commit-ci[bot]
ee5257522a
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-04 14:34:37 +00:00
Alexey Gorbatovski
14c70c0e68
Include run name in wandb initialization in BaseEnv
2025-07-04 17:13:34 +03:00
Dakota
683559afd2
allow inf (<= 0 max_token_len) generations if trainer requests it, but raise a warning so that users can check their logs and get info if their trainers are doing something weird
2025-07-01 09:52:10 -05:00
crStiv
e9a547ce32
Update base.py
2025-06-19 22:52:26 +02:00
Dakota
f3bbc6a42d
Fix import ordering with isort
- Move typing_extensions import to proper location
- Satisfy pre-commit isort requirements
2025-06-04 10:40:41 -05:00
Dakota
0ff55bf2cf
Fix TypedDict import for Python 3.10 compatibility
- Use typing_extensions.TypedDict instead of typing.TypedDict
- Fixes Pydantic error on Python < 3.12
2025-06-04 10:37:51 -05:00
hjc-puro
b5e7746c99
remove process defaults, respect config init
2025-06-02 21:19:45 -04:00
dmahan93
4a21ed0891
Enhance ScoredData model and API documentation
- Added optional fields: advantages, messages, and images to the ScoredData model.
- Updated API responses to include these new fields when no data is available.
- Revised README.md to reflect changes in the API structure and response format.
2025-06-02 17:28:25 -05:00
Shannon Sands
2eddcb3cd9
fu linting
2025-05-23 11:18:16 +10:00
Shannon Sands
d98f65f444
linting
2025-05-23 11:09:06 +10:00
Shannon Sands
606a2615f0
loop check
2025-05-23 11:05:08 +10:00
Shannon Sands
00dd120067
Merge branch 'main' into blackjack2-env
2025-05-14 17:27:44 -07:00
dmahan93
e2128b817e
restructure config_init...
2025-05-13 10:00:45 -05:00
dmahan93
df62979b90
refactor to not mess up process...
2025-05-13 09:22:07 -05:00
dmahan93
96be544228
Merge commit ' 71e7a5ca27' into add-support-for-custom-api-servers
2025-05-12 18:40:35 -05:00
Shannon Sands
36f6822d71
Merge branch 'main' into blackjack2-env
2025-05-13 07:54:04 +10:00
dmahan93
706097db21
Merge pull request #36 from NousResearch/add-gym-frozen-lake-example
add gym taxi env
2025-05-12 08:49:11 -05:00
Shannon Sands
101cbdd803
Merge branch 'main' into blackjack2-env
2025-05-12 07:22:24 +10:00
hjc-puro
e68df555ba
use parse_http_response
2025-05-10 05:12:08 -04:00
hjc-puro
a659217afe
Merge branch 'main' into 2025-05-03-http-error-logging
2025-05-10 17:09:22 +08:00
dmahan93
92428fec8f
add gym taxi env
2025-05-09 19:05:01 -05:00
Shannon Sands
4d0f919fd1
linting
2025-05-10 09:10:31 +10:00
Shannon Sands
6c6a1c5d06
update handle_send_to_api
2025-05-10 09:07:54 +10:00
dmahan93
40b12dae60
run pre-commit on all files
2025-05-09 09:54:20 -05:00
hjc-puro
629d8c1731
Merge pull request #14 from NousResearch/2025-05-02-server-cli
2025-05-09 13:37:54 +08:00
dmahan93
70cf61c210
add custom server support
2025-05-08 12:01:49 -05:00
dmahan93
61af36b226
Update base.py
2025-05-08 11:53:15 -05:00
dmahan93
1848c7d453
Update base.py
2025-05-08 11:29:29 -05:00
hjc-puro
9415cadc53
fix cls name
2025-05-08 06:54:43 -07:00
hjc-puro
cdf5a9baa9
remove ,
2025-05-07 15:22:01 -04:00
hjc-puro
0373005175
forgot to condition on is ServerBaseline instance
2025-05-07 15:09:34 -04:00
hjc-puro
ec6b86bb5d
unbreak ServerBaseline
2025-05-07 14:51:51 -04:00
edmund
2cb1ff0087
Removed mentions of NousResearch/DeepHermes-3-Llama-3-1B-Preview and swapped it for NousResearch/DeepHermes-3-Llama-3-3B-Preview
I don't think there is a NousResearch/DeepHermes-3-Llama-3-1B-Preview
2025-05-07 18:03:17 +01:00
hjc-puro
38575d7029
not supported warning for server baseline
2025-05-06 22:29:34 -04:00
hjc-puro
1d35b9d626
remove comment
2025-05-03 16:26:35 -07:00
hjc-puro
ae24b022c3
fix bug where None would be parsed as a str instead of special value
2025-05-03 16:24:35 -07:00
hjc-puro
a4d8d7e875
remove spurious comments
2025-05-03 15:58:17 -07:00
hjc-puro
aa23f10857
remove try/except because handled in separate pr
2025-05-03 15:52:13 -07:00
hjc-puro
4348dd2ec1
hide complicated openai config override behavior somewhere else
2025-05-03 14:18:50 -07:00
hjc-puro
e06469f8c2
replace await resp.json() with await parse_http_response(resp)
2025-05-03 06:36:05 -04:00
hjc-puro
fe616ec7fa
add exceptions
2025-05-03 05:28:40 -04:00
hjc-puro
af26b2e68a
propagate cli stuff to serve command
2025-05-02 15:29:29 -04:00
hjc-puro
7c6c5edf30
add back env_config_cls
2025-05-02 09:00:57 -07:00
hjc-puro
6661e286c4
remove use_api in env_manager, log config to wandb
2025-05-02 08:52:28 -07:00
hjc-puro
e40dce445c
remove oai key in defaults for process
2025-05-02 05:57:34 -07:00
hjc-puro
60d67d91e7
--slurm and --testing in outer namespace
2025-05-02 03:46:34 -07:00
hjc-puro
9a8ae1630b
import refactor
2025-05-02 01:00:04 -07:00
hjc-puro
78cfef9daf
add process subcommand
2025-05-02 03:42:10 -04:00