atropos

Mirror of https://github.com/NousResearch/atropos.git, synced 2026-04-19 12:57:58 +00:00

Author	SHA1	Message	Date
pre-commit-ci[bot]	2f9132ae63	[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci	2025-06-12 15:20:13 +00:00
teknium1	54268a76ce	add additional data dumping features	2025-06-10 01:59:25 -07:00
teknium1	2ddc7f39cd	few small defaults changes	2025-05-26 12:43:57 -07:00
teknium1	b2bb61f8cd	make eval stuff clearer	2025-05-26 01:56:45 -07:00
teknium1	e2ea82b29b	Fix up dataset and data dumps	2025-05-26 01:50:22 -07:00
teknium1	ae0340bb9f	prevent token explosion issue by reducing max_token to 15k instead of 16k	2025-05-23 18:09:36 -07:00
teknium1	1fa798a69e	Making saving data optional in config, add scores to saved data	2025-05-23 14:11:11 -07:00
teknium1	a20886d720	fix many many things jules didnt do right	2025-05-23 12:50:38 -07:00
google-labs-jules[bot]	276a845dd7	feat: Implement SWE-RL Environment with Full Refinements I've implemented the SWERLEnv in environments/swe_rl_env.py, based on the SWE-RL paper (arXiv:2502.18449). This version incorporates extensive refinements based on your feedback. Key features implemented in environments/swe_rl_env.py: - Core environment structure (setup, trajectory collection, scoring, evaluation). - "Thinking" step: LLM is prompted for reasoning within <think> </think> tags before generating a patch. Includes strict parsing for these tags. - Dynamic prompt construction using `tokenizer.apply_chat_template` with NousResearch/DeepHermes-3-Llama-3-8B-Preview as the default model. - Hugging Face dataset integration: Loads data from HF Hub with configurable dataset name, splits, and column mappings. - Reward mechanism: Based on thinking tag correctness, patch format (SEARCH/REPLACE), and similarity to the oracle patch. - Comprehensive WandB logging for training/evaluation metrics. NOTE: I made multiple attempts to update 'environments/README.md' with documentation for this new environment. While I reported success in some turns, this was not consistently verifiable and may not have been correctly applied. The README.md file may require manual verification and updating for the SWERLEnv.	2025-05-22 01:28:00 +00:00

9 commits
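
Commit 276a845dd7 describes a reward based on thinking tag correctness, patch format (SEARCH/REPLACE), and similarity to the oracle patch. The following is a minimal sketch of how such a reward could be combined, not the code in environments/swe_rl_env.py. The function names, the exact `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` marker strings, the -1.0 penalty for malformed output, and the use of `difflib.SequenceMatcher` for similarity are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Assumed strict layout: exactly one <think>...</think> block, then the answer.
THINK_RE = re.compile(r"\A\s*<think>(.*?)</think>(.*)\Z", re.DOTALL)

# Assumed SEARCH/REPLACE marker strings; the real environment may differ.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n.*?\n=======\n.*?\n>>>>>>> REPLACE", re.DOTALL
)


def split_thinking(response: str) -> Optional[str]:
    """Return the text after </think>, or None if the tags are malformed."""
    match = THINK_RE.match(response)
    if match is None:
        return None
    thinking, answer = match.group(1), match.group(2)
    # Reject nested or repeated tags (hypothetical strictness rule).
    if "<think>" in thinking or "<think>" in answer or "</think>" in answer:
        return None
    return answer.strip()


def sketch_reward(response: str, oracle_patch: str) -> float:
    """Hypothetical reward: -1.0 on format errors, else similarity in [0, 1]."""
    answer = split_thinking(response)
    if answer is None:
        return -1.0  # thinking tags missing or malformed
    if BLOCK_RE.search(answer) is None:
        return -1.0  # no well-formed SEARCH/REPLACE block
    # Character-level similarity between the predicted and oracle patch text.
    return SequenceMatcher(None, answer, oracle_patch).ratio()
```

A response whose patch text equals the oracle patch scores 1.0 under this sketch, while any response that omits the thinking tags or the SEARCH/REPLACE block scores -1.0, regardless of how close its content is.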