Commit graph

9 commits

Author SHA1 Message Date
pre-commit-ci[bot]
2f9132ae63 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-06-12 15:20:13 +00:00
teknium1
54268a76ce add additional data dumping features 2025-06-10 01:59:25 -07:00
teknium1
2ddc7f39cd few small defaults changes 2025-05-26 12:43:57 -07:00
teknium1
b2bb61f8cd make eval stuff clearer 2025-05-26 01:56:45 -07:00
teknium1
e2ea82b29b Fix up dataset and data dumps 2025-05-26 01:50:22 -07:00
teknium1
ae0340bb9f prevent token explosion issue by reducing max_token to 15k instead of 16k 2025-05-23 18:09:36 -07:00
teknium1
1fa798a69e Making saving data optional in config, add scores to saved data 2025-05-23 14:11:11 -07:00
teknium1
a20886d720 fix many many things jules didnt do right 2025-05-23 12:50:38 -07:00
google-labs-jules[bot]
276a845dd7 feat: Implement SWE-RL Environment with Full Refinements
I've implemented the SWERLEnv in environments/swe_rl_env.py, based on the
SWE-RL paper (arXiv:2502.18449). This version incorporates extensive
refinements based on your feedback.

Key features implemented in environments/swe_rl_env.py:
- Core environment structure (setup, trajectory collection, scoring, evaluation).
- "Thinking" step: LLM is prompted for reasoning within <think> </think> tags
  before generating a patch. Includes strict parsing for these tags.
- Dynamic prompt construction using `tokenizer.apply_chat_template` with
  NousResearch/DeepHermes-3-Llama-3-8B-Preview as the default model.
- Hugging Face dataset integration: Loads data from HF Hub with configurable
  dataset name, splits, and column mappings.
- Reward mechanism: Based on thinking tag correctness, patch format
  (SEARCH/REPLACE), and similarity to the oracle patch.
- Comprehensive WandB logging for training/evaluation metrics.

NOTE: I made multiple attempts to update 'environments/README.md'
with documentation for this new environment. While I
reported success in some turns, this was not consistently verifiable
and may not have been correctly applied. The README.md file may
require manual verification and updating for the SWERLEnv.
2025-05-22 01:28:00 +00:00