24 commits

Author SHA1 Message Date
Oliver Stanley
3286a68361 First version of CodeI/O reasoning data (#264)
* notebook for prepping first set of raw code files
* updated codeio processing notebook for repo-level processing
* fix for edge case in codeio scoring
* Add reformat notebook
* filtering pass
* add non-determinism filtering
* Tweak CodeIODataset & include first real data
* add basic codeio test, metadata
2025-03-05 22:34:11 +01:00
Andreas Köpf
b2904ccab9 Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Oliver
8f05e6108c Fix 2025-02-26 11:17:23 +00:00
Oliver
4bdb8c7d6b Add note on code execution to CodeIODataset 2025-02-25 22:39:06 +00:00
Oliver
ef2f8d1978 Move data file & load into memory on first object creation 2025-02-25 22:36:38 +00:00
Oliver
f895a458c7 Register CodeIODataset 2025-02-24 18:28:35 +00:00
Oliver
efbcfb6eed Initial scoring algo for codeio 2025-02-24 18:27:53 +00:00
Oliver
5a222a398b Add tiny sample dataset & efficient sampling 2025-02-24 17:58:31 +00:00
Oliver
7ff162e9bb Remove outdated comment 2025-02-23 22:24:13 +00:00
Oliver
c0923a6fb8 Add validation 2025-02-23 22:23:45 +00:00
Oliver
40d7dfdb5f Add input prediction 2025-02-23 20:27:27 +00:00
Oliver
489dea7267 Draft CodeIO-derived reasoning problems dataset 2025-02-22 00:56:52 +00:00
Oliver
378cba2de1 Outline CodeIO dataset classes 2025-02-22 00:21:17 +00:00
Andreas Koepf
ff5b210106 use native types List->list, Dict->dict, Set->set, Tuple->tuple 2025-02-21 15:15:38 +01:00
Oliver
49081e44bc Constrain reward 2025-02-17 19:20:45 +00:00
Oliver
0de0044d52 Formatting/scoring improvements for BF & family 2025-02-17 19:08:15 +00:00
Andreas Köpf
a607db79f7 Add Coaching & ScoreBoard class (result tracking) (#72)
* feat: Add Coach and ScoreBoard classes for performance tracking and difficulty adjustment
* feat: Add GroupedScores class to wrap aggregated scores
* refactor: Create ScoreStats class with tuple-based score statistics
* feat: Add unit test for Coach with CompositeDataset and multiple datasets
* fix: Add difficulty metadata to leg counting dataset
* feat: Add clear() method to ScoreBoard to reset all stored data
* feat: Add __len__ method to ScoreBoard to return number of scores
* feat: Add update_dataset_config method to CompositeDataset
* cleanup __init__ & imports
2025-02-06 23:15:28 +01:00
Andreas Koepf
25540b6634 lint 2025-01-30 22:55:04 +01:00
Rich Jones
11c2819caa docstrings 2025-01-30 17:20:53 +01:00
Rich Jones
993d09eb40 rm bad copypaste 2025-01-30 17:16:37 +01:00
Rich Jones
645aa13a15 init definitions 2025-01-30 17:15:48 +01:00
Rich Jones
3b11e4c296 difficulty levels 2025-01-30 16:24:28 +01:00
Rich Jones
0688cadf59 add contrib 2025-01-30 15:42:11 +01:00
Rich Jones
bf053e2266 initial bf working, contrib not committed 2025-01-30 15:38:03 +01:00