Commit graph

25 commits

Author SHA1 Message Date
Oliver Stanley
f490b9f760
Tolerant scoring for CodeI/O based on edit distances (#277)
* add zss dep
* codeio edit distance-based scoring
* edit distance tweaks
2025-03-07 22:49:35 +01:00
Oliver Stanley
d1e505a8e9
First version of CodeI/O reasoning data (#264)
* notebook for prepping first set of raw code files
* updated codeio processing notebook for repo-level processing
* fix for edge case in codeio scoring
* Add reformat notebook
* filtering pass
* add non-determinism filtering
* Tweak CodeIODataset & include first real data
* add basic codeio test, metadata
2025-03-05 22:34:11 +01:00
Andreas Köpf
5d7fbac0ad
Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Oliver
5fa06c961f Fix 2025-02-26 11:17:23 +00:00
Oliver
81c77a495d Add note on code execution to CodeIODataset 2025-02-25 22:39:06 +00:00
Oliver
0252dd905f Move data file & load into memory on first object creation 2025-02-25 22:36:38 +00:00
Oliver
fe502d5eb2 Register CodeIODataset 2025-02-24 18:28:35 +00:00
Oliver
43daec67ea Initial scoring algo for codeio 2025-02-24 18:27:53 +00:00
Oliver
1795c8ea7a Add tiny sample dataset & efficient sampling 2025-02-24 17:58:31 +00:00
Oliver
7b5a12a92c Remove outdated comment 2025-02-23 22:24:13 +00:00
Oliver
e07287e1f9 Add validation 2025-02-23 22:23:45 +00:00
Oliver
f787069fd2 Add input prediction 2025-02-23 20:27:27 +00:00
Oliver
e718168428 Draft CodeIO-derived reasoning problems dataset 2025-02-22 00:56:52 +00:00
Oliver
563480329e Outline CodeIO dataset classes 2025-02-22 00:21:17 +00:00
Andreas Koepf
3e7ff3b084 use native types List->list, Dict->dict, Set->set, Tuple->tuple 2025-02-21 15:15:38 +01:00
Oliver
eb708e88b3 Constrain reward 2025-02-17 19:20:45 +00:00
Oliver
1d0cad46f2 Formatting/scoring improvements for BF & family 2025-02-17 19:08:15 +00:00
Andreas Köpf
3f6b2fc807
Add Coaching & ScoreBoard class (result tracking) (#72)
* feat: Add Coach and ScoreBoard classes for performance tracking and difficulty adjustment
* feat: Add GroupedScores class to wrap aggregated scores
* refactor: Create ScoreStats class with tuple-based score statistics
* feat: Add unit test for Coach with CompositeDataset and multiple datasets
* fix: Add difficulty metadata to leg counting dataset
* feat: Add clear() method to ScoreBoard to reset all stored data
* feat: Add __len__ method to ScoreBoard to return number of scores
* feat: Add update_dataset_config method to CompositeDataset
* cleanup __init__ & imports
2025-02-06 23:15:28 +01:00
Andreas Koepf
ebb88e6c6a lint 2025-01-30 22:55:04 +01:00
Rich Jones
2f9224127d docstrings 2025-01-30 17:20:53 +01:00
Rich Jones
2d9b916f8b rm bad copypaste 2025-01-30 17:16:37 +01:00
Rich Jones
9d4f896329 init definitions 2025-01-30 17:15:48 +01:00
Rich Jones
2393ae0525 difficulty levels 2025-01-30 16:24:28 +01:00
Rich Jones
574df8de23 add contrib 2025-01-30 15:42:11 +01:00
Rich Jones
99bf648989 initial bf working, contrib not committed 2025-01-30 15:38:03 +01:00