reasoning-gym/reasoning_gym/data
Roman Machacek 2c52f33c3a
CodeIO HQ Dataset (#382)
* ADD: CodeIO high quality dataset

Based on the CodeI/O dataset. Annotated using Qwen-Coder and filtered on several metrics, yielding a high-quality dataset that keeps approximately 50% of the original data.

* ADD: Compressed version

* Delete pure json version
2025-04-01 22:34:33 +02:00
__init__.py Add Coaching & ScoreBoard class (result tracking) (#72) 2025-02-06 23:15:28 +01:00
acre_objects.json Add ACRE(Abstract Causal REasoning Beyond Covariation) python generators (#199) 2025-03-10 00:09:54 +01:00
anagrams.jsonl generate all english anagrams 2025-02-05 16:25:23 +01:00
codeio-hq.jsonl.gz CodeIO HQ Dataset (#382) 2025-04-01 22:34:33 +02:00
codeio.jsonl.gz Add a few new CodeI/O samples, resolve numeric answer scoring bug (#332) 2025-03-11 23:55:33 +01:00
in_the_year_2889.txt formatting 2025-01-24 10:34:07 +01:00
rush_18k.txt add sampled subset of rush hour database 2025-02-14 11:10:30 +01:00
wordle_words.py rename static.py -> wordle_words.py 2025-01-30 01:06:52 +01:00
words.csv lint 2025-02-03 11:35:30 +00:00
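
The two CodeI/O files above are stored as gzip-compressed JSON Lines. A minimal sketch of reading such a file with the Python standard library follows. It assumes each non-empty line holds one JSON object; the helper name `iter_jsonl_gz` is hypothetical and is not part of reasoning_gym, and the record schema is not shown in this listing, so the code makes no assumption about field names.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl_gz(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl.gz file.

    Hypothetical helper for illustration; assumes one JSON object per line.
    """
    # "rt" opens the gzip stream in text mode, so lines arrive as str.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing in json.loads
                yield json.loads(line)
```

Usage might look like `for record in iter_jsonl_gz(Path("reasoning_gym/data/codeio-hq.jsonl.gz")): ...`, with the path adjusted to wherever the repository is checked out. Streaming line by line avoids decompressing the whole file into memory.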