CodeIO HQ Dataset (#382)

* ADD: CodeIO high-quality dataset

Based on the CodeI/O dataset. Annotated using Qwen-Coder and filtered on several metrics, yielding a high-quality dataset that keeps approximately 50% of the original data.

* ADD: Compressed version

* Delete pure JSON version
Roman Machacek 2025-04-01 22:34:33 +02:00 committed by GitHub
parent 415bcb5ace
commit 2c52f33c3a

Binary file not shown.