feat: add scoring cascade for reducing false negatives (#526)

* feat: add scoring cascade for reducing false negatives in answer verification

* style: fix black and isort formatting

Run black and isort to satisfy pre-commit checks.

Made-with: Cursor

* docs: add scoring cascade example to Quickstart section

Mention the experimental scoring cascade feature at the end of the
Quickstart section with a disclaimer and complete usage examples
showing both the dataset method and standalone function.

Made-with: Cursor

* docs: shorten scoring cascade section in README

Trim to a concise standalone example per review feedback.

Made-with: Cursor

* docs: simplify scoring cascade description in README

Made-with: Cursor

* update readme

---------

Co-authored-by: Zafir Stojanovski <zaf.stojano@gmail.com>
Ritvik Rastogi 2026-04-18 01:09:15 +05:30 committed by GitHub
parent 437e0b49c4
commit 49b07130b3
6 changed files with 477 additions and 0 deletions


@@ -85,6 +85,14 @@ reasoning_gym.create_dataset('composite', size=10, seed=42, datasets=specs)
For the simplest way to get started training models with Reasoning Gym, we recommend using the `verifiers` library, which directly supports RG tasks. See `examples/verifiers` for details. However, RG data can be used with any major RL training framework.
The *cascade scorer* applies progressively more lenient matchers (exact string, numeric, then symbolic math) to reduce false negatives caused by formatting differences such as LaTeX wrappers, casing, and numeric representation. Install with `pip install reasoning-gym[scoring]` to enable symbolic math verification.
```python
from reasoning_gym import cascade_score
assert cascade_score(answer=r"\text{42}", expected="42") == 1.0
```
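
The cascade idea can be sketched in plain Python. This is a hypothetical illustration of the fallback ordering, not the library's actual implementation (the real scorer also has a symbolic math stage):

```python
import re


def cascade_match(answer: str, expected: str) -> float:
    """Try progressively lenient matchers; return 1.0 on the first success."""

    def normalize(s: str) -> str:
        # Strip LaTeX \text{...} wrappers, dollar signs, whitespace, and casing
        s = re.sub(r"\\text\{([^}]*)\}", r"\1", s)
        return s.strip().strip("$").lower()

    # Stage 1: normalized string comparison
    if normalize(answer) == normalize(expected):
        return 1.0

    # Stage 2: numeric comparison, so "42.0" matches "42"
    try:
        if abs(float(normalize(answer)) - float(normalize(expected))) < 1e-9:
            return 1.0
    except ValueError:
        pass

    # A real cascade would add a symbolic stage here (e.g. SymPy equivalence)
    return 0.0
```

Each stage only runs if the stricter one before it fails, so exact matches stay cheap while lenient matching catches formatting-only mismatches.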
## 🔍 Evaluation
Instructions for running the evaluation scripts are provided in [eval/README.md](https://github.com/open-thought/reasoning-gym/blob/main/eval/README.md).