mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-28 17:29:30 +00:00
Merge pull request #192 from rnkrtt/main
Fix typo in author name Gurning -> Gurung in community README
commit 5b2b5e9947
1 changed file with 1 addition and 1 deletion
@@ -293,7 +293,7 @@ A unique environment for training LLMs to express needs and desires through auth
**Author**: [JakeBoggs](https://github.com/JakeBoggs)
**Purpose**: Train LLMs to generate humorous punchlines using Verifiable Rewards via Completion Likelihood Improvement (VR-CLI)
-A specialized environment for training LLMs to understand humor by generating joke punchlines through a novel RL technique from the paper "Learning to Reason for Long-Form Story Generation" (Gurning & Lapata, 2025). The environment teaches models to first generate reasoning that leads to good punchlines, with rewards based on how much the reasoning improves the likelihood of the actual punchline.
+A specialized environment for training LLMs to understand humor by generating joke punchlines through a novel RL technique from the paper "Learning to Reason for Long-Form Story Generation" (Gurung & Lapata, 2025). The environment teaches models to first generate reasoning that leads to good punchlines, with rewards based on how much the reasoning improves the likelihood of the actual punchline.
**Features**:
- **VR-CLI Methodology**: Uses Verifiable Rewards via Completion Likelihood Improvement for reduced overfitting
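
For readers of this diff, the reward idea in the changed paragraph (how much the generated reasoning improves the likelihood of the actual punchline) could be sketched roughly as below. This is an illustration only, not the environment's code: the function names, the use of perplexity as the likelihood measure, and the clipping of negative improvements to zero are all assumptions, and the per-token log-probabilities are assumed to come from whatever model scores the reference punchline.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities (hypothetical helper)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def likelihood_improvement_reward(
    logprobs_without_reasoning: Sequence[float],
    logprobs_with_reasoning: Sequence[float],
) -> float:
    """Relative drop in perplexity of the reference punchline (assumed reward shape).

    Both inputs are log-probabilities of the same reference punchline tokens:
    once conditioned on the joke setup alone, once on setup plus reasoning.
    """
    base = perplexity(logprobs_without_reasoning)
    conditioned = perplexity(logprobs_with_reasoning)
    improvement = (base - conditioned) / base
    # Assumption: reasoning that makes the punchline less likely earns nothing.
    return max(0.0, improvement)
```

Under these assumptions, reasoning that raises every punchline token's log-probability yields a reward between 0 and 1, and reasoning that lowers it yields 0.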