Merge pull request #192 from rnkrtt/main

Fix typo in author name Gurning -> Gurung in community README
dmahan93 2025-06-23 10:15:58 -05:00 committed by GitHub
commit 5b2b5e9947

@@ -293,7 +293,7 @@ A unique environment for training LLMs to express needs and desires through auth
**Author**: [JakeBoggs](https://github.com/JakeBoggs)
**Purpose**: Train LLMs to generate humorous punchlines using Verifiable Rewards via Completion Likelihood Improvement (VR-CLI)
-A specialized environment for training LLMs to understand humor by generating joke punchlines through a novel RL technique from the paper "Learning to Reason for Long-Form Story Generation" (Gurning & Lapata, 2025). The environment teaches models to first generate reasoning that leads to good punchlines, with rewards based on how much the reasoning improves the likelihood of the actual punchline.
+A specialized environment for training LLMs to understand humor by generating joke punchlines through a novel RL technique from the paper "Learning to Reason for Long-Form Story Generation" (Gurung & Lapata, 2025). The environment teaches models to first generate reasoning that leads to good punchlines, with rewards based on how much the reasoning improves the likelihood of the actual punchline.
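The reward idea in the paragraph above can be sketched in a few lines. This is a minimal illustration, not the environment's actual code: the function names, the percent-perplexity form of the improvement, and the zero floor are all assumptions made for the example. It assumes the per-token log-probabilities of the real punchline have already been obtained by scoring it twice with a model, once without and once with the generated reasoning in context.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a sequence from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def likelihood_improvement(
    logprobs_without_reasoning: Sequence[float],
    logprobs_with_reasoning: Sequence[float],
) -> float:
    """Percent drop in punchline perplexity once the reasoning is in context.

    Positive values mean the reasoning made the real punchline more likely.
    (Hypothetical helper; the percent form is an assumption of this sketch.)
    """
    base = perplexity(logprobs_without_reasoning)
    conditioned = perplexity(logprobs_with_reasoning)
    return (1.0 - conditioned / base) * 100.0


def clipped_reward(improvement: float, floor: float = 0.0) -> float:
    """Hypothetical shaping: reasoning that does not help earns only the floor."""
    return max(improvement, floor)
```

A caller would pass the two log-probability lists for the same punchline tokens and feed the result of `clipped_reward(likelihood_improvement(...))` to the RL trainer as the scalar reward. Because the reward is computed from the likelihood of a known reference punchline, it can be checked without a separate learned judge.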
**Features**:
- **VR-CLI Methodology**: Uses Verifiable Rewards via Completion Likelihood Improvement for reduced overfitting