From af1c98d7a8bab2c2f4811dfadecdae30333a5b64 Mon Sep 17 00:00:00 2001
From: Merkel Tranjes <140164174+rnkrtt@users.noreply.github.com>
Date: Mon, 23 Jun 2025 16:23:02 +0200
Subject: [PATCH] Update README.md

---
 environments/community/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/environments/community/README.md b/environments/community/README.md
index a622afc6..e139dc38 100644
--- a/environments/community/README.md
+++ b/environments/community/README.md
@@ -293,7 +293,7 @@ A unique environment for training LLMs to express needs and desires through auth
 **Author**: [JakeBoggs](https://github.com/JakeBoggs)
 **Purpose**: Train LLMs to generate humorous punchlines using Verifiable Rewards via Completion Likelihood Improvement (VR-CLI)
 
-A specialized environment for training LLMs to understand humor by generating joke punchlines through a novel RL technique from the paper "Learning to Reason for Long-Form Story Generation" (Gurning & Lapata, 2025). The environment teaches models to first generate reasoning that leads to good punchlines, with rewards based on how much the reasoning improves the likelihood of the actual punchline.
+A specialized environment for training LLMs to understand humor by generating joke punchlines through a novel RL technique from the paper "Learning to Reason for Long-Form Story Generation" (Gurung & Lapata, 2025). The environment teaches models to first generate reasoning that leads to good punchlines, with rewards based on how much the reasoning improves the likelihood of the actual punchline.
 
 **Features**:
 - **VR-CLI Methodology**: Uses Verifiable Rewards via Completion Likelihood Improvement for reduced overfitting