removed merge error

2026-04-23 16:54:56 +00:00 · 2025-05-16 19:49:37 -07:00
commit 41caa05a1a
parent 9753d5a122
1 changed file with 1 addition and 3 deletions
--- a/environments/README.md
+++ b/environments/README.md
@@ -241,6 +241,7 @@ Two Blackjack environment implementations are provided. For more details, see th
    - **Gameplay:** A more complex version designed for agents that produce long interaction sequences, including "thinking" steps.
    - **Features:** Windowed decision making, local alternative generation, value-based pruning, and counterfactual data for training (GRPO).
    - **Use Case:** Ideal for training LLMs that engage in explicit multi-step reasoning before action. Teaches the model to be more "confident" about selecting optimal moves & taking informed risks in uncertain environments, even with the knowledge that it might still lose with optimal play.
+
 ### Instruction Following Environment (`instruction_following_algorithm_environment.py`)

 **Dependencies:**
@@ -252,9 +253,6 @@ This environment was inspired by AllenAI's RLVR-IFEVAL environment and uses Alle
 - Paper: https://arxiv.org/abs/2411.15124

 Environment for training models to follow natural language instructions and constraints, based on the `allenai/RLVR-IFeval` dataset and environment.
-**Dependencies:**
- `datasets` (Hugging Face)
- `langdetect`

 **Input Format:**
 - Each item from the processed `allenai/RLVR-IFeval` dataset contains: