diff --git a/environments/community/README.md b/environments/community/README.md
index 4af21a9c..e389ad8e 100644
--- a/environments/community/README.md
+++ b/environments/community/README.md
@@ -508,6 +508,80 @@ Rejected: "Okay."
**Requirements**: Standard Atropos dependencies, transformers, torch
+### 14. Solitaire Winning Probability Environment (`solitaire_winning_probability/`)
+**Author**: [davidedipeppe](https://github.com/davidedipeppe)
+**Purpose**: Train LLMs to analyze and predict winning probabilities in solitaire-style card games using both theoretical mathematics and empirical simulation
+
+An environment that pairs LLM-derived probability formulas with Monte Carlo simulation to teach mathematical reasoning about probability in simple card, dice, and coin games. Models learn to derive closed-form formulas for the exact winning probability, and those predictions are checked against empirical simulation.
+
+**Features**:
+- **Dual Analysis Approach**: Both theoretical mathematical formulas and empirical Monte Carlo simulation
+- **AI Formula Derivation**: LLMs analyze game mechanics to derive exact probability formulas
+- **Mathematical Expression Evaluation**: Supports factorials, combinations, permutations, and standard operations
+- **Simulation Verification**: Runs thousands of game simulations to verify theoretical predictions
+- **QA Dataset Generation**: Creates training data for AI models by generating question-answer pairs
+- **Sophisticated Reward Function**: Evaluates prediction quality with relative error calculation and length penalties
+
+**Game Types Included**:
+- **Easy Probability Games**: Simple card draws, die rolls, and coin flips (1/4, 1/6, 1/4, and 1/3 probabilities)
+- **Card Matching Games**: Avoid counter-card matches with cycling counters (cycle lengths 1-4)
+- **Odd Card Game**: Draw odd-valued cards from standard deck (7/13 probability)
+- **Extensible Framework**: Easy to add new solitaire game variants
+
+**Mathematical Framework**:
+- **Formula Notation**: Supports `C(n,r)` combinations, `P(n,r)` permutations, `factorial(n)`
+- **Expression Parser**: Safe mathematical expression evaluation with asteval
+- **Probability Comparison**: Measures theoretical vs empirical accuracy
+- **Error Analysis**: Quantifies prediction quality with relative error metrics
+
+**Reward System Design**:
+1. **Base Reward**: `1 - min(abs(gt - predicted) / gt, 2)` with 0.2 bonus for valid predictions
+2. **Length Penalty**: Applied to responses exceeding 50% of max token length
+3. **Validation Checks**: Ensures proper formula formatting and mathematical syntax
+4. **Quality Metrics**: Tracks prediction accuracy and response efficiency
+
+**Training Components**:
+- **Game Predictor Class**: Core AI analysis and formula evaluation engine
+- **Simulation Engine**: Monte Carlo verification with configurable iteration counts
+- **Mathematical Evaluator**: Safe expression parsing and computation
+- **QA Data Generator**: Automated training dataset creation
+
+**Example Training Flow**:
+```
+Game: Draw from [1,2,3,4], win if card is 1
+AI Analysis: "1 favorable outcome out of 4 total..."
+Formula: "1/4"
+Calculated: 0.25
+Simulated: 0.2499 (100k runs)
+Reward: High (excellent theoretical-empirical match)
+```
+
+**Applications**:
+- **Probability Theory Education**: Practical demonstration of theoretical concepts
+- **Mathematical Reasoning Training**: Formula derivation and validation skills
+- **Game Analysis Research**: Framework for analyzing card game mechanics
+- **AI Math Capabilities**: Training models in structured mathematical thinking
+
+**Technical Implementation**:
+- **AsyncOpenAI Integration**: Efficient AI analysis with configurable models
+- **CSV Data Management**: Structured question-answer pair storage
+- **Comprehensive Error Handling**: Robust formula evaluation and validation
+- **Performance Tracking**: Detailed analysis results and comparison metrics
+
+**Quality Assessment**:
+- **Excellent Match**: absolute difference < 0.01 between theory and simulation
+- **Good Match**: absolute difference < 0.05
+- **Fair Match**: absolute difference < 0.10
+- **Poor Match**: absolute difference >= 0.10
+
+**Configuration Options**:
+- Simulation count (default: 100,000 runs)
+- Model selection for AI analysis
+- Token length limits and penalties
+- Mathematical expression validation rules
+
+**Requirements**: asyncio, openai, asteval, csv, datasets, math_verify, latex2sympy2_extended
+
---
## Support
diff --git a/environments/community/solitaire_winning_probability/README.md b/environments/community/solitaire_winning_probability/README.md
new file mode 100644
index 00000000..22706f0a
--- /dev/null
+++ b/environments/community/solitaire_winning_probability/README.md
@@ -0,0 +1,111 @@
+# Solitaire Winning Probability Environment
+
+This environment is designed to analyze and predict winning probabilities in various solitaire-style games using both theoretical mathematical analysis and empirical simulation.
+
+## Overview
+
+The system combines two approaches to determine game winning probabilities:
+1. **Theoretical Analysis**: Uses AI to derive mathematical formulas for exact probability calculations
+2. **Empirical Simulation**: Runs Monte Carlo simulations to verify theoretical predictions
+
+## Key Components
+
+### GamePredictor Class
+The core component that handles:
+- AI-powered probability analysis
+- Mathematical formula evaluation
+- Game simulation
+- Probability comparison between theoretical and empirical results
+
+### Features
+
+- **AI Analysis**: Uses LLM to analyze game mechanics and derive mathematical formulas
+- **Formula Evaluation**: Supports complex mathematical expressions including:
+  - Factorials
+  - Combinations (C(n,r))
+  - Permutations (P(n,r))
+  - Standard mathematical operations
+- **Simulation Engine**: Runs multiple game simulations to verify theoretical predictions
+- **QA Dataset Generation**: Creates training data for AI models by generating question-answer pairs
+
+### Reward Function
+
+The environment implements a sophisticated reward function that evaluates the quality of probability predictions:
+
+1. **Base Reward Calculation**:
+   - Compares the predicted probability with the ground truth probability
+   - Scores the prediction from its relative error: `1 - min(abs(gt - predicted) / gt, 2)`
+   - Adds a small bonus of 0.2 for valid predictions
+   - Clips the final reward between -1 and 1
+
+2. **Length Penalty**:
+   - Applies a length-based penalty for responses that exceed 50% of the maximum token length
+   - No penalty for responses under the threshold
+   - Linear scaling of penalty based on response length
+   - Helps encourage concise and efficient solutions
+
+3. **Validation Checks**:
+   - Verifies proper formula formatting and syntax
+   - Ensures responses contain valid mathematical expressions
+   - Handles edge cases and invalid responses gracefully
+
+4. **Quality Metrics**:
+   - Tracks percentage of correct predictions
+   - Monitors response lengths and quality
+   - Provides feedback for model improvement
+
+## Usage
+
+```python
+# Initialize the predictor (run this snippet inside an async function, since it uses `await`)
+predictor = GamePredictor(openai_api_key, openai_api_base)
+
+# Define games to analyze
+games = {
+    'game_name': game_function,
+    # ... more games
+}
+
+# Get predictions for all games
+results = await predictor.predict_games(games)
+
+# Generate QA dataset
+await predictor.generate_qa_csv(games, n_simulations, "output.csv")
+```
+
+## Output Format
+
+The system provides comprehensive analysis for each game:
+- AI's mathematical reasoning
+- Derived probability formula
+- Calculated theoretical probability
+- Simulated empirical probability
+- Comparison assessment between theory and simulation
+
+## Supported Games
+
+The environment includes several example games:
+- Easy games (1-4)
+- Card matching games (counter cycle lengths 2-4)
+- Odd card game
+
+## Requirements
+
+- Python 3.x
+- OpenAI API access
+- Required packages:
+  - openai
+  - asteval
+  - asyncio
+
+## Purpose
+
+This environment serves multiple purposes:
+1. Educational: Demonstrates probability theory in practical game scenarios
+2. Research: Provides a framework for analyzing game mechanics
+3. AI Training: Generates datasets for training AI models in probability analysis
+4. Verification: Validates theoretical probability calculations through simulation
+
+## Contributing
+
+New games can be added by implementing game functions that return a boolean indicating win/loss. The system will automatically analyze and provide probability predictions for any valid game implementation.
diff --git a/environments/community/solitaire_winning_probability/game_predictor.py b/environments/community/solitaire_winning_probability/game_predictor.py
new file mode 100644
index 00000000..8acefec6
--- /dev/null
+++ b/environments/community/solitaire_winning_probability/game_predictor.py
@@ -0,0 +1,404 @@
+import asyncio
+import csv
+import inspect
+import math
+import re
+from dataclasses import dataclass
+from typing import Callable, Dict, Optional, Tuple
+
+from asteval import Interpreter
+from games import (
+    card_matching_game_2,
+    card_matching_game_3,
+    card_matching_game_4,
+    easy_game_1,
+    easy_game_2,
+    easy_game_3,
+    easy_game_4,
+    odd_card_game,
+)
+from openai import AsyncOpenAI
+
+GUIDELINES = """
+Please provide your analysis using the exact format below, including all tags:
+
+
+[Your initial approach to solving this probability problem]
+[List important observations about the game mechanics]
+[Show your step-by-step mathematical derivation using probability theory]
+[Include explanations of any combinations, permutations, or conditional probabilities used]
+
+
+
+[IMPORTANT: Write ONLY the final, simplified mathematical formula for the probability of winning below.]
+[CRITICAL: Do NOT include any text, explanations, comments, multiple formulas,
+or intermediate calculation steps within this tag.]
+[CRITICAL: If a precise mathematical formula cannot be determined, leave this section EMPTY.]
+[Use C(n,r), P(n,r), factorial(n) and standard math operators: + - * / ^ ( ) ]
+
+
+Note: Use these notations ONLY in your formula:
+- Factorial: factorial(n)
+- Combinations: C(n,r)
+- Permutations: P(n,r)
+- Standard operators: *, /, +, -, ^, (, )
+The formula must be in a format that can be directly evaluated.
+Use parentheses liberally to ensure correct order of operations. For example,
+write (A * B) / (C * D) instead of A * B / C * D if you intend the division
+to apply to the result of (C * D). Be explicit!
+
+What is the mathematical formula to calculate the exact probability of winning this game?
+"""
+
+
+@dataclass
+class GameAnalysis:
+    """Class to hold the analysis results of a game."""
+
+    ai_analysis: str
+    formula: Optional[str]
+    calculated_probability: Optional[float]
+    simulated_probability: float
+    n_simulations: int
+    probability_difference: Optional[float]
+
+
+class GamePredictor:
+    def __init__(
+        self,
+        openai_api_key: str,
+        openai_api_base: str,
+        model: str = "llama-4-maverick-17b-128e-instruct-fp8",
+    ):
+        """Initialize the GamePredictor with OpenAI API credentials."""
+        self.client = AsyncOpenAI(
+            api_key=openai_api_key,
+            base_url=openai_api_base,
+        )
+        self.model = model
+        # Create a persistent asteval interpreter
+        self.aeval = Interpreter()
+        # Add math functions to the interpreter's symbol table
+        self.aeval.symtable["factorial"] = self.factorial
+        self.aeval.symtable["C"] = self.combination
+        self.aeval.symtable["P"] = self.permutation
+        # Add standard math functions if needed (optional, asteval includes many)
+        # self.aeval.symtable['sqrt'] = math.sqrt
+        # self.aeval.symtable['pow'] = math.pow
+
+    @staticmethod
+    def factorial(n: int) -> int:
+        """Calculate factorial."""
+        return math.factorial(n)
+
+    @staticmethod
+    def combination(n: int, r: int) -> int:
+        """Calculate combination nCr."""
+        return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
+
+    @staticmethod
+    def permutation(n: int, r: int) -> int:
+        """Calculate permutation nPr."""
+        return math.factorial(n) // math.factorial(n - r)
+
+    def _extract_formula(self, response_text: str) -> Optional[str]:
+        """Extract formula from the AI response."""
+        # Find all formula blocks
+        formula_matches = re.findall(
+            r"\n(.*?)\n", response_text, re.DOTALL
+        )
+
+        if not formula_matches:
+            return None
+
+        # Use the content of the last formula block found
+        last_formula_content = formula_matches[-1].strip()
+
+        if not last_formula_content:
+            return None
+
+        # Split the content into lines (on real newlines) and filter out empty lines
+        lines = [
+            line.strip() for line in last_formula_content.split("\n") if line.strip()
+        ]
+
+        if not lines:
+            return None
+
+        # Return the last non-empty line as the potential formula
+        # This assumes the AI puts the final, clean formula last in the block
+        return lines[-1]
+
+    def _evaluate_formula(self, formula: str) -> float:
+        """Evaluate the mathematical formula using asteval."""
+        # No need for regex replacements if the AI uses C(), P(), factorial()
+        try:
+            # Evaluate the formula using the pre-configured interpreter
+            result = self.aeval(formula)
+            if isinstance(result, (int, float)):
+                return float(result)
+            else:
+                # Explicitly handle non-numeric results
+                raise ValueError(
+                    f"Formula '{formula}' evaluated to non-numeric type: {type(result).__name__} ({result})"
+                )
+        except KeyError as e:
+            # Handle cases where symbols are not found (e.g., undefined variables in formula)
+            raise ValueError(
+                f"Error evaluating formula '{formula}': Undefined symbol {e}"
+            )
+        except Exception as e:
+            # Catch other potential evaluation errors from asteval
+            raise ValueError(f"Error evaluating formula '{formula}' using asteval: {e}")
+
+    def _create_prompt(self, game_func: Callable) -> str:
+        """Create the prompt for the AI model."""
+        # Get the function's source code and docstring
+        source_code = inspect.getsource(game_func)
+        description = game_func.__doc__ or "No description available."
+
+        return f"""
+        Analyze this game implemented in the following Python code:
+
+        ```python
+        {source_code}
+        ```
+
+        {description}
+        """
+
+    def simulate_game(self, game_func: Callable, n_simulations: int = 100000) -> float:
+        """
+        Simulate a game multiple times and return the win probability.
+
+        Args:
+            game_func: The game function to simulate
+            n_simulations: Number of simulations to run
+
+        Returns:
+            float: The probability of winning the game based on simulation
+        """
+        wins = sum(1 for _ in range(n_simulations) if game_func())
+        return wins / n_simulations
+
+    def compare_probabilities(
+        self, calculated: float, simulated: float
+    ) -> Tuple[float, str]:
+        """
+        Compare calculated and simulated probabilities.
+
+        Args:
+            calculated: The probability calculated from the formula
+            simulated: The probability obtained from simulation
+
+        Returns:
+            Tuple[float, str]: The absolute difference and a qualitative assessment
+        """
+        diff = abs(calculated - simulated)
+
+        if diff < 0.01:
+            assessment = "Excellent match between theory and simulation"
+        elif diff < 0.05:
+            assessment = "Good match between theory and simulation"
+        elif diff < 0.1:
+            assessment = "Fair match between theory and simulation"
+        else:
+            assessment = "Poor match between theory and simulation"
+
+        return diff, assessment
+
+    async def predict_game(
+        self, game_func: Callable, n_simulations: int = 100000
+    ) -> GameAnalysis:
+        """
+        Predict the probability of winning a game using both AI analysis and simulation.
+
+        Args:
+            game_func: Function that implements the game
+            n_simulations: Number of simulations to run for verification
+
+        Returns:
+            GameAnalysis object containing all analysis results
+        """
+        # Create and send message to AI
+        message = {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": self._create_prompt(game_func) + "\n\n" + GUIDELINES,
+                },
+            ],
+        }
+
+        try:
+            chat_response = await self.client.chat.completions.create(
+                model=self.model,
+                messages=[message],
+                temperature=0,  # Set temperature to 0 for deterministic output
+            )
+            response_text = chat_response.choices[0].message.content
+        except Exception as e:
+            # Handle potential API errors gracefully
+            response_text = f"API call failed: {e}"
+            # Consider logging the error here
+            print(f"Warning: API call failed for a game: {e}")
+
+        # Extract and evaluate formula
+        formula = self._extract_formula(response_text)
+        calculated_prob = None
+        if formula:
+            try:
+                calculated_prob = self._evaluate_formula(formula)
+            except ValueError as e:
+                # Formula evaluation failed, set probability to None and maybe log/store the error
+                calculated_prob = None
+                # Optionally add error information to the analysis object if needed
+                print(f"Warning: Could not evaluate formula '{formula}': {e}")
+                # Update response_text or add a field to GameAnalysis if needed
+                response_text += f"Formula evaluation failed: {e}"
+
+        # Run simulation in a separate thread to avoid blocking the event loop
+        simulated_prob = await asyncio.to_thread(
+            self.simulate_game, game_func, n_simulations
+        )
+
+        # Calculate difference if both probabilities are available
+        prob_diff = (
+            abs(calculated_prob - simulated_prob)
+            if calculated_prob is not None
+            else None
+        )
+
+        return GameAnalysis(
+            ai_analysis=response_text,
+            formula=formula,
+            calculated_probability=calculated_prob,
+            simulated_probability=simulated_prob,
+            n_simulations=n_simulations,
+            probability_difference=prob_diff,
+        )
+
+    async def predict_games(
+        self, games: Dict[str, Callable], n_simulations: int = 100000
+    ) -> Dict[str, GameAnalysis]:
+        """
+        Predict probabilities for multiple games concurrently.
+
+        Args:
+            games: Dictionary mapping game names to game functions
+            n_simulations: Number of simulations per game
+
+        Returns:
+            Dictionary mapping game names to their GameAnalysis results
+        """
+        # Create tasks for each game prediction
+        tasks = {
+            game_name: asyncio.create_task(self.predict_game(game_func, n_simulations))
+            for game_name, game_func in games.items()
+        }
+
+        # Wait for all tasks to complete
+        await asyncio.gather(*tasks.values())
+
+        # Collect results
+        results = {name: task.result() for name, task in tasks.items()}
+        return results
+
+    async def generate_qa_csv(
+        self, games: Dict[str, Callable], n_simulations: int, csv_filepath: str
+    ):
+        """
+        Generates a CSV file with questions (prompts) and answers (simulated probabilities).
+
+        Args:
+            games: Dictionary mapping game names to game functions.
+            n_simulations: Number of simulations per game.
+            csv_filepath: Path to save the CSV file.
+        """
+        qa_data = []
+        # Create a list of tasks for simulation to run them concurrently if desired,
+        # or simply iterate and await if sequential processing per game is fine.
+        # For simplicity here, we'll process game simulations sequentially for prompt generation,
+        # but the simulation itself runs in a thread.
+        for game_name, game_func in games.items():
+            print(f"Processing game for CSV: {game_name}")
+            prompt = self._create_prompt(game_func)
+
+            # simulate_game is synchronous, run it in a thread to avoid blocking
+            simulated_prob = await asyncio.to_thread(
+                self.simulate_game, game_func, n_simulations
+            )
+            answer = f"{simulated_prob:.6f}"  # Format probability as string
+
+            qa_data.append({"question": prompt, "answer": answer})
+
+        try:
+            with open(csv_filepath, "w", newline="", encoding="utf-8") as csvfile:
+                fieldnames = ["question", "answer"]
+                writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+
+                writer.writeheader()
+                for row_data in qa_data:
+                    writer.writerow(row_data)
+            print(f"Successfully generated Q&A CSV at {csv_filepath}")
+        except IOError as e:
+            print(f"Error writing CSV file {csv_filepath}: {e}")
+        except Exception as e:
+            print(f"An unexpected error occurred during CSV generation: {e}")
+
+
+# Example usage:
+async def main():
+    # API credentials - Set these as environment variables or pass as parameters
+    openai_api_key = "your_openai_api_key_here"
+    openai_api_base = "https://api.lambda.ai/v1"
+
+    # Create predictor instance
+    predictor = GamePredictor(openai_api_key, openai_api_base)
+
+    # Define games to analyze
+    games = {
+        "easy_game_1": easy_game_1,
+        "easy_game_2": easy_game_2,
+        "easy_game_3": easy_game_3,
+        "easy_game_4": easy_game_4,
+        "card_matching_2": card_matching_game_2,
+        "card_matching_3": card_matching_game_3,
+        "card_matching_4": card_matching_game_4,
+        "odd_card": odd_card_game,
+    }
+    await predictor.generate_qa_csv(
+        games,
+        100000,
+        "environments/community/solitaire_winning_probability/qa_data.csv",
+    )
+    # Get predictions for all games
+    results = await predictor.predict_games(games)
+
+    # Print results
+    for game_name, analysis in results.items():
+        print(f"\nResults for {game_name}:")
+        # print("AI Analysis:")
+        # print(analysis.ai_analysis)
+        # print(f"\nFormula: {analysis.formula}")
+        # # Handle potential None for calculated probability
+        # if analysis.calculated_probability is not None:
+        #     print(f"Calculated probability: {analysis.calculated_probability:.4f}")
+        # else:
+        #     print("Calculated probability: N/A (Formula missing or invalid)")
+        print(f"Simulated probability: {analysis.simulated_probability:.4f}")
+        # Compare only if calculated probability is available
+        if analysis.calculated_probability is not None:
+            diff, assessment = predictor.compare_probabilities(
+                analysis.calculated_probability, analysis.simulated_probability
+            )
+            print(f"Probability difference: {diff:.4f}")
+            print(f"Assessment: {assessment}")
+        else:
+            print("Probability difference: N/A")
+            print("Assessment: N/A (Cannot compare without calculated probability)")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
diff --git a/environments/community/solitaire_winning_probability/games.py b/environments/community/solitaire_winning_probability/games.py
new file mode 100644
index 00000000..5b9230d7
--- /dev/null
+++ b/environments/community/solitaire_winning_probability/games.py
@@ -0,0 +1,170 @@
+import random
+
+
+def easy_game_1():
+    """
+    Draw a card from a deck of 4 cards (1, 2, 3, 4). Win if the card is 1.
+    """
+    deck = [1, 2, 3, 4]
+    random.shuffle(deck)
+    return deck[0] == 1
+
+
+def easy_game_2():
+    """
+    Roll a 6-sided die. Win if the result is 1.
+    """
+    return random.randint(1, 6) == 1
+
+
+def easy_game_3():
+    """
+    Flip a coin twice. Win if both are heads. (Assuming 0 for heads, 1 for tails)
+    """
+    flip1 = random.randint(0, 1)  # 0 for heads, 1 for tails
+    flip2 = random.randint(0, 1)
+    return flip1 == 0 and flip2 == 0
+
+
+def easy_game_4():
+    """
+    Draw a card from a deck of 3 cards (1, 2, 3). Win if the card is 1.
+    """
+    deck = [1, 2, 3]
+    random.shuffle(deck)
+    return deck[0] == 1
+
+
+def card_matching_game_1():
+    """
+    A game where we lose if the counter matches the card value.
+
+    Rules:
+    1. We have a deck where each rank (1-13) appears once (13 cards total)
+    2. Cards are shuffled randomly
+    3. We deal cards one by one, keeping a counter that cycles 1,1,1,1,1,1,1,1,...
+    4. We lose if the counter matches the card value
+    5. We win if we go through the whole deck without any matches
+
+    Returns:
+        bool: True if won, False if lost
+    """
+    # Create and shuffle deck (one card per rank)
+    ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
+    deck = [rank for rank in ranks for _ in range(1)]
+    random.shuffle(deck)
+
+    # Play game
+    count = 0
+    for card in deck:
+        count = (count % 1) + 1
+        if count == card:
+            return False
+    return True
+
+
+def card_matching_game_2():
+    """
+    A game where we lose if the counter matches the card value.
+
+    Rules:
+    1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+    2. Cards are shuffled randomly
+    3. We deal cards one by one, keeping a counter that cycles 1,2,1,2,1,2,1,2,...
+    4. We lose if the counter matches the card value
+    5. We win if we go through the whole deck without any matches
+
+    Returns:
+        bool: True if won, False if lost
+    """
+    # Create and shuffle deck
+    ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
+    deck = [rank for rank in ranks for _ in range(4)]
+    random.shuffle(deck)
+
+    # Play game
+    count = 0
+    for card in deck:
+        count = (count % 2) + 1
+        if count == card:
+            return False
+    return True
+
+
+def card_matching_game_3():
+    """
+    A game where we lose if the counter matches the card value.
+
+    Rules:
+    1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+    2. Cards are shuffled randomly
+    3. We deal cards one by one, keeping a counter that cycles 1,2,3,1,2,3,...
+    4. We lose if the counter matches the card value
+    5. We win if we go through the whole deck without any matches
+
+    Returns:
+        bool: True if won, False if lost
+    """
+    # Create and shuffle deck
+    ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
+    deck = [rank for rank in ranks for _ in range(4)]
+    random.shuffle(deck)
+
+    # Play game
+    count = 0
+    for card in deck:
+        count = (count % 3) + 1
+        if count == card:
+            return False
+    return True
+
+
+def card_matching_game_4():
+    """
+    A game where we lose if the counter matches the card value.
+
+    Rules:
+    1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+    2. Cards are shuffled randomly
+    3. We deal cards one by one, keeping a counter that cycles 1,2,3,4,1,2,3,4,...
+    4. We lose if the counter matches the card value
+    5. We win if we go through the whole deck without any matches
+
+    Returns:
+        bool: True if won, False if lost
+    """
+    # Create and shuffle deck
+    ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
+    deck = [rank for rank in ranks for _ in range(4)]
+    random.shuffle(deck)
+
+    # Play game
+    count = 0
+    for card in deck:
+        count = (count % 4) + 1
+        if count == card:
+            return False
+    return True
+
+
+def odd_card_game():
+    """
+    A game where we win if we draw an odd-valued card from a deck.
+
+    Rules:
+    1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+    2. Cards are shuffled randomly
+    3. We draw one card randomly from the deck
+    4. We win if the card value is odd, lose if it's even
+
+    Returns:
+        bool: True if won (odd card drawn), False if lost (even card drawn)
+    """
+    # Create and shuffle deck
+    ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
+    deck = [rank for rank in ranks for _ in range(4)]
+    random.shuffle(deck)
+
+    # Draw one card and check if it's odd
+    drawn_card = deck[0]
+    return drawn_card % 2 == 1
diff --git a/environments/community/solitaire_winning_probability/main.py b/environments/community/solitaire_winning_probability/main.py
new file mode 100644
index 00000000..3db9d581
--- /dev/null
+++ b/environments/community/solitaire_winning_probability/main.py
@@ -0,0 +1,9 @@
+import pandas as pd
+
+splits = {
+    "train": "main/train-00000-of-00001.parquet",
+    "test": "main/test-00000-of-00001.parquet",
+}
+df = pd.read_parquet("hf://datasets/openai/gsm8k/" + splits["train"])
+df.to_csv("local_data.csv", index=False)
+print(df.head())
diff --git a/environments/community/solitaire_winning_probability/qa_data.csv b/environments/community/solitaire_winning_probability/qa_data.csv
new file mode 100644
index 00000000..e187c940
--- /dev/null
+++ b/environments/community/solitaire_winning_probability/qa_data.csv
@@ -0,0 +1,253 @@
+question,answer
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def easy_game_1():
+ """"""
+ Draw a card from a deck of 4 cards (1, 2, 3, 4). Win if the card is 1.
+ """"""
+ deck = [1, 2, 3, 4]
+ random.shuffle(deck)
+ return deck[0] == 1
+
+ ```
+
+
+ Draw a card from a deck of 4 cards (1, 2, 3, 4). Win if the card is 1.
+
+ ",0.249390
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def easy_game_2():
+ """"""
+ Roll a 6-sided die. Win if the result is 1.
+ """"""
+ return random.randint(1, 6) == 1
+
+ ```
+
+
+ Roll a 6-sided die. Win if the result is 1.
+
+ ",0.168080
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def easy_game_3():
+ """"""
+ Flip a coin twice. Win if both are heads. (Assuming 0 for heads, 1 for tails)
+ """"""
+ flip1 = random.randint(0, 1) # 0 for heads, 1 for tails
+ flip2 = random.randint(0, 1)
+ return flip1 == 0 and flip2 == 0
+
+ ```
+
+
+ Flip a coin twice. Win if both are heads. (Assuming 0 for heads, 1 for tails)
+
+ ",0.249860
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def easy_game_4():
+ """"""
+ Draw a card from a deck of 3 cards (1, 2, 3). Win if the card is 1.
+ """"""
+ deck = [1, 2, 3]
+ random.shuffle(deck)
+ return deck[0] == 1
+
+ ```
+
+
+ Draw a card from a deck of 3 cards (1, 2, 3). Win if the card is 1.
+
+ ",0.332730
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def card_matching_game_2():
+ """"""
+ A game where we lose if the counter matches the card value.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We deal cards one by one, keeping a counter that cycles 1,2,1,2,1,2,1,2,...
+ 4. We lose if the counter matches the card value
+ 5. We win if we go through the whole deck without any matches
+
+ Returns:
+ bool: True if won, False if lost
+ """"""
+ # Create and shuffle deck
+ ranks = [1,2,3,4,5,6,7,8,9,10,11,12,13]
+ deck = [rank for rank in ranks for _ in range(4)]
+ random.shuffle(deck)
+
+ # Play game
+ count = 0
+ for card in deck:
+ count = (count % 2) + 1
+ if count == card:
+ return False
+ return True
+
+ ```
+
+
+ A game where we lose if the counter matches the card value.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We deal cards one by one, keeping a counter that cycles 1,2,1,2,1,2,1,2,...
+ 4. We lose if the counter matches the card value
+ 5. We win if we go through the whole deck without any matches
+
+ Returns:
+ bool: True if won, False if lost
+
+ ",0.004120
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def card_matching_game_3():
+ """"""
+ A game where we lose if the counter matches the card value.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We deal cards one by one, keeping a counter that cycles 1,2,3,1,2,3,...
+ 4. We lose if the counter matches the card value
+ 5. We win if we go through the whole deck without any matches
+
+ Returns:
+ bool: True if won, False if lost
+ """"""
+ # Create and shuffle deck
+ ranks = [1,2,3,4,5,6,7,8,9,10,11,12,13]
+ deck = [rank for rank in ranks for _ in range(4)]
+ random.shuffle(deck)
+
+ # Play game
+ count = 0
+ for card in deck:
+ count = (count % 3) + 1
+ if count == card:
+ return False
+ return True
+
+ ```
+
+
+ A game where we lose if the counter matches the card value.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We deal cards one by one, keeping a counter that cycles 1,2,3,1,2,3,...
+ 4. We lose if the counter matches the card value
+ 5. We win if we go through the whole deck without any matches
+
+ Returns:
+ bool: True if won, False if lost
+
+ ",0.008230
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def card_matching_game_4():
+ """"""
+ A game where we lose if the counter matches the card value.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We deal cards one by one, keeping a counter that cycles 1,2,3,4,1,2,3,4,...
+ 4. We lose if the counter matches the card value
+ 5. We win if we go through the whole deck without any matches
+
+ Returns:
+ bool: True if won, False if lost
+ """"""
+ # Create and shuffle deck
+ ranks = [1,2,3,4,5,6,7,8,9,10,11,12,13]
+ deck = [rank for rank in ranks for _ in range(4)]
+ random.shuffle(deck)
+
+ # Play game
+ count = 0
+ for card in deck:
+ count = (count % 4) + 1
+ if count == card:
+ return False
+ return True
+
+ ```
+
+
+ A game where we lose if the counter matches the card value.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We deal cards one by one, keeping a counter that cycles 1,2,3,4,1,2,3,4,...
+ 4. We lose if the counter matches the card value
+ 5. We win if we go through the whole deck without any matches
+
+ Returns:
+ bool: True if won, False if lost
+
+ ",0.010560
+"
+ Analyze this game implemented in the following Python code:
+
+ ```python
+ def odd_card_game():
+ """"""
+ A game where we win if we draw an odd-valued card from a deck.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We draw one card randomly from the deck
+ 4. We win if the card value is odd, lose if it's even
+
+ Returns:
+ bool: True if won (odd card drawn), False if lost (even card drawn)
+ """"""
+ # Create and shuffle deck
+ ranks = [1,2,3,4,5,6,7,8,9,10,11,12,13]
+ deck = [rank for rank in ranks for _ in range(4)]
+ random.shuffle(deck)
+
+ # Draw one card and check if it's odd
+ drawn_card = deck[0]
+ return drawn_card % 2 == 1
+
+ ```
+
+
+ A game where we win if we draw an odd-valued card from a deck.
+
+ Rules:
+ 1. We have a standard deck where each rank (1-13) appears 4 times (52 cards total)
+ 2. Cards are shuffled randomly
+ 3. We draw one card randomly from the deck
+ 4. We win if the card value is odd, lose if it's even
+
+ Returns:
+ bool: True if won (odd card drawn), False if lost (even card drawn)
+
+ ",0.536680
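The answer column in the rows above appears to hold empirical win rates rather than closed forms: the odd-card game has exact probability 28/52 = 7/13 (about 0.5385), while its row records 0.536680. The sketch below shows one way such estimates could be reproduced with a seeded Monte Carlo run. It is an assumption about how the values were produced, not the generator used in this PR; `estimate_win_probability`, the `rng` argument, and the merged `card_matching_game(cycle, rng)` signature are hypothetical names introduced for illustration.

```python
import random
from fractions import Fraction


def card_matching_game(cycle: int, rng: random.Random) -> bool:
    """Play one counter-vs-card game; the counter cycles 1..cycle."""
    deck = [rank for rank in range(1, 14) for _ in range(4)]
    rng.shuffle(deck)
    count = 0
    for card in deck:
        count = (count % cycle) + 1
        if count == card:
            return False  # counter matched the card value: loss
    return True


def odd_card_game(rng: random.Random) -> bool:
    """Draw the top card of a shuffled deck; win on an odd rank."""
    deck = [rank for rank in range(1, 14) for _ in range(4)]
    rng.shuffle(deck)
    return deck[0] % 2 == 1


def estimate_win_probability(game, trials: int, seed: int = 0) -> float:
    """Hypothetical helper: fraction of wins over `trials` seeded plays."""
    rng = random.Random(seed)
    wins = sum(game(rng) for _ in range(trials))
    return wins / trials


# Exact value for the odd-card game: 7 odd ranks out of 13.
EXACT_ODD = Fraction(7, 13)
```

Calling `estimate_win_probability(lambda rng: card_matching_game(3, rng), 100_000)` gives an estimate for the 3-cycle game that can be compared against the recorded answer; the result varies with the seed and the number of trials.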
diff --git a/environments/community/solitaire_winning_probability/solitaire_server.py b/environments/community/solitaire_winning_probability/solitaire_server.py
new file mode 100644
index 00000000..e38f02f8
--- /dev/null
+++ b/environments/community/solitaire_winning_probability/solitaire_server.py
@@ -0,0 +1,413 @@
+import csv # Added import for CSV handling
+import random
+from typing import Dict, List, Optional, Tuple, TypedDict, Union
+
+from asteval import Interpreter
+from latex2sympy2_extended import NormalizationConfig
+from math_verify import LatexExtractionConfig, parse, verify
+from tqdm.asyncio import tqdm_asyncio
+
+from atroposlib.envs.base import (
+    APIServerConfig,
+    BaseEnv,
+    BaseEnvConfig,
+    ScoredDataGroup,
+)
+from atroposlib.type_definitions import Item, number
+from atroposlib.utils.tokenize_for_trainer import tokenize_for_trainer
+
+aeval = Interpreter()
+
+system_prompt = """
+Please provide your analysis using the exact format below, including all tags:
+
+
+[Your initial approach to solving this probability problem]
+[List important observations about the game mechanics]
+[Show your step-by-step mathematical derivation using probability theory]
+[Include explanations of any combinations, permutations, or conditional probabilities used]
+
+
+
+[IMPORTANT: Write ONLY the final, simplified mathematical formula for the probability of winning below.]
+[CRITICAL: Do NOT include any text, explanations, comments, multiple formulas,
+or intermediate calculation steps within this tag.]
+[CRITICAL: If a precise mathematical formula cannot be determined, leave this section EMPTY.]
+[Use C(n,r), P(n,r), factorial(n) and standard math operators: + - * / ^ ( ) ]
+
+
+Note: Use these notations ONLY in your formula:
+- Factorial: factorial(n)
+- Combinations: C(n,r)
+- Permutations: P(n,r)
+- Standard operators: *, /, +, -, ^, (, )
+The formula must be in a format that can be directly evaluated.
+Use parentheses liberally to ensure correct order of operations. For example,
+write (A * B) / (C * D) instead of A * B / C * D if you intend the division
+to apply to the result of (C * D). Be explicit!
+
+What is the mathematical formula to calculate the exact probability of winning this game?
+"""
+
+system_prompt += """You are allocated a maximum of 2048 tokens, please strive to use less.
+
+You will then provide your answer like this: \\boxed{your answer here}
+It is important that you provide your answer in the correct format.
+If you do not, you will not receive credit for your answer.
+So please end your answer with \\boxed{your answer here}"""
+
+
+class SolitaireRow(TypedDict):
+    question: str
+    answer: str
+
+
+class SolitaireEnv(BaseEnv):
+
+    name = "solitaire_winning_probability"
+
+    def __init__(
+        self,
+        config: BaseEnvConfig,
+        server_configs: List[APIServerConfig],
+        slurm=True,
+        testing=False,
+    ):
+        super().__init__(config, server_configs, slurm, testing)
+        self.percent_correct_buffer = list()
+        self.eval_metrics = list()
+        # Add tracking for wandb visualizations
+        self.rollouts_for_wandb = []
+        self.completion_lengths = []
+
+    @classmethod
+    def config_init(cls) -> Tuple[BaseEnvConfig, List[APIServerConfig]]:
+        env_config = BaseEnvConfig(
+            tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview",
+            group_size=8,
+            use_wandb=True,
+            rollout_server_url="http://localhost:8000",
+            total_steps=1000,
+            batch_size=12,
+            steps_per_eval=100,
+            max_token_length=2048,
+            wandb_name="solitaire_winning_probability",
+        )
+        server_configs = [
+            APIServerConfig(
+                model_name="gpt-4.1-nano",
+                base_url="https://api.openai.com/v1",
+                api_key="x",
+                num_requests_for_eval=256,
+            ),
+        ]
+
+        return env_config, server_configs
+
+    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
+        if wandb_metrics is None:
+            wandb_metrics = {}
+
+        # Try to calculate percent_correct, pass if there's a division by zero
+        try:
+            wandb_metrics["train/percent_correct"] = sum(
+                self.percent_correct_buffer
+            ) / len(self.percent_correct_buffer)
+        except ZeroDivisionError:
+            # Skip if buffer is empty
+            pass
+
+        self.percent_correct_buffer = list()
+        for item in self.eval_metrics:
+            wandb_metrics[item[0]] = item[1]
+        self.eval_metrics = list()
+        # Call the parent method to handle the server metrics
+        await super().wandb_log(wandb_metrics)
+
+    async def setup(self):
+        # Load data from a local CSV file
+        self.train = []
+        # Load data from qa_data.csv in the same directory as this environment
+        csv_file_path = (
+            "environments/community/solitaire_winning_probability/" "qa_data.csv"
+        )
+        try:
+            with open(csv_file_path, mode="r", encoding="utf-8") as file:
+                reader = csv.DictReader(file)
+                for row in reader:
+                    # Ensure 'question' and 'answer' columns exist
+                    if "question" in row and "answer" in row:
+                        self.train.append(
+                            {
+                                "question": row["question"],
+                                "answer": row[
+                                    "answer"
+                                ],  # Assuming 'answer' in CSV is already in the desired format
+                            }
+                        )
+                    else:
+                        print(
+                            f"Warning: Skipping row due to missing "
+                            f"'question' or 'answer': {row}"
+                        )
+            if not self.train:
+                print(
+                    f"Warning: No data loaded from {csv_file_path}. "
+                    f"Ensure the file exists and has 'question' and 'answer' columns."
+                )
+
+        except FileNotFoundError:
+            print(f"Error: The file {csv_file_path} was not found.")
+            # Handle the error as appropriate for your application
+            # For example, raise an exception or exit
+            raise
+        except Exception as e:
+            print(f"An error occurred while reading {csv_file_path}: {e}")
+            raise
+
+        # Shuffle the training data
+        random.Random(42).shuffle(self.train)
+
+        # There is no separate test CSV yet, so the test set is built from
+        # the loaded training rows. If a dedicated test CSV is added later,
+        # load it here the same way as qa_data.csv.
+        # Note that the test rows below overlap with the training rows.
+        self.test = []  # Placeholder for test data
+
+        # Use the first rows of the shuffled 'train' data as 'test' data.
+        # Adjust this logic based on how qa_data.csv is structured
+        # or if you have a separate test CSV.
+        if len(self.train) > 10:  # Ensure there's enough data
+            test_data_raw = self.train[:10]  # Taking first 10 as example
+        else:
+            test_data_raw = self.train  # Use all if there are 10 or fewer
+
+        for item in test_data_raw:
+            self.test.append(
+                {
+                    "question": item["question"],
+                    "gold_answer": item[
+                        "answer"
+                    ]  # Assuming 'answer' in CSV is the final gold answer string
+                    .split("#")[-1]
+                    .strip()
+                    .replace(",", ""),
+                }
+            )
+        self.iter = 0
+
+    def save_checkpoint(self, step, data=None):
+        if data is None:
+            data = {}
+        data["iter"] = self.iter
+        super().save_checkpoint(step, data)
+
+    async def rollout_and_score_eval(self, question: str, answer: str) -> number:
+        completion = await self.server.chat_completion(
+            messages=[
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": question},
+            ],
+            n=1,
+            max_tokens=self.config.max_token_length,
+            temperature=0.0,
+            split="eval",
+        )
+        gold_parsed = parse(
+            "\\boxed{" + answer + "}",
+            extraction_mode="first_match",
+            extraction_config=[LatexExtractionConfig()],
+        )
+        answer_parsed = parse(
+            completion.choices[0].message.content.split("")[-1],
+            extraction_config=[
+                LatexExtractionConfig(
+                    normalization_config=NormalizationConfig(
+                        nits=False,
+                        malformed_operators=False,
+                        basic_latex=True,
+                        equations=True,
+                        boxed="all",
+                        units=True,
+                    ),
+                    # Ensures that boxed is tried first
+                    boxed_match_priority=0,
+                    try_extract_without_anchor=False,
+                )
+            ],
+            extraction_mode="first_match",
+        )
+        score = 1 if verify(answer_parsed, gold_parsed) else 0
+        return score
+
+    async def evaluate(self, *args, **kwargs):
+        eval_tasks = []
+        for item in self.test:
+            eval_tasks.append(
+                self.rollout_and_score_eval(item["question"], item["gold_answer"])
+            )
+        scores = await tqdm_asyncio.gather(*eval_tasks)
+        self.eval_metrics.append(("eval/percent_correct", sum(scores) / len(scores)))
+
+    async def collect_trajectories(
+        self, item: SolitaireRow
+    ) -> Tuple[ScoredDataGroup, list[Item]]:
+        user_message = {"role": "user", "content": item["question"]}
+        gold_answer = (
+            "\\boxed{" + item["answer"].split("#")[-1].strip().replace(",", "") + "}"
+        )
+
+        chat_completions = await self.server.chat_completion(
+            messages=[{"role": "system", "content": system_prompt}, user_message],
+            n=self.config.group_size,
+            max_tokens=self.config.max_token_length,
+        )
+        to_score = list()
+        to_backlog = list()
+        for i, chat_completion in enumerate(chat_completions.choices):
+            messages = (
+                {"role": "system", "content": system_prompt},
+                user_message,
+                {"role": "assistant", "content": chat_completion.message.content},
+            )
+            to_score.append(
+                {
+                    "messages": messages,
+                    "gold_answer": gold_answer,
+                    "finish_reason": chat_completion.finish_reason,
+                }
+            )
+        to_postprocess = await self.score(to_score)
+        return to_postprocess, to_backlog
+
+    async def score(
+        self, rollout_group_data
+    ) -> Union[Optional[ScoredDataGroup], List[Optional[ScoredDataGroup]]]:
+        scores = ScoredDataGroup()
+        scores["tokens"] = list()
+        scores["masks"] = list()
+        scores["scores"] = list()
+        gold_parsed = parse(
+            rollout_group_data[0]["gold_answer"],
+            extraction_mode="first_match",
+            extraction_config=[LatexExtractionConfig()],
+        )
+        if len(gold_parsed) != 0:
+            # We require the answer to be provided in correct latex (no malformed operators)
+            random.shuffle(rollout_group_data)
+            for item in rollout_group_data:
+                reward = -1
+                try:
+                    if len(item["messages"][-1]["content"].split("")) < 2:
+                        reward = -1
+                        continue
+                    if (
+                        len(
+                            item["messages"][-1]["content"]
+                            .split("")[1]
+                            .split("")
+                        )
+                        < 1
+                    ):
+                        reward = -1
+                        continue
+                    # print(item[0][-1]["content"])
+                    answer_parsed = aeval(
+                        item["messages"][-1]["content"]
+                        .split("")[1]
+                        .split("")[0]
+                    )
+
+                    gt = aeval(item["gold_answer"].split("boxed{")[1].split("}")[0])
+
+                    if answer_parsed is not None:
+                        # Reward is 1 minus the relative error (capped at 2), plus a 0.2 bonus
+                        reward = 1 - min(abs(gt - answer_parsed) / gt, 2)
+                        reward += 0.2
+                    else:
+                        reward = -1
+                    reward = max(-1, reward)
+                    reward = min(1, reward)
+                except Exception as e:
+                    print(e)
+                    reward = -1
+                    continue
+
+                out_dict = tokenize_for_trainer(
+                    self.tokenizer, item["messages"], item["finish_reason"]
+                )
+                tokens = out_dict["tokens"]
+                masks = out_dict["masks"]
+                # remove obviously bad examples
+                if len([1 for i in masks if i != -100]) < 10:
+                    continue
+                scores["tokens"].append(tokens)
+                scores["masks"].append(masks)
+                scores["scores"].append(float(reward))
+                if len(scores["tokens"]) >= self.config.group_size:
+                    break
+            for score in scores["scores"]:
+                self.percent_correct_buffer.append(max(score, 0))
+            # check if all the same
+            # print(scores['scores'])
+            if all([score == 1 for score in scores["scores"]]):
+                # Do length penalty :)
+                token_lengths = [len(token) for token in scores["tokens"]]
+                if max(token_lengths, default=0) == 0:
+                    # No usable rollouts (or all empty); return None rather than crash a run
+                    return None
+
+                # Get max allowed token length from config
+                max_allowed_length = self.config.max_token_length
+                # Set threshold at 50% of max_token_length - no penalty below this
+                length_threshold = max_allowed_length * 0.5
+
+                # Apply modified length penalty with threshold
+                scores["scores"] = []
+                for length in token_lengths:
+                    if length <= length_threshold:
+                        # No penalty for responses under threshold
+                        scores["scores"].append(1.0)
+                    else:
+                        # Calculate how far we are between threshold and max as a percentage
+                        percentage_of_range = (length - length_threshold) / (
+                            max_allowed_length - length_threshold
+                        )
+                        # Cap at 1.0 in case length exceeds max_allowed_length
+                        percentage_of_range = min(percentage_of_range, 1.0)
+                        # Apply linear penalty scaling from 1.0 down to 0.0
+                        scores["scores"].append(1.0 - percentage_of_range)
+            if all([scores["scores"][0] == score for score in scores["scores"]]):
+                return None  # If all the same, we return None
+            return scores
+        else:
+            # If the gold solution is not parseable, we return None
+            return None
+
+    async def get_next_item(self) -> SolitaireRow:
+        if not self.train:
+            # Handle case where training data might be empty
+            # This could involve raising an error or returning a default item
+            raise ValueError("Training data is empty. Cannot get next item.")
+        next_item_index = self.iter % len(self.train)
+        next_item = self.train[next_item_index]
+        self.iter += 1
+        # The returned item conforms to SolitaireRow: the loading logic in setup()
+        # builds each self.train entry as a dict with "question" and "answer" keys
+        return next_item
+
+
+if __name__ == "__main__":
+    import sys
+
+    # Note: Set your OpenAI API key via environment variable OPENAI_API_KEY
+    # or configure it in your server_configs
+
+    if len(sys.argv) == 1 or (
+        len(sys.argv) > 1 and sys.argv[1] not in ["serve", "process"]
+    ):
+        # If no command is specified, or the first arg is not 'serve' or 'process',
+        # default to the 'process' command.
+        # All other arguments will be passed to the 'process' command.
+        sys.argv = [sys.argv[0], "process"] + sys.argv[1:]
+    SolitaireEnv.cli()
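The scoring path in `score` evaluates the model's formula, compares it to the gold value by relative error, and applies a length penalty when every rollout in the group is perfect. The sketch below restates those three steps as small pure functions so they can be checked in isolation. It is a stdlib-only stand-in, not the environment's code: the environment uses `asteval`, whereas this version walks a Python AST, and `eval_formula`, `relative_error_reward`, and `length_penalised_score` are hypothetical names. Mapping `C`, `P`, and `^` onto `math.comb`, `math.perm`, and `**` is an assumption based on the notation described in the system prompt.

```python
import ast
import math
import operator

# Names the prompt allows in a formula, mapped to stdlib functions (assumed mapping).
_FUNCS = {"C": math.comb, "P": math.perm, "factorial": math.factorial}
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def eval_formula(text: str) -> float:
    """Evaluate a formula written in the prompt's notation (hypothetical helper)."""
    # The prompt uses ^ for powers; Python spells that **.
    tree = ast.parse(text.strip().replace("^", "**"), mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.USub, ast.UAdd)):
            value = walk(node.operand)
            return -value if isinstance(node.op, ast.USub) else value
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords
        ):
            return _FUNCS[node.func.id](*(walk(arg) for arg in node.args))
        # Anything else (attribute access, other names, etc.) is rejected.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return float(walk(tree))


def relative_error_reward(predicted: float, gold: float) -> float:
    """1 minus relative error (capped at 2), plus a 0.2 bonus, clamped to [-1, 1]."""
    reward = 1 - min(abs(gold - predicted) / gold, 2) + 0.2
    return max(-1.0, min(1.0, reward))


def length_penalised_score(length: int, max_len: int) -> float:
    """No penalty up to half of max_len, then linear decay from 1.0 down to 0.0."""
    threshold = max_len * 0.5
    if length <= threshold:
        return 1.0
    fraction = min((length - threshold) / (max_len - threshold), 1.0)
    return 1.0 - fraction
```

For example, `relative_error_reward(eval_formula("C(28,1)/C(52,1)"), 7 / 13)` evaluates the exact odd-card formula against its true probability. Because of the 0.2 bonus, any prediction within 20% relative error of the gold value is clamped to the maximum reward of 1.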