atropos/environments/community/lean_proof_env/README.md

## Overview
The `LeanEnv` class implements an environment specifically designed for training and evaluating language models on Lean theorem proving tasks. Its primary motivation is to facilitate research in automated reasoning and to develop LLMs capable of formal mathematical deduction. The environment orchestrates the interaction between an LLM, which generates proof attempts for given theorem statements, and a Lean code checker (currently mocked as `PyPantograph`), which validates these proofs. This interaction provides the necessary feedback signal (rewards) for reinforcement learning. The environment utilizes the `brando/minif2f-lean4` dataset from Hugging Face, providing a standardized benchmark for Lean theorem proving.

## Quickstart / Usage

The `lean_env.py` script is designed to be run directly from the command line, leveraging the `BaseEnv.cli()` method for easy execution.

### 1. Serve Mode (Connecting to Atropos Trainer)

To run the environment as a server that the Atropos trainer can connect to for distributed training or inference:
```bash
python environments/lean_env.py serve
```
This command will start the environment, making it available for the Atropos training infrastructure. Ensure your LLM API server (specified in `APIServerConfig`) is running and accessible.

### 2. Process Mode (Local Testing and Data Generation)

To run the environment locally for testing, debugging, data generation, or observing behavior with WandB (if enabled):
```bash
python environments/lean_env.py process
```
This mode is useful for:
- Verifying that the dataset loads correctly.
- Testing the proof generation and scoring logic with a small number of examples.
- Generating a sample WandB run to inspect logged metrics and tables (ensure `use_wandb=True` in `BaseEnvConfig`).
- Running a full evaluation pass locally if configured.

Additional arguments or configurations might be available via the CLI; refer to the `BaseEnv.cli()` implementation for more details if needed.

## WandB Logging and Metrics

If `use_wandb` is enabled in the configuration, the environment logs detailed metrics and proof attempts to Weights & Biases, providing insights into the training and evaluation process.

### Example WandB Run

You can find an example of a public WandB run generated by executing the `process` command here:
`[Link to your public WandB run]` (Replace this with an actual link after running the environment).

### Custom Metrics

In addition to the standard metrics logged by the `BaseEnv`, the `LeanEnv` adds the following custom metrics:

*   **`train/batch_avg_percent_compiled`**: (Logged during training steps) This metric shows the moving average percentage of successfully compiled Lean proofs within recent training batches. It provides a signal of how well the model is learning to generate valid proofs during the training process.
*   **`eval/percent_compiled`**: (Logged during evaluation steps) This metric reports the percentage of Lean problems from the test set for which the LLM generated a successfully compiled proof. It is a key indicator of the model's generalization performance on unseen problems.
*   **`train/lean_proof_attempts` (wandb.Table)**: (Logged during training steps, if `num_rollouts_to_keep > 0`) This is a `wandb.Table` that provides a detailed log of individual proof attempts. Each row in the table corresponds to one attempt and includes:
    *   **Problem Statement**: The original Lean theorem statement (with `sorry`).
    *   **Generated Proof**: The proof block generated by the LLM.
    *   **Score**: The reward assigned to the proof attempt (1.0 for compilation, -1.0 for failure).
    *   **Compilation Status**: A message indicating whether the proof compiled successfully or the error encountered.

These metrics are crucial for understanding the model's progress in learning to generate correct and efficient Lean proofs. The detailed table of attempts allows for qualitative analysis of the model's outputs.