Integrate subrahmanyam cybersecurity (#142)

* cybersecurity env for offline RL trajectories

* output file addition

* jsonl outputs

* code cleanup

* pulled out outputs and fixing .gitignore

* removed zip file

* gitignore typo fix

* Integrate cybersecurity Sigma rule generation environment

---------

Co-authored-by: Subrahmanyam Arunachalam <subrahmanyam.arunachalam@FVFGK0VTQ05P.local>
shannonsands 2025-05-28 08:41:51 +10:00 committed by GitHub
parent b33070f56b
commit b774e97215
4 changed files with 890 additions and 0 deletions


@@ -419,3 +419,46 @@ The environments follow a common interface with methods for:
- `score()`: Computing rewards
- `evaluate()`: Running evaluation on test set
- `wandb_log()`: Logging metrics to Weights & Biases
## 31. Cybersecurity Sigma Rule Generation Environment
**Location:** `environments/community/cybersecurity_sigma/`
**Contributor:** [Subrahmanyam2305](https://github.com/Subrahmanyam2305)
**PR:** [#74](https://github.com/NousResearch/atropos/pull/74)
### Core Features
- **Dual Reward Systems**: Jaccard similarity scoring and LLM-as-a-judge evaluation
- **Structured Output Generation**: Enforces YAML output wrapped in a LaTeX-style `\boxed{}`
- **Cybersecurity Domain**: Trains models to generate Sigma detection rules from threat prompts
- **Dataset Integration**: Uses `mmaisel1/nous-rl-hackathon-sigma` from Hugging Face
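The structured output requirement above can be illustrated with a small parser. This is a minimal sketch, not the environment's actual code: the function name `extract_boxed` is hypothetical, and it assumes the completion contains a closing `</think>` tag followed by a single `\boxed{...}` wrapper whose braces are balanced.

```python
from typing import Optional


def extract_boxed(completion: str) -> Optional[str]:
    r"""Return the text inside the first \boxed{...} after </think>, or None.

    Hypothetical helper for illustration only.
    """
    # Only consider the answer portion that follows the reasoning block.
    _, sep, answer = completion.rpartition("</think>")
    if not sep:
        return None  # no closing reasoning tag found

    marker = "\\boxed{"
    start = answer.find(marker)
    if start == -1:
        return None  # no boxed wrapper in the answer

    # Walk forward and match braces so nested {...} in the YAML is kept.
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(answer)):
        ch = answer[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return answer[begin:i].strip()
    return None  # unbalanced braces: treat as malformed output


sample = (
    "<think>reason</think>\n"
    "\\boxed{\ntitle: Test\ndetection:\n"
    "  selection: {Image: cmd.exe}\n  condition: selection\n}"
)
rule_text = extract_boxed(sample)
```

Matching braces by depth rather than with a regular expression keeps YAML flow mappings such as `{Image: cmd.exe}` intact. The extracted string could then be passed to a YAML parser for validation.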
### Technical Implementation
- **Environment Names**: `sigmarule` (Jaccard) and `llm_judge_sigmarule` (LLM judge)
- **Output Format**: `<think>...</think>` reasoning tags + YAML in `\boxed{}`
- **Reward Mechanisms**: Token-based Jaccard similarity vs. semantic LLM evaluation
- **Model Configuration**: DeepHermes-3-Llama-3-3B-Preview with a 2048-token limit
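For readers unfamiliar with the token-based reward, the following is a minimal sketch of Jaccard similarity over token sets. The tokenizer (lowercased alphanumeric runs) and the function names are assumptions made for illustration; the environment may tokenize and score differently.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    # Assumed tokenization: lowercase runs of letters, digits, and underscores.
    return set(re.findall(r"[A-Za-z0-9_]+", text.lower()))


def jaccard_similarity(generated: str, reference: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the two token sets."""
    a, b = tokenize(generated), tokenize(reference)
    union = a | b
    if not union:
        return 0.0  # both texts empty: no evidence of a match
    return len(a & b) / len(union)


score = jaccard_similarity(
    "selection: Image: cmd.exe",
    "selection: Image: powershell.exe",
)
```

Because the score depends only on which tokens are shared, two rules that detect the same behavior with different field names or ordering can still score low, which is the structural sensitivity that motivates the LLM-judge alternative.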
### Research Applications
- **Cybersecurity Training**: Automated threat detection rule generation
- **Structured Generation**: Constrained output format research with YAML validation
- **Evaluation Methodology**: Comparison of token-based vs. semantic reward functions
- **Domain Expertise**: Training models on specialized cybersecurity knowledge
### Setup and Usage
```bash
# Environment variables
export OPENAI_API_KEY="your-key" # For LLM judge (optional)
export NOUS_API_KEY="your-key" # For model inference
# Run environments
python environments/community/cybersecurity_sigma/jaccard_reward_env.py
python environments/community/cybersecurity_sigma/llm_judge_env.py
```
### Performance Characteristics
- **Jaccard Rewards**: Scores in the 0.1-0.3 range; fast to compute but sensitive to structural differences
- **LLM Judge Rewards**: Binary 0.0/1.0; captures semantic similarity but adds API latency
- **W&B Integration**: Comprehensive experiment tracking and visualization
- **Length Penalties**: Applied for overly verbose rule generation
---