mirror of https://github.com/NousResearch/atropos.git
synced 2026-04-19 12:57:58 +00:00

Integrate subrahmanyam cybersecurity (#142)

* cybersecurity env for offline RL trajectories
* output file addition
* jsonl outputs
* code cleanup
* pulled out outputs and fixing .gitignore
* removed zip file
* gitignore typo fix
* Integrate cybersecurity Sigma rule generation environment

Co-authored-by: Subrahmanyam Arunachalam <subrahmanyam.arunachalam@FVFGK0VTQ05P.local>

This commit is contained in:
parent b33070f56b
commit b774e97215

4 changed files with 890 additions and 0 deletions
139 environments/community/cybersecurity_sigma/README.md (new file)
# Cybersecurity Sigma Rule Generation Environment

This environment trains LLMs to generate semantically correct Sigma detection rules from threat-hunting prompts. It provides two different reward mechanisms for evaluating generated rules.

## Overview

The environment focuses on structured generation tasks where outputs must be valid YAML conforming to the Sigma detection rule schema. It includes two implementations with different reward functions:

1. **Jaccard Similarity Reward** (`jaccard_reward_env.py`) - uses token-based similarity scoring
2. **LLM Judge Reward** (`llm_judge_env.py`) - uses LLM-based semantic evaluation

## Core Features

### Dataset Integration

- Uses the `mmaisel1/nous-rl-hackathon-sigma` dataset from Hugging Face
- Contains threat-hunting prompts paired with corresponding Sigma rules
- Automatic train/test split with shuffling for reproducibility

### Structured Output Format

- Enforces a specific output format with `<think>...</think>` reasoning tags
- Requires YAML output wrapped in a LaTeX `\boxed{...}` environment
- Validates YAML syntax and Sigma rule structure

### Dual Reward Mechanisms

#### Jaccard Similarity Scoring

- Compares the flattened key paths of the gold and generated YAML under the `detection:` section
- Uses scikit-learn's Jaccard similarity for token-based matching
- Tends to produce low, sparse rewards because of structural mismatches

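The key-path comparison above can be sketched in pure Python. This is a minimal illustration rather than the repository's implementation (which uses scikit-learn): `flatten_paths` and `jaccard_reward` are hypothetical names, and both functions assume the rules have already been parsed into Python dicts.

```python
def flatten_paths(node, prefix=""):
    """Flatten a parsed YAML node into a set of dotted key paths (leaf values included)."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            paths |= flatten_paths(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for item in node:
            paths |= flatten_paths(item, prefix)
    else:
        paths.add(f"{prefix}{node}")
    return paths


def jaccard_reward(gold_rule, generated_rule):
    """Jaccard similarity |A & B| / |A | B| over detection-section key paths."""
    gold = flatten_paths(gold_rule.get("detection", {}))
    gen = flatten_paths(generated_rule.get("detection", {}))
    if not gold and not gen:
        return 1.0
    return len(gold & gen) / len(gold | gen)
```

Because every key path must match exactly, a rule that is semantically right but names its selection block differently scores near zero, which is one reason rewards from this scorer tend to be sparse.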
#### LLM-as-a-Judge Scoring

- Uses a binary LLM evaluation to assess semantic equivalence
- Returns 1.0 if the generated rule is functionally equivalent to the gold standard, 0.0 otherwise
- Provides higher-fidelity supervision even when the structure varies

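The binary judging step amounts to a single prompt-and-parse round trip, sketched below. This is illustrative, not the environment's actual prompt: `JUDGE_PROMPT`, `parse_verdict`, and `judge_reward` are hypothetical names, and the model call is abstracted behind a `complete` callable so any chat-API wrapper can be plugged in.

```python
# Hypothetical judge prompt; the environment's real prompt lives in llm_judge_env.py.
JUDGE_PROMPT = (
    "You are a Sigma rule reviewer. Reply with exactly YES if the candidate rule "
    "is functionally equivalent to the reference rule, otherwise reply NO.\n\n"
    "Reference rule:\n{gold}\n\nCandidate rule:\n{candidate}\n"
)


def parse_verdict(reply):
    """Map the judge's free-text reply onto the binary 0.0 / 1.0 reward."""
    return 1.0 if reply.strip().upper().startswith("YES") else 0.0


def judge_reward(complete, gold, candidate):
    """`complete` is any prompt -> reply callable, e.g. a chat-completion wrapper."""
    return parse_verdict(complete(JUDGE_PROMPT.format(gold=gold, candidate=candidate)))
```

Pinning the judge to a YES/NO vocabulary keeps parsing trivial, at the cost of losing any partial-credit signal.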
### Advanced Features

- A length penalty for overly verbose outputs
- Comprehensive tracking of evaluation metrics
- W&B integration for experiment monitoring
- Configurable token limits and batch sizes

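The length penalty's exact schedule lives in the environment code; the sketch below shows one common shape such a penalty could take, with hypothetical `max_tokens` and `penalty_scale` parameters (a linear ramp-down once the output exceeds its token budget).

```python
def apply_length_penalty(reward, num_tokens, max_tokens=2048, penalty_scale=0.5):
    """Linearly scale the reward toward zero once the output exceeds max_tokens."""
    if num_tokens <= max_tokens:
        return reward  # within budget: reward is untouched
    overflow = (num_tokens - max_tokens) / max_tokens
    return reward * max(0.0, 1.0 - penalty_scale * overflow)
```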
## Technical Implementation

### Environment Configuration

- **Model**: NousResearch/DeepHermes-3-Llama-3-3B-Preview
- **Max Token Length**: 2048 tokens
- **Group Size**: 8 completions per prompt
- **Batch Size**: 12 items per batch
- **Evaluation Frequency**: every 100 steps

### System Prompt Structure

The environment uses a detailed system prompt that:

- Enforces structured reasoning with `<think>` tags
- Requires YAML output inside a `\boxed{}` environment
- Provides Sigma rule best practices and examples
- Specifies exact formatting requirements for parser compatibility

### Scoring Pipeline

1. **Extraction**: parse the YAML out of the `\boxed{}` wrapper using a regex
2. **Validation**: attempt YAML parsing and structure validation
3. **Evaluation**: apply either Jaccard similarity or LLM judge scoring
4. **Aggregation**: collect scores for batch-level reward computation

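The extraction step can be sketched with a brace-balancing scan (the repository delegates this to `math_verify`/`latex2sympy2_extended`; `extract_boxed` is a hypothetical stand-in). A plain non-greedy regex would truncate at the first `}`, so nested braces are counted explicitly.

```python
import re


def extract_boxed(completion):
    """Return the contents of the first \\boxed{...}, balancing nested braces."""
    match = re.search(r"\\boxed\{", completion)
    if match is None:
        return None  # model never produced the wrapper
    depth, start = 1, match.end()
    for i in range(start, len(completion)):
        if completion[i] == "{":
            depth += 1
        elif completion[i] == "}":
            depth -= 1
            if depth == 0:
                return completion[start:i]
    return None  # unbalanced braces: treat as a failed extraction
```

The returned string is then handed to a YAML parser; a `None` here (missing wrapper or unbalanced braces) would typically short-circuit to a zero reward.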
## Setup and Usage

### Environment Variables

```bash
export OPENAI_API_KEY="your-openai-api-key"  # For LLM judge (optional)
export NOUS_API_KEY="your-nous-api-key"      # For model inference
```

### Command Line Usage

```bash
# Jaccard similarity reward
python environments/community/cybersecurity_sigma/jaccard_reward_env.py

# LLM judge reward
python environments/community/cybersecurity_sigma/llm_judge_env.py
```

### Dependencies

- `datasets` - Hugging Face dataset loading
- `scikit-learn` - Jaccard similarity computation (`jaccard_reward_env` only)
- `latex2sympy2_extended` - LaTeX parsing utilities
- `math_verify` - YAML extraction from LaTeX boxes
- `openai` - LLM judge API calls (`llm_judge_env` only)

## Research Applications

### Cybersecurity Training

- Train models to understand threat-detection patterns
- Generate rules for various attack vectors and techniques
- Develop automated threat-hunting capabilities

### Structured Generation Research

- Study LLM performance on constrained output formats
- Compare token-based and semantic evaluation methods
- Investigate reasoning quality in the cybersecurity domain

### Evaluation Methodology Development

- Benchmark different reward-function approaches
- Analyze the correlation between structural and semantic correctness
- Develop better automated evaluation metrics for domain-specific tasks

## Performance Characteristics

### Jaccard Similarity Results

- **Typical Rewards**: 0.1-0.3 range, due to structural sensitivity
- **Strengths**: fast computation, deterministic scoring
- **Limitations**: sensitive to formatting differences, low reward density

### LLM Judge Results

- **Typical Rewards**: binary 0.0/1.0, with higher success rates
- **Strengths**: semantic understanding, format flexibility
- **Limitations**: API latency, potential inconsistency, cost

## Example Outputs

### Input Prompt

```
DotNET Assembly DLL Loaded Via Office Application: Detects any assembly DLL being loaded by an Office Product
```

### Expected Sigma Rule Format

```yaml
detection:
  condition: selection
  selection:
    process_name:
      - excel.exe
      - word.exe
      - powerpnt.exe
    dll_loaded: "*.dll"
logsource:
  category: process
  product: windows
```

The environment provides a robust framework for training LLMs on cybersecurity detection-rule generation, with evaluation mechanisms flexible enough to suit different research objectives.