mirror of https://github.com/NousResearch/atropos.git
synced 2026-04-22 16:48:57 +00:00

Integrate aniemerg wikipedia (#143)

* initial commit
* initial draft of wikipedia article creation environment
* add openai for rollouts, update requirements, create script to run, etc.
* add configuration, add debugging, fix tool calls, prevent wikipedia access
* now creates html file
* fix output for html page
* check in Claude plan
* fixed formatting and other issues
* add zip file
* update README
* linting, moved to community folder
* linting

Co-authored-by: Allan Niemerg <niemerg@gmail.com>

This commit is contained in: parent b774e97215, commit f21154ff49
14 changed files with 4480 additions and 0 deletions
# Wikipedia Article Creator: OpenAI Integration and Environment Setup Plan

## 1. Overview

This document outlines the plan for integrating OpenAI models and proper environment configuration with the existing WikipediaArticleCreatorEnv implementation. The goal is a robust setup that can leverage OpenAI's models (such as GPT-4 and GPT-3.5-turbo) while keeping API credentials secure through environment variables.

## 2. Key Components and Changes

### 2.1 Environment Variables (.env) Setup

We need to properly load and manage API credentials from a `.env` file:

```python
# At the top of wikipedia_article_creator.py
import os

from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Later, access environment variables
openai_api_key = os.environ.get("OPENAI_API_KEY")
tavily_api_key = os.environ.get("TAVILY_API_KEY")
```
### 2.2 OpenAI API Integration

Update the `config_init` method in the `WikipediaArticleCreatorEnv` class to support OpenAI models:

```python
@classmethod
def config_init(cls) -> Tuple[WikipediaArticleCreatorConfig, List[APIServerConfig]]:
    env_config = WikipediaArticleCreatorConfig(
        tokenizer_name="gpt-4-turbo",  # Use any OpenAI model name or a different tokenizer for local processing
        group_size=4,
        use_wandb=True,
        rollout_server_url="http://localhost:8000",
        total_steps=1000,
        batch_size=128,
        steps_per_eval=20,
        max_token_length=1024 * 16,
        inference_weight=1.0,
        wandb_name="wikipedia_article_creator",
        eval_handling=EvalHandlingEnum.LIMIT_TRAIN,
        eval_limit_ratio=0.1,
        max_steps=10,
        temperature=0.7,
        thinking_active=True,
        eval_topics=5,
        tool_timeout=15.0,
        tavily_api_key=None,  # Falls back to the environment variable
        min_article_sections=3,
        max_article_tokens=2048,
        topics_file="topics.json",
        logging_active=True,
    )

    # Configure the OpenAI server
    server_configs = [
        APIServerConfig(
            model_name="gpt-4o",  # or any other OpenAI model
            base_url=None,  # Use the default OpenAI base URL
            api_key=os.environ.get("OPENAI_API_KEY"),
            num_max_requests_at_once=4,
            num_requests_for_eval=16,
        ),
    ]

    return env_config, server_configs
```
### 2.3 OpenAI Token Length Handling

Since OpenAI models handle token length limits differently, we need to make some adjustments to ensure proper tokenization and context management:

```python
async def _get_model_response(self, messages: List[Dict]) -> str:
    """Get a response from the model for the current conversation state"""
    try:
        # For OpenAI models, we pass the messages directly
        if self.server.config.base_url is None or "openai" in (self.server.config.base_url or ""):
            completion = await self.server.chat_completion(
                messages=messages,
                temperature=self.config.temperature,
                max_tokens=min(4096, self.config.max_token_length),  # Ensure within OpenAI limits
            )
            # Extract the text from the response
            return completion.choices[0].message.content
        else:
            # For non-OpenAI models (e.g., local models via vLLM)
            prompt = self.tokenizer.apply_chat_template(messages, tokenize=False)
            completion = await self.server.completion(
                prompt=prompt,
                n=1,
                max_tokens=self.config.max_token_length,
                temperature=self.config.temperature,
            )
            return completion.choices[0].text
    except Exception as e:
        logger.error(f"Error getting model response: {e}")
        return ""
```
### 2.4 Testing Runner Script

Create a simple runner script to test the environment with OpenAI models:

```python
#!/usr/bin/env python3
"""
Test runner for Wikipedia Article Creator with OpenAI models
"""
import asyncio
import logging
import os

from dotenv import load_dotenv

from environments.hack0.wikipedia.wikipedia_article_creator import WikipediaArticleCreatorEnv

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Load environment variables
load_dotenv()


async def main():
    """Run the environment with a specified model and topic"""
    # Create and initialize the environment
    env_config, server_configs = WikipediaArticleCreatorEnv.config_init()

    # Override with command-line arguments if needed
    # env_config.max_steps = 5
    # server_configs[0].model_name = "gpt-3.5-turbo"

    # Initialize the environment
    env = WikipediaArticleCreatorEnv(env_config, server_configs, slurm=False, testing=True)
    await env.setup()

    # Run a single episode with a specific topic
    topic = "Climate change in Antarctica"
    episode_id = 1

    logger.info(f"Starting research on topic: {topic}")
    episode = env._get_or_create_episode(episode_id, topic)

    # Run until terminal state
    while not episode.is_terminal:
        is_terminal, step_data = await env._next_step(episode)
        response = step_data.get("response", "")
        tool_calls = step_data.get("tool_calls", [])

        logger.info(f"Step {episode.steps_taken} completed")
        logger.info(f"Tool calls: {len(tool_calls)}")

        if is_terminal and episode.final_article:
            logger.info("Article generated successfully")
            logger.info(f"Article length: {len(episode.final_article)} characters")

            # Save the article to a file
            with open(f"article_{topic.replace(' ', '_')}.md", "w") as f:
                f.write(episode.final_article)

            # Evaluate article quality
            quality_metrics = env._assess_article_quality(
                episode.final_article, episode.research_facts
            )
            logger.info(f"Article quality metrics: {quality_metrics}")
        elif is_terminal:
            logger.info("Episode terminated without article")


if __name__ == "__main__":
    asyncio.run(main())
```
## 3. Required Environment Setup

### 3.1 .env File Structure

Create a `.env` file in the project root with the necessary API keys:

```
# OpenAI API Key for model access
OPENAI_API_KEY=sk-...your-openai-key...

# Tavily API Key for web search and content extraction
TAVILY_API_KEY=tvly-...your-tavily-key...

# Optional WandB configuration
WANDB_API_KEY=...your-wandb-key...
WANDB_PROJECT=wikipedia_article_creator
```
### 3.2 Dependencies

Ensure the following packages are installed:

```
python-dotenv>=1.0.0
openai>=1.10.0
tavily-python>=0.2.2
wandb>=0.16.0
```
## 4. Implementation Plan

### 4.1 Phase 1: Environment Setup and .env Integration

1. Add the python-dotenv package to the project dependencies
2. Create a `.env.template` file with placeholder values
3. Update the README to explain the required API keys and environment setup
4. Modify `wikipedia_article_creator.py` to load environment variables
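A startup check can make a missing key fail fast instead of surfacing as an opaque API error later. Below is a minimal sketch; the `require_env` helper is hypothetical (not part of the existing codebase), and it assumes `load_dotenv()` has already run at import time:

```python
import os

# Required keys for this environment (from the plan above)
REQUIRED_KEYS = ["OPENAI_API_KEY", "TAVILY_API_KEY"]


def require_env(keys=REQUIRED_KEYS):
    """Return the named environment variables, raising if any are unset.

    Hypothetical helper: assumes load_dotenv() has already been called.
    """
    missing = [k for k in keys if not os.environ.get(k)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {k: os.environ[k] for k in keys}
```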
### 4.2 Phase 2: OpenAI Integration

1. Update the `config_init` method to use OpenAI models
2. Modify the `_get_model_response` method to handle OpenAI models properly
3. Adjust tokenization and context length handling for OpenAI models
4. Test with smaller models (GPT-3.5-turbo) before moving to GPT-4

### 4.3 Phase 3: Testing and Validation

1. Create a test runner script to validate the environment with OpenAI models
2. Test with a variety of topics to ensure the environment works correctly
3. Monitor API usage and optimize requests to minimize token usage
4. Validate article quality with different models and parameters
### 4.4 Phase 4: Performance Optimization

1. Implement caching for API responses to reduce duplicate requests
2. Optimize API call batching for evaluation runs
3. Add error handling and retry logic for API rate limits
4. Document best practices for working with OpenAI models in this environment
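The response caching in step 1 could look roughly like the sketch below. This is an illustrative in-memory version (a real implementation might want an on-disk store with TTLs); `cached_chat_completion` and its parameters are names invented for this sketch, not existing project API:

```python
import hashlib
import json

# In-memory cache: request fingerprint -> completion
_response_cache = {}


def cache_key(model, messages, temperature):
    """Build a stable fingerprint from the request parameters."""
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


async def cached_chat_completion(server, model, messages, temperature):
    """Return the cached response when an identical request was seen before."""
    key = cache_key(model, messages, temperature)
    if key not in _response_cache:
        _response_cache[key] = await server.chat_completion(
            messages=messages, temperature=temperature
        )
    return _response_cache[key]
```

During group rollouts with `group_size=4`, identical prompts at the same temperature would otherwise trigger four separate billable calls.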
## 5. Potential Challenges and Solutions

### 5.1 API Rate Limits

**Challenge:** The OpenAI API enforces rate limits that could slow research throughput.

**Solution:** Implement exponential backoff retry logic and request batching.
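The backoff logic can be sketched as a small async wrapper. The `retryable` default below is a placeholder: in practice it would be narrowed to the client's rate-limit error type (e.g. `openai.RateLimitError`), and the delay parameters are illustrative:

```python
import asyncio
import random


async def with_backoff(call, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Retry an async API call with exponential backoff and jitter.

    `call` is a zero-argument coroutine factory; `retryable` should be
    narrowed to the client's rate-limit exception in real use.
    """
    for attempt in range(max_retries):
        try:
            return await call()
        except retryable:
            if attempt == max_retries - 1:
                raise
            # Sleep base, 2x base, 4x base, ... with jitter to avoid
            # many workers retrying in lockstep
            await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```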
### 5.2 Token Context Length

**Challenge:** Different OpenAI models have different context length limits.

**Solution:** Implement model-specific context management to stay within limits.
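One shape this context management could take is trimming the oldest turns until the conversation fits the model's window. Everything here is a sketch: the limit values are illustrative (check the provider's documentation for current figures), and `count_tokens` is an injected callable you would build on a real tokenizer such as tiktoken:

```python
def trim_messages(messages, model, count_tokens, reserve=4096, limits=None):
    """Drop the oldest non-system messages until the conversation fits.

    `count_tokens` maps a message dict to a token count; `reserve`
    leaves room for the completion. Limit values are illustrative.
    """
    limits = limits or {"gpt-4o": 128_000, "gpt-3.5-turbo": 16_385}
    budget = limits.get(model, 8_192) - reserve
    trimmed = list(messages)
    # Keep the system message (index 0); drop the oldest turns first
    while len(trimmed) > 1 and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(1)
    return trimmed
```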
### 5.3 Cost Management

**Challenge:** API usage costs can accumulate quickly with multiple research steps.

**Solution:** Implement cost tracking, efficient caching, and configurable limits.
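A minimal cost tracker could accumulate per-call token usage against a price table. The prices below are purely illustrative (real prices change; consult the provider's pricing page), and `CostTracker` is a name invented for this sketch:

```python
# Illustrative per-token prices in USD; not authoritative
PRICES = {
    "gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
}


class CostTracker:
    """Accumulate token usage and estimated spend across an episode."""

    def __init__(self, prices=PRICES):
        self.prices = prices
        self.total_cost = 0.0

    def record(self, model, prompt_tokens, completion_tokens):
        """Add one API call's usage; returns the estimated call cost."""
        p = self.prices.get(model, {"input": 0.0, "output": 0.0})
        cost = prompt_tokens * p["input"] + completion_tokens * p["output"]
        self.total_cost += cost
        return cost
```

A configurable per-episode budget check against `total_cost` would then give the "configurable limits" part of the solution.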
### 5.4 Model Differences

**Challenge:** Different models may have different capabilities and response formats.

**Solution:** Add model-specific parsing and prompt adjustments.
## 6. Testing Metrics

1. **Research Efficiency:**
   - Average tool calls per completed article
   - Percentage of relevant vs. irrelevant searches
   - Time to completion

2. **Article Quality:**
   - Structure compliance (section count, references)
   - Content relevance to topic
   - Factual accuracy
   - Information density

3. **API Usage:**
   - Total tokens per article
   - Cost per article
   - Rate limit errors encountered
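The metrics above could be collected in a single per-episode record so runs are comparable across models and parameters. This dataclass is a sketch, not existing project code:

```python
from dataclasses import dataclass


@dataclass
class EpisodeMetrics:
    """Per-episode record of the testing metrics listed above (sketch)."""

    tool_calls: int = 0
    relevant_searches: int = 0
    total_searches: int = 0
    seconds_to_completion: float = 0.0
    section_count: int = 0
    total_tokens: int = 0
    cost_usd: float = 0.0
    rate_limit_errors: int = 0

    @property
    def search_relevance(self) -> float:
        """Fraction of searches judged relevant (0.0 when none were made)."""
        return self.relevant_searches / self.total_searches if self.total_searches else 0.0
```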
## 7. Next Steps

After successful implementation of OpenAI integration and environment setup:

1. Optimize prompts specifically for OpenAI models
2. Implement comparative evaluation across different models
3. Add support for multi-model fallback (e.g., start with GPT-3.5, escalate to GPT-4 for complex topics)
4. Create a comprehensive benchmark suite for Wikipedia article creation
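The multi-model fallback in step 3 could be sketched as below. This assumes the server accepts a per-call model override, which may not match the actual APIServerConfig interface, and `is_good_enough` stands in for a quality check (e.g. one built on `_assess_article_quality`):

```python
async def generate_with_fallback(server, messages,
                                 models=("gpt-3.5-turbo", "gpt-4o"),
                                 is_good_enough=None):
    """Try cheaper models first, escalating when the draft falls short.

    Sketch only: assumes a per-call `model` override on chat_completion.
    By default any non-empty response passes the quality check.
    """
    is_good_enough = is_good_enough or (lambda text: bool(text.strip()))
    response = ""
    for model in models:
        completion = await server.chat_completion(messages=messages, model=model)
        response = completion.choices[0].message.content
        if is_good_enough(response):
            return model, response
    # Fall through with the last (strongest) model's answer
    return models[-1], response
```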