docs(README): add quick start guide and update related documentation

- Add quick start guide for InternBootcamp in both English and Chinese
- Update README and README_zh to include new quick start links
- Create detailed documentation for using Xtuner with Bootcamp data
chenyongkang 2025-06-12 21:26:49 +08:00
parent 467bf9359b
commit d2b7ff6d38
5 changed files with 208 additions and 0 deletions

examples/get_started.md (new file, 74 lines)
# Quick Start
**InternBootcamp** provides functionalities such as data generation, model training, and model evaluation. You can refer to the following guide to get started quickly.
Before running the steps below, make sure **InternBootcamp is installed** and that **the project root directory is your working directory**.
## Data Generation
Run [**run\_pipeline.sh**](examples/pipelines/run_pipeline.sh) to generate training and testing data based on the [default configuration](examples/pipelines/data_configs).
To customize the configuration, refer to the [Pipeline Usage Guide](examples/pipelines/README.md).
```bash
source examples/pipelines/run_pipeline.sh
```
The generated data will be saved in the [bootcamp\_generator\_outputs directory](examples/bootcamp_generator_outputs). Each data batch is timestamped, and the directory structure is as follows:
```
examples/
├── ...
└── bootcamp_generator_outputs/
├── ...
└── 2025-xx-xx-xx:xx:xx/
├── test/
│ ├── bootcamp_0.jsonl
│ ├── ...
│ └── bootcamp_n.jsonl
└── train/
├── bootcamp_0.jsonl
├── ...
└── bootcamp_n.jsonl
```
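Assuming the layout above, all shards of one split can be loaded by globbing the batch directory. This is a minimal sketch; the helper name and the record schema inside each `.jsonl` line are illustrative, not part of InternBootcamp's API:

```python
import glob
import json
import os

def load_split(batch_dir, split="train"):
    """Load every JSONL shard (bootcamp_0.jsonl ... bootcamp_n.jsonl) of one split.

    batch_dir is a timestamped directory under bootcamp_generator_outputs/.
    Each line of a shard is assumed to be one JSON record.
    """
    records = []
    for path in sorted(glob.glob(os.path.join(batch_dir, split, "bootcamp_*.jsonl"))):
        with open(path, encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records
```

For example, `load_split("examples/bootcamp_generator_outputs/2025-xx-xx-xx:xx:xx")` returns the concatenated records of all train shards in that batch.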
## Model Training (Reinforcement Learning)
We support two training frameworks: **Xpuyu** and **Verl**.
### Xpuyu
Refer to the [Xpuyu documentation](examples/xpuyu_usage/README.md) to get started with efficient training.
### Verl
To integrate Bootcamp tasks into the Verl framework for training, you need to embed the Bootcamp reward computation method into Verl.
See the [Verl documentation](examples/verl_usage/README.md) for detailed guidance.
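As a rough sketch of that integration: Verl-style custom rewards are a function that scores one rollout. Everything below is illustrative, not the actual Bootcamp reward code. It uses a plain string match on a final `\boxed{}` answer, whereas the real integration would call the corresponding bootcamp's verifier:

```python
import re

def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Illustrative reward: 1.0 if the model's last \\boxed{...} answer
    equals the ground truth, else 0.0. A real Bootcamp reward would
    delegate to the bootcamp's own verification logic instead."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", solution_str)
    if not matches:
        return 0.0  # no final answer produced
    return 1.0 if matches[-1].strip() == str(ground_truth).strip() else 0.0
```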
## Model Evaluation
For Bootcamp tasks, we offer a customized evaluation service.
Once the model to be evaluated is deployed with a framework such as FastChat or Ollama, and you have its API URL and API key, run the following command to evaluate it on the **InternBootcamp\_eval** dataset:
```bash
cd InternBootcamp
python examples/unittests/run_eval.py \
--url http://127.0.0.1:8000/v1 \
--api_key EMPTY \
--model_name r1_32B \
--api_mode completion \
--template r1 \
--max_tokens 32768 \
--temperature 0 \
--test_dir examples/data/InternBootcamp_eval \
--max_concurrent_requests 128 \
--timeout 6000 \
--max_retries 16 \
--max_retrying_delay 60
```
**Note:** When `api_mode` is set to `completion`, be sure to specify the matching prompt template. Supported templates: `r1`, `qwen`, `internthinker`, and `chatml` (without a system prompt).
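For reference, this is roughly what a completion-mode prompt looks like under the `chatml` template with no system prompt; the exact strings `run_eval.py` emits may differ:

```python
def apply_chatml(question: str) -> str:
    """Wrap a question in ChatML turn markers, omitting the system turn
    as the note above describes (illustrative, not run_eval.py's code)."""
    return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
```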
For more detailed instructions, refer to the [Evaluation Manual](examples/unittests/README.md).