chenyongkang d2b7ff6d38 docs(README): add quick start guide and update related documentation
- Add quick start guide for InternBootcamp in both English and Chinese
- Update README and README_zh to include new quick start links
- Create detailed documentation for using Xtuner with Bootcamp data
2025-06-12 21:26:49 +08:00


Quick Start

InternBootcamp provides data generation, model training, and model evaluation functionality. The following guide will help you get started quickly.

To ensure the following steps run successfully, make sure InternBootcamp is installed and that the project root directory is your working directory.

Data Generation

Run run_pipeline.sh to generate training and testing data with the default configuration. To customize the setup, refer to the Pipeline Usage Guide.

source examples/pipelines/run_pipeline.sh

The generated data will be saved in the bootcamp_generator_outputs directory. Each data batch is timestamped, and the directory structure is as follows:

examples/
├── ...
└── bootcamp_generator_outputs/
    ├── ...
    └── 2025-xx-xx-xx:xx:xx/
        ├── test/
        │   ├── bootcamp_0.jsonl
        │   ├── ...
        │   └── bootcamp_n.jsonl
        └── train/
            ├── bootcamp_0.jsonl
            ├── ...
            └── bootcamp_n.jsonl
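Each bootcamp_*.jsonl file stores one JSON object per line. As a quick sanity check after generation, you can count and validate the samples in a split; the glob pattern below is illustrative (substitute the timestamp of your own run), and the record fields vary by bootcamp:

```python
import glob
import json

def count_jsonl_samples(pattern):
    """Count JSON-lines records across all shards matching a glob pattern."""
    paths = sorted(glob.glob(pattern))
    total = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    json.loads(line)  # validate that each record parses
                    total += 1
    return len(paths), total

# Example: inspect a run's training split (the wildcard stands in for
# the timestamped directory of the run you just produced).
shards, samples = count_jsonl_samples(
    "examples/bootcamp_generator_outputs/*/train/bootcamp_*.jsonl"
)
print(f"{shards} shard(s), {samples} sample(s)")
```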

Model Training (Reinforcement Learning)

We provide support for two training frameworks: Xpuyu and Verl.

Xpuyu

Refer to the Xpuyu documentation to get started with efficient training.

Verl

To integrate Bootcamp tasks into the Verl framework for training, you need to embed the Bootcamp reward computation method into Verl. See the Verl documentation for detailed guidance.
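As a rough sketch of what that embedding looks like: Verl can load a user-supplied reward function via its custom_reward_function config. The entry-point signature below follows Verl's documented custom-reward interface (confirm it against your Verl version), and bootcamp_verify is a hypothetical placeholder for the actual Bootcamp reward computation:

```python
# custom_reward.py -- point Verl's custom_reward_function config at this
# file and function name. The signature follows Verl's custom-reward
# interface; verify it matches the Verl version you are using.

def bootcamp_verify(solution_str, ground_truth):
    """Hypothetical stand-in for the Bootcamp reward computation.

    Replace this with the actual InternBootcamp verification call for
    your task; exact-match is used here only as a trivial placeholder.
    """
    return 1.0 if solution_str.strip() == str(ground_truth).strip() else 0.0

def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Entry point Verl invokes per rollout to score a model output."""
    return bootcamp_verify(solution_str, ground_truth)
```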

Model Evaluation

For Bootcamp tasks, we offer a customized evaluation service. Once the model to be evaluated has been deployed with a framework such as FastChat or Ollama, and you have its API URL and API key, run the following command to evaluate it on the InternBootcamp_eval dataset:

cd InternBootcamp
python examples/unittests/run_eval.py \
    --url http://127.0.0.1:8000/v1 \
    --api_key EMPTY \
    --model_name r1_32B \
    --api_mode completion \
    --template r1 \
    --max_tokens 32768 \
    --temperature 0 \
    --test_dir examples/data/InternBootcamp_eval \
    --max_concurrent_requests 128 \
    --timeout 6000 \
    --max_retries 16 \
    --max_retrying_delay 60

Note: When api_mode is set to completion, be sure to specify the matching template (supported: r1, qwen, internthinker, and chatml with no system prompt). For more detailed instructions, refer to the Evaluation Manual.
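To see why the template matters in completion mode, here is a minimal sketch of calling the deployed OpenAI-compatible completions endpoint directly. The r1-style prompt string below is an assumption (check run_eval.py for the exact template strings), and the URL, API key, and model name mirror the example command above:

```python
import json
import urllib.request

def build_r1_prompt(question):
    """Assumed r1-style completion template; in completion mode the
    template must be applied client-side before sending the prompt.
    Check run_eval.py for the exact strings used per template."""
    return f"<|User|>{question}<|Assistant|><think>\n"

def query_completion(url, api_key, model, prompt, max_tokens=1024):
    """POST to an OpenAI-compatible /completions endpoint, e.g. the
    FastChat server started for evaluation."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{url}/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]

# Illustrative usage, matching the example command's arguments:
# text = query_completion("http://127.0.0.1:8000/v1", "EMPTY", "r1_32B",
#                         build_r1_prompt("What is 2 + 2?"))
```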
