mirror of
https://github.com/InternLM/InternBootcamp.git
synced 2026-04-19 12:58:04 +00:00
1.6 KiB
Pipeline Usage
Configuration files
puzzle_configs: configuration files that set the parameters passed to a bootcamp's __init__. Different parameters lead to different distributions of the generated samples.
data_configs: configuration files to run the final generation pipeline.
- You can manually add the tasks you want to generate data for in the file.
- You can use examples/pipelines/puzzle_configs/ to run examples/pipelines/data_config_gen.py. This will automatically generate data_config_train.jsonl and data_config_test.jsonl under data_configs.
For example, an entry that includes futoshiki looks as follows.
{"bootcamp_name": "futoshiki", "sample_number": 100, "config_file": "futoshiki", "bootcamp_cls_name": "Futoshikibootcamp"}
Here, sample_number is the number of data samples to generate, config_file is the name of the task configuration file, and bootcamp_cls_name is the class name of the bootcamp used to generate the data.
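When adding tasks manually, a small helper can keep entries consistent with the four fields shown in the futoshiki example. The following is a minimal sketch, not part of the repository: the function names make_entry, append_entry and load_entries are hypothetical, and the only assumption about the file format is that each line holds one JSON object, as the .jsonl extension suggests.

```python
import json
from pathlib import Path

# Field names taken from the futoshiki example entry above.
REQUIRED_KEYS = ("bootcamp_name", "sample_number", "config_file", "bootcamp_cls_name")


def make_entry(bootcamp_name, sample_number, config_file, bootcamp_cls_name):
    """Build one data-config entry as a plain dict (hypothetical helper)."""
    if not isinstance(sample_number, int) or sample_number <= 0:
        raise ValueError("sample_number must be a positive integer")
    return {
        "bootcamp_name": bootcamp_name,
        "sample_number": sample_number,
        "config_file": config_file,
        "bootcamp_cls_name": bootcamp_cls_name,
    }


def append_entry(path, entry):
    """Append one entry to a JSONL file, one JSON object per line."""
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_entries(path):
    """Read a JSONL data config back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For instance, append_entry(path, make_entry("futoshiki", 100, "futoshiki", "Futoshikibootcamp")) would write the example entry above to the file at path, where path is whichever data config file you are editing.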
Running the Data Generation Pipeline
run_pipeline.sh contains the unified pipeline to generate data for all tasks based on the configurations.
Quick Start
- Run the following command to gather all the bootcamps into a configuration file that specifies the options for data generation.
python examples/pipelines/quickgen_data_configs.py
You can adjust train_sample_number and test_sample_number to control the number of samples to generate for the two sets.
- Run bash examples/pipelines/run_pipeline.sh to generate data, with the output under examples/bootcamp_generator_outputs.
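Before starting a long generation run, it can help to check how many samples a data config actually requests. The sketch below is an illustration rather than a script from the repository: samples_per_bootcamp is a hypothetical helper, and it assumes only that each line of the config is a JSON object carrying the bootcamp_name and sample_number fields shown earlier.

```python
import json
from collections import Counter


def samples_per_bootcamp(lines):
    """Sum sample_number per bootcamp_name over an iterable of JSONL lines."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the config file
        entry = json.loads(line)
        totals[entry["bootcamp_name"]] += int(entry["sample_number"])
    return dict(totals)
```

An open file handle can be passed directly, since iterating over a text file yields its lines; summing the returned values gives the total number of samples requested for that split.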