Pipeline Usage

Configuration files

puzzle_configs: configuration files that set the parameters passed to a bootcamp's __init__. Different parameters lead to different distributions of the generated samples.

data_configs: configuration files to run the final generation pipeline.

  • You can manually add the tasks you want to generate data for in the file.
  • You can run examples/pipelines/data_config_gen.py on the configurations in examples/pipelines/puzzle_configs/. This automatically generates data_config_train.jsonl and data_config_test.jsonl under data_configs.

For example, an entry that includes futoshiki looks like this:

{"bootcamp_name": "futoshiki", "sample_number": 100, "config_file": "futoshiki", "bootcamp_cls_name": "Futoshikibootcamp"}

Here, sample_number is the number of data samples to generate, config_file is the name of the task configuration file, and bootcamp_cls_name is the class name of the bootcamp used to generate the data.
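If you add entries by hand, a small script can help keep them well formed. The following is a minimal sketch, not part of the pipeline: the helper names make_entry, to_jsonl, and parse_jsonl are hypothetical, and the assumption that all four keys shown above are required is inferred from the example entry rather than from the pipeline code.

```python
import json

# Keys taken from the example entry above; treating all four as
# required is an assumption, not something the pipeline documents.
REQUIRED_KEYS = {"bootcamp_name", "sample_number", "config_file", "bootcamp_cls_name"}


def make_entry(bootcamp_name, sample_number, config_file, bootcamp_cls_name):
    """Build one data-config entry as a plain dict (hypothetical helper)."""
    return {
        "bootcamp_name": bootcamp_name,
        "sample_number": sample_number,
        "config_file": config_file,
        "bootcamp_cls_name": bootcamp_cls_name,
    }


def to_jsonl(entries):
    """Serialize entries as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)


def parse_jsonl(text):
    """Parse JSON Lines text and check that each entry has the expected keys."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        entries.append(entry)
    return entries


if __name__ == "__main__":
    entry = make_entry("futoshiki", 100, "futoshiki", "Futoshikibootcamp")
    text = to_jsonl([entry])
    print(text, end="")
    # Round-trip to confirm the line parses back to the same entry.
    assert parse_jsonl(text) == [entry]
```

The output of to_jsonl can be appended to a file under data_configs; parse_jsonl can be run over an existing file's contents to catch malformed lines before starting the pipeline.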

Running the Data Generation Pipeline

run_pipeline.sh contains the unified pipeline to generate data for all tasks based on the configurations.

Quick Start

  1. Run the following command to gather all the bootcamps into a configuration file that specifies the options for data generation.
python examples/pipelines/quickgen_data_configs.py

You can adjust train_sample_number and test_sample_number to control the number of samples to generate for the training and test sets.

  2. Run bash examples/pipelines/run_pipeline.sh to generate the data. The output is written under examples/bootcamp_generator_outputs.