feat(run_eval): add checkpoint resume functionality and update example documentation

- update new bootcamp benchmark dataset
2026-04-29 17:35:14 +00:00 · 2025-08-26 16:50:52 +08:00 · 2025-08-26 16:50:52 +08:00 · 1a8477c8d8
commit 1a8477c8d8
parent 125a7818e0
166 changed files with 8877 additions and 5047 deletions
--- a/examples/unittests/README_zh.md
+++ b/examples/unittests/README_zh.md
@@ -22,7 +22,8 @@ python examples/unittests/run_eval.py \
    --timeout 6000 \
    --api_mode completion \
    --max_retries 16 \
-    --max_retrying_delay 60
+    --max_retrying_delay 60 \
+    --resume
 ```

 ---
@@ -46,6 +47,8 @@ python examples/unittests/run_eval.py \
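 | `--sys_prompt`          | str        | `"You are an expert reasoner..."`        | System prompt content; only takes effect when `api_mode` is `chat_completion`. |
 | `--max_retries`         | int        | `16`                                     | Number of retries for a single failed request. |
 | `--max_retrying_delay`  | int        | `60`                                     | Maximum retry delay (seconds). |
+| `--resume`              | bool       | `true`                                   | Whether to continue from where the previous run was interrupted. |
+| `--check_model_url`     | bool       | `true`                                   | Check that the model service URL is reachable before starting the evaluation. |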

 ##### Parameter relationships
 - `--sys_prompt` takes effect only when `--api_mode` is `chat_completion`.
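
For readers unfamiliar with checkpoint resume, the behavior described for `--resume` can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `run_eval.py`: the helper names `load_done_ids` and `run_with_resume`, the JSONL results file, and the `id` field on each sample are all hypothetical.

```python
import json
import os


def load_done_ids(path):
    """Return the ids already recorded in a JSONL results file (hypothetical layout)."""
    done = set()
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                # A run killed mid-write can leave a truncated line; ignore it
                # so that the affected sample is evaluated again.
                continue
    return done


def _ends_with_newline(path):
    """True if the file is missing, empty, or already ends with a newline."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def run_with_resume(samples, evaluate, path, resume=True):
    """Evaluate samples and append one JSON line per result.

    With resume=True, samples whose id is already in the file are skipped.
    With resume=False, the file is overwritten and every sample is evaluated.
    Returns the list of ids evaluated in this call.
    """
    done = load_done_ids(path) if resume else set()
    needs_newline = resume and not _ends_with_newline(path)
    evaluated = []
    with open(path, "a" if resume else "w", encoding="utf-8") as f:
        if needs_newline:
            # Terminate a truncated last line so new records start cleanly.
            f.write("\n")
        for sample in samples:
            if sample["id"] in done:
                continue
            record = {"id": sample["id"], "result": evaluate(sample)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            f.flush()  # push each record out promptly so an interrupted run keeps it
            evaluated.append(sample["id"])
    return evaluated
```

Appending one record per finished sample keeps the checkpoint simple: resuming only needs to read the ids already on disk, and a partially written last line costs at most one repeated evaluation.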