fix(training): Prepend <think> token in format reward (#396)

* prepend think token in format reward

* pre-commit fixes + fix some default values

* add checkpoint config
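The first bullet can be illustrated with a hypothetical sketch. This commit view does not show the repository's actual reward code. The `format_reward` name, the `FORMAT_PATTERN` regex, the `<answer>` tags, and the assumption that the prompt template already ends with `<think>` are all illustrative. The idea is that when the template opens the reasoning block in the prompt, the model's completion starts mid-reasoning without its own `<think>` tag. A format check that expects `<think>...</think>` would then reject well-formed outputs unless the tag is prepended before matching.

```python
import re

# Hypothetical expected layout: <think>reasoning</think> followed by <answer>...</answer>.
# DOTALL lets ".*?" span multiple lines of reasoning.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def format_reward(completion: str, think_token: str = "<think>") -> float:
    """Return 1.0 if the completion matches the expected tag layout, else 0.0.

    Assumption: the prompt already ends with `think_token`, so the completion
    usually lacks the opening tag and it is prepended here before matching.
    """
    text = completion.strip()
    # Prepend only when missing, so a model that repeats the tag is not doubled up.
    if not text.startswith(think_token):
        text = think_token + text
    return 1.0 if FORMAT_PATTERN.match(text) else 0.0


if __name__ == "__main__":
    # A completion that starts mid-reasoning is accepted once <think> is prepended.
    print(format_reward("Let me reason.</think>\n<answer>42</answer>"))
    # Without the prepend, the raw completion would not match the pattern.
    print(FORMAT_PATTERN.match("Let me reason.</think>\n<answer>42</answer>") is None)
```

Prepending conditionally is a design choice in this sketch: it keeps the check tolerant of models that sometimes emit the opening tag themselves.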
Zafir Stojanovski 2025-03-28 09:45:17 +01:00 committed by GitHub
parent 7ae2942c34
commit c6663cdb81
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 16 additions and 1 deletion

.gitignore (5 lines added)

@@ -45,3 +45,8 @@ htmlcov/
# Jupyter Notebook
.ipynb_checkpoints/
.virtual_documents/
# logs
wandb/
outputs/
*.log