| Author | Commit | Message | Date |
|---|---|---|---|
| Zafir Stojanovski | 7cbd3dbed6 | duplicate answer word | 2025-02-15 15:38:56 +01:00 |
| Zafir Stojanovski | 326f6051bc | update system prompt to tell model to follow 1-shot example format | 2025-02-15 15:34:30 +01:00 |
| Andreas Koepf | fe231b4b31 | incorporate prompt changes suggested by Miserlou | 2025-02-14 15:44:00 +01:00 |
| Andreas Koepf | 0a660a3409 | ignore single whitespace at beginning and end of answer, use reward = len(oracle_answer) / len(answer) | 2025-02-14 15:40:12 +01:00 |
| Zafir Stojanovski | 3d84816f95 | system prompt for structured output, and parse such outputs | 2025-02-12 10:44:42 +01:00 |
| Andreas Koepf | afb95508ef | gsm_symbolic generator changes | 2025-02-05 20:58:01 +01:00 |
| Andreas Koepf | c196d622e0 | extract answer from last answer tag | 2025-01-28 16:37:19 +00:00 |
| Andreas Koepf | cc0312e446 | add first example with OpenRLHF | 2025-01-28 14:40:06 +00:00 |