| Author | Commit | Message | Date |
|---|---|---|---|
| AhmedSaif2 | 75cbfb8783 | fix parameter name in compute_decimal_reward docstring | 2025-02-21 17:01:59 +02:00 |
| Andreas Koepf | 476e37e70b | use Decimal class for numeric comparison e.g. +0123.100 == 123.1 | 2025-02-21 15:36:06 +01:00 |
| AhmedSaif2 | 6b5c7a8637 | add a helper function to handle redundant code | 2025-02-21 15:54:00 +02:00 |
| Zafir Stojanovski | b5f5733052 | update system prompt | 2025-02-15 17:41:05 +01:00 |
| Andreas Koepf | 4ddd04a825 | incorporate prompt changes suggested by Miserlou | 2025-02-14 15:44:00 +01:00 |
| Andreas Koepf | 2726caf2fe | ignore single whitespace at beginning and end of answer, use reward = len(oracle_answer) / len(answer) | 2025-02-14 15:40:12 +01:00 |
| Zafir Stojanovski | 52a56cbc4f | system prompt for structured output, and parse such outputs | 2025-02-12 10:44:42 +01:00 |
| Andreas Koepf | 3ca9a709e8 | gsm_symbolic generator changes | 2025-02-05 20:58:01 +01:00 |
| Andreas Koepf | 1bc56b8559 | extract answer from last answer tag | 2025-01-28 16:37:19 +00:00 |
| Andreas Koepf | 655de7a7f3 | add first example with OpenRLHF | 2025-01-28 14:40:06 +00:00 |
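
Commit 476e37e70b mentions comparing numeric answers with the `Decimal` class so that `+0123.100` and `123.1` count as equal. The following is a minimal sketch of that idea using only Python's standard library. The function name `decimals_equal` is hypothetical and is not taken from the repository, and the actual `compute_decimal_reward` may behave differently.

```python
from decimal import Decimal, InvalidOperation


def decimals_equal(answer: str, oracle_answer: str) -> bool:
    """Hypothetical helper: compare two numeric strings by value, not by text."""
    try:
        # Decimal parses an optional sign, leading zeros and trailing zeros,
        # so differently formatted strings can map to the same value.
        return Decimal(answer.strip()) == Decimal(oracle_answer.strip())
    except InvalidOperation:
        # Raised when a string is not a valid decimal literal.
        return False


print(decimals_equal("+0123.100", "123.1"))
print(decimals_equal("123.10", "123.11"))
print(decimals_equal("abc", "123.1"))
```

Comparing by value avoids penalizing an answer for formatting differences alone, which a plain string comparison would do.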