Shannon Sands
c72a27d376
fixed linting in latest main
2025-05-14 17:29:57 -07:00

Shannon Sands
00dd120067
Merge branch 'main' into blackjack2-env
2025-05-14 17:27:44 -07:00

Shannon Sands
8fad665f6a
moved folder location
2025-05-14 17:22:30 -07:00

Shannon Sands
c2bf3f5acd
moved folder location
2025-05-14 17:22:18 -07:00

Joe Li
c1ae25c202
Merge pull request #26 from NousResearch/coding_server
add code execution environment
2025-05-14 15:08:10 -07:00

Shannon Sands
3fba8e3527
linting
2025-05-14 14:22:25 -07:00

Shannon Sands
d8ab1a6758
linting
2025-05-14 14:20:54 -07:00

Shannon Sands
1a7c0294fa
refactoring for more clarity
2025-05-14 14:18:43 -07:00

Shannon Sands
bb6c205efe
Linting
2025-05-14 14:05:52 -07:00

Shannon Sands
67cfd961c5
linting
2025-05-14 14:01:31 -07:00

Shannon Sands
826de9e283
Updated README
2025-05-14 13:57:20 -07:00

Shannon Sands
f5172b45a8
Added README
2025-05-14 13:35:15 -07:00

Shannon Sands
85f462df5e
Updated test scripts
2025-05-14 12:05:59 -07:00

Shannon Sands
d6f9d58606
new env runs locally
2025-05-14 11:57:45 -07:00

Shannon Sands
54ae40840d
no-thinking env added
2025-05-14 11:28:39 -07:00

Shannon Sands
21cc528b85
move best-of-n selection to util
2025-05-14 10:35:12 -07:00

Shannon Sands
4c00e2b209
move message history out to utils
2025-05-14 10:13:56 -07:00

dmahan93
6e9405ba95
Fix bad merge
2025-05-12 20:02:54 -05:00

dmahan93
0aaf59fc9a
add trl server
add gsm8k example for axolotl checking
2025-05-12 19:04:46 -05:00

dmahan93
96be544228
Merge commit '71e7a5ca27' into add-support-for-custom-api-servers
2025-05-12 18:40:35 -05:00

Shannon Sands
8cd9e4d776
made private collect_trajectory re changes
2025-05-13 07:58:48 +10:00

Shannon Sands
36f6822d71
Merge branch 'main' into blackjack2-env
2025-05-13 07:54:04 +10:00

Shannon Sands
e480c30b8b
removed new fn
2025-05-13 07:49:28 +10:00

Shannon Sands
3e2012a7dc
linting
2025-05-12 09:20:51 +10:00

Shannon Sands
40d17be056
wandb name
2025-05-12 09:12:30 +10:00

Shannon Sands
c8ed107bca
linting done
2025-05-12 09:06:31 +10:00

Shannon Sands
fd5b87011d
word problem generation working
2025-05-12 08:57:21 +10:00

Shannon Sands
5c0c7f5b10
updates run
2025-05-12 08:13:35 +10:00

Shannon Sands
bfc967c4bd
linting and local testing tidy up
2025-05-12 08:07:39 +10:00

Shannon Sands
141ab66792
Updated comments
2025-05-12 08:01:46 +10:00

Shannon Sands
e96970f82e
linting
2025-05-12 07:53:12 +10:00

Shannon Sands
bdcc3cb88f
added cli & config init
2025-05-12 07:49:51 +10:00

Shannon Sands
04b32fd8f3
tidying up comments and methods
2025-05-12 07:41:59 +10:00

Shannon Sands
137f8381ec
removed reward function registry
2025-05-12 07:37:38 +10:00

Shannon Sands
4e7fcd3c9a
copied from trajectory handler branch
2025-05-12 07:26:10 +10:00

Shannon Sands
220b92be47
Linting and cleanup
2025-05-10 21:15:00 +10:00

Shannon Sands
6617d402b3
Doing exact V* calc
2025-05-10 20:24:31 +10:00

Shannon Sands
a049dde6b1
Adding thinking reward
2025-05-10 19:50:30 +10:00

Shannon Sands
840ff20921
Fixed typo, revising reward function
2025-05-10 19:45:06 +10:00

dmahan93
92428fec8f
add gym taxi env
2025-05-09 19:05:01 -05:00

Shannon Sands
7fe1a40368
readd multistep masking
2025-05-10 09:24:55 +10:00

Shannon Sands
9efd8c1529
linting
2025-05-10 08:44:35 +10:00

Shannon Sands
06c4a9e65c
linting
2025-05-10 08:43:03 +10:00

Shannon Sands
0248cc1227
Removed old code, added comments
2025-05-10 08:39:52 +10:00

Shannon Sands
ba604d44f9
update local server
2025-05-10 08:18:41 +10:00

Shannon Sands
c506bb147e
simplified config and reward
2025-05-10 08:04:39 +10:00

Shannon Sands
7e95c0b67d
moving test sever
2025-05-10 07:47:44 +10:00

Shannon Sands
a7dfd377da
moving env to clean branch
2025-05-10 07:44:29 +10:00

dmahan93
40b12dae60
run pre-commit on all files
2025-05-09 09:54:20 -05:00

dmahan93
b959c30ebf
Merge pull request #31 from NousResearch/fix-math-evals-due-to-updated-dataset
fix olympiadbench due to upstream changes
2025-05-09 09:42:06 -05:00