Shannon Sands | 74a8a1f7bb | updated README | 2025-05-15 11:58:47 -07:00
Shannon Sands | d768ad68aa | merged latest | 2025-05-15 11:54:33 -07:00
Shannon Sands | ba6ba173c1 | Merge branch 'main' into infinimath-env | 2025-05-15 11:44:48 -07:00
hjc-puro | dcda88d79b | fix validation errors | 2025-05-15 04:30:59 -07:00
teknium1 | 1a9fa016b5 | add dependencies to the env readme | 2025-05-14 19:44:13 -07:00
teknium1 | 90e235a3e9 | update environments readme | 2025-05-14 19:40:32 -07:00
teknium1 | 2ab8905d4f | fix score | 2025-05-14 19:35:43 -07:00
teknium1 | 8a0e107806 | change eval set size since this is a small dataset we need mo data for trainn | 2025-05-14 19:18:01 -07:00
teknium1 | bcc38567ca | update some dataset stuff to use allenai's | 2025-05-14 18:39:31 -07:00
teknium1 | 881af55f9a | add instruction following algo env | 2025-05-14 18:31:05 -07:00
Shannon Sands | c72a27d376 | fixed linting in latest main | 2025-05-14 17:29:57 -07:00
Shannon Sands | 00dd120067 | Merge branch 'main' into blackjack2-env | 2025-05-14 17:27:44 -07:00
Shannon Sands | 8fad665f6a | moved folder location | 2025-05-14 17:22:30 -07:00
Shannon Sands | c2bf3f5acd | moved folder location | 2025-05-14 17:22:18 -07:00
Joe Li | c1ae25c202 | Merge pull request #26 from NousResearch/coding_server; add code execution environment | 2025-05-14 15:08:10 -07:00
Shannon Sands | 3fba8e3527 | linting | 2025-05-14 14:22:25 -07:00
Shannon Sands | d8ab1a6758 | linting | 2025-05-14 14:20:54 -07:00
Shannon Sands | 1a7c0294fa | refactoring for more clarity | 2025-05-14 14:18:43 -07:00
Shannon Sands | bb6c205efe | Linting | 2025-05-14 14:05:52 -07:00
Shannon Sands | 67cfd961c5 | linting | 2025-05-14 14:01:31 -07:00
Shannon Sands | 826de9e283 | Updated README | 2025-05-14 13:57:20 -07:00
Shannon Sands | f5172b45a8 | Added README | 2025-05-14 13:35:15 -07:00
Shannon Sands | 85f462df5e | Updated test scripts | 2025-05-14 12:05:59 -07:00
Shannon Sands | d6f9d58606 | new env runs locally | 2025-05-14 11:57:45 -07:00
Shannon Sands | 54ae40840d | no-thinking env added | 2025-05-14 11:28:39 -07:00
Shannon Sands | 21cc528b85 | move best-of-n selection to util | 2025-05-14 10:35:12 -07:00
Shannon Sands | 4c00e2b209 | move message history out to utils | 2025-05-14 10:13:56 -07:00
dmahan93 | 6e9405ba95 | Fix bad merge | 2025-05-12 20:02:54 -05:00
dmahan93 | 0aaf59fc9a | add trl server; add gsm8k example for axolotl checking | 2025-05-12 19:04:46 -05:00
dmahan93 | 96be544228 | Merge commit '71e7a5ca27' into add-support-for-custom-api-servers | 2025-05-12 18:40:35 -05:00
Shannon Sands | 8cd9e4d776 | made private collect_trajectory re changes | 2025-05-13 07:58:48 +10:00
Shannon Sands | 36f6822d71 | Merge branch 'main' into blackjack2-env | 2025-05-13 07:54:04 +10:00
Shannon Sands | e480c30b8b | removed new fn | 2025-05-13 07:49:28 +10:00
Shannon Sands | 3e2012a7dc | linting | 2025-05-12 09:20:51 +10:00
Shannon Sands | 40d17be056 | wandb name | 2025-05-12 09:12:30 +10:00
Shannon Sands | c8ed107bca | linting done | 2025-05-12 09:06:31 +10:00
Shannon Sands | fd5b87011d | word problem generation working | 2025-05-12 08:57:21 +10:00
Shannon Sands | 5c0c7f5b10 | updates run | 2025-05-12 08:13:35 +10:00
Shannon Sands | bfc967c4bd | linting and local testing tidy up | 2025-05-12 08:07:39 +10:00
Shannon Sands | 141ab66792 | Updated comments | 2025-05-12 08:01:46 +10:00
Shannon Sands | e96970f82e | linting | 2025-05-12 07:53:12 +10:00
Shannon Sands | bdcc3cb88f | added cli & config init | 2025-05-12 07:49:51 +10:00
Shannon Sands | 04b32fd8f3 | tidying up comments and methods | 2025-05-12 07:41:59 +10:00
Shannon Sands | 137f8381ec | removed reward function registry | 2025-05-12 07:37:38 +10:00
Shannon Sands | 4e7fcd3c9a | copied from trajectory handler branch | 2025-05-12 07:26:10 +10:00
Shannon Sands | 220b92be47 | Linting and cleanup | 2025-05-10 21:15:00 +10:00
Shannon Sands | 6617d402b3 | Doing exact V* calc | 2025-05-10 20:24:31 +10:00
Shannon Sands | a049dde6b1 | Adding thinking reward | 2025-05-10 19:50:30 +10:00
Shannon Sands | 840ff20921 | Fixed typo, revising reward function | 2025-05-10 19:45:06 +10:00
dmahan93 | 92428fec8f | add gym taxi env | 2025-05-09 19:05:01 -05:00