Commit graph

763 commits

Author SHA1 Message Date
Shannon Sands
c72a27d376 fixed linting in latest main 2025-05-14 17:29:57 -07:00
Shannon Sands
00dd120067 Merge branch 'main' into blackjack2-env 2025-05-14 17:27:44 -07:00
Shannon Sands
8fad665f6a moved folder location 2025-05-14 17:22:30 -07:00
Shannon Sands
c2bf3f5acd moved folder location 2025-05-14 17:22:18 -07:00
Joe Li
c1ae25c202 Merge pull request #26 from NousResearch/coding_server; add code execution environment 2025-05-14 15:08:10 -07:00
Shannon Sands
3fba8e3527 linting 2025-05-14 14:22:25 -07:00
Shannon Sands
d8ab1a6758 linting 2025-05-14 14:20:54 -07:00
Shannon Sands
1a7c0294fa refactoring for more clarity 2025-05-14 14:18:43 -07:00
Shannon Sands
bb6c205efe Linting 2025-05-14 14:05:52 -07:00
Shannon Sands
67cfd961c5 linting 2025-05-14 14:01:31 -07:00
Shannon Sands
826de9e283 Updated README 2025-05-14 13:57:20 -07:00
Shannon Sands
f5172b45a8 Added README 2025-05-14 13:35:15 -07:00
Shannon Sands
85f462df5e Updated test scripts 2025-05-14 12:05:59 -07:00
Shannon Sands
d6f9d58606 new env runs locally 2025-05-14 11:57:45 -07:00
Shannon Sands
54ae40840d no-thinking env added 2025-05-14 11:28:39 -07:00
Shannon Sands
21cc528b85 move best-of-n selection to util 2025-05-14 10:35:12 -07:00
Shannon Sands
4c00e2b209 move message history out to utils 2025-05-14 10:13:56 -07:00
dmahan93
6e9405ba95 Fix bad merge 2025-05-12 20:02:54 -05:00
dmahan93
0aaf59fc9a add trl server; add gsm8k example for axolotl checking 2025-05-12 19:04:46 -05:00
dmahan93
96be544228 Merge commit '71e7a5ca27' into add-support-for-custom-api-servers 2025-05-12 18:40:35 -05:00
Shannon Sands
8cd9e4d776 made private collect_trajectory re changes 2025-05-13 07:58:48 +10:00
Shannon Sands
36f6822d71 Merge branch 'main' into blackjack2-env 2025-05-13 07:54:04 +10:00
Shannon Sands
e480c30b8b removed new fn 2025-05-13 07:49:28 +10:00
Shannon Sands
3e2012a7dc linting 2025-05-12 09:20:51 +10:00
Shannon Sands
40d17be056 wandb name 2025-05-12 09:12:30 +10:00
Shannon Sands
c8ed107bca linting done 2025-05-12 09:06:31 +10:00
Shannon Sands
fd5b87011d word problem generation working 2025-05-12 08:57:21 +10:00
Shannon Sands
5c0c7f5b10 updates run 2025-05-12 08:13:35 +10:00
Shannon Sands
bfc967c4bd linting and local testing tidy up 2025-05-12 08:07:39 +10:00
Shannon Sands
141ab66792 Updated comments 2025-05-12 08:01:46 +10:00
Shannon Sands
e96970f82e linting 2025-05-12 07:53:12 +10:00
Shannon Sands
bdcc3cb88f added cli & config init 2025-05-12 07:49:51 +10:00
Shannon Sands
04b32fd8f3 tidying up comments and methods 2025-05-12 07:41:59 +10:00
Shannon Sands
137f8381ec removed reward function registry 2025-05-12 07:37:38 +10:00
Shannon Sands
4e7fcd3c9a copied from trajectory handler branch 2025-05-12 07:26:10 +10:00
Shannon Sands
220b92be47 Linting and cleanup 2025-05-10 21:15:00 +10:00
Shannon Sands
6617d402b3 Doing exact V* calc 2025-05-10 20:24:31 +10:00
Shannon Sands
a049dde6b1 Adding thinking reward 2025-05-10 19:50:30 +10:00
Shannon Sands
840ff20921 Fixed typo, revising reward function 2025-05-10 19:45:06 +10:00
dmahan93
92428fec8f add gym taxi env 2025-05-09 19:05:01 -05:00
Shannon Sands
7fe1a40368 readd multistep masking 2025-05-10 09:24:55 +10:00
Shannon Sands
9efd8c1529 linting 2025-05-10 08:44:35 +10:00
Shannon Sands
06c4a9e65c linting 2025-05-10 08:43:03 +10:00
Shannon Sands
0248cc1227 Removed old code, added comments 2025-05-10 08:39:52 +10:00
Shannon Sands
ba604d44f9 update local server 2025-05-10 08:18:41 +10:00
Shannon Sands
c506bb147e simplified config and reward 2025-05-10 08:04:39 +10:00
Shannon Sands
7e95c0b67d moving test sever 2025-05-10 07:47:44 +10:00
Shannon Sands
a7dfd377da moving env to clean branch 2025-05-10 07:44:29 +10:00
dmahan93
40b12dae60 run pre-commit on all files 2025-05-09 09:54:20 -05:00
dmahan93
b959c30ebf Merge pull request #31 from NousResearch/fix-math-evals-due-to-updated-dataset; fix olympiadbench due to upstream changes 2025-05-09 09:42:06 -05:00