Commit graph

31 commits

Author SHA1 Message Date
Shannon Sands
d6f9d58606 new env runs locally 2025-05-14 11:57:45 -07:00
Shannon Sands
54ae40840d no-thinking env added 2025-05-14 11:28:39 -07:00
Shannon Sands
21cc528b85 move best-of-n selection to util 2025-05-14 10:35:12 -07:00
Shannon Sands
4c00e2b209 move message history out to utils 2025-05-14 10:13:56 -07:00
Shannon Sands
8cd9e4d776 made private collect_trajectory re changes 2025-05-13 07:58:48 +10:00
Shannon Sands
36f6822d71 Merge branch 'main' into blackjack2-env 2025-05-13 07:54:04 +10:00
Shannon Sands
e480c30b8b removed new fn 2025-05-13 07:49:28 +10:00
Shannon Sands
220b92be47 Linting and cleanup 2025-05-10 21:15:00 +10:00
Shannon Sands
6617d402b3 Doing exact V* calc 2025-05-10 20:24:31 +10:00
Shannon Sands
a049dde6b1 Adding thinking reward 2025-05-10 19:50:30 +10:00
Shannon Sands
840ff20921 Fixed typo, revising reward function 2025-05-10 19:45:06 +10:00
dmahan93
92428fec8f add gym taxi env 2025-05-09 19:05:01 -05:00
Shannon Sands
7fe1a40368 readd multistep masking 2025-05-10 09:24:55 +10:00
Shannon Sands
9efd8c1529 linting 2025-05-10 08:44:35 +10:00
Shannon Sands
06c4a9e65c linting 2025-05-10 08:43:03 +10:00
Shannon Sands
0248cc1227 Removed old code, added comments 2025-05-10 08:39:52 +10:00
Shannon Sands
ba604d44f9 update local server 2025-05-10 08:18:41 +10:00
Shannon Sands
c506bb147e simplified config and reward 2025-05-10 08:04:39 +10:00
Shannon Sands
7e95c0b67d moving test sever 2025-05-10 07:47:44 +10:00
Shannon Sands
a7dfd377da moving env to clean branch 2025-05-10 07:44:29 +10:00
dmahan93
40b12dae60 run pre-commit on all files 2025-05-09 09:54:20 -05:00
dmahan93
b959c30ebf Merge pull request #31 from NousResearch/fix-math-evals-due-to-updated-dataset: fix olympiadbench due to upstream changes 2025-05-09 09:42:06 -05:00
dmahan93
e09ae8d3d3 fix olympiadbench due to upstream changes 2025-05-09 09:41:10 -05:00
hjc-puro
629d8c1731 Merge pull request #14 from NousResearch/2025-05-02-server-cli 2025-05-09 13:37:54 +08:00
Artem Yatsenko
0f15be68a2 fix multimodal envs. add view_run_multimodal 2025-05-07 21:53:01 +00:00
edmund
2cb1ff0087 Removed mentions of NousResearch/DeepHermes-3-Llama-3-1B-Preview and swapped it for NousResearch/DeepHermes-3-Llama-3-3B-Preview: I don't think there is a NousResearch/DeepHermes-3-Llama-3-1B-Preview 2025-05-07 18:03:17 +01:00
teknium1
d2dbab7d22 Add additional completions table info: metric, magnitude, and direction for ground truth 2025-05-04 03:30:50 -07:00
teknium1
c3b80832e9 lowering the defaults for fundamental finance env 2025-05-04 03:05:25 -07:00
hjc-puro
4348dd2ec1 hide complicated openai config override behavior somewhere else 2025-05-03 14:18:50 -07:00
teknium1
a2e36227aa add metric logging 2025-05-02 02:34:17 -07:00
Dakota Nous
621d00dd80 first commit 2025-04-29 12:10:10 -07:00