Author | Commit | Message | Date
Shannon Sands | d6f9d58606 | new env runs locally | 2025-05-14 11:57:45 -07:00
Shannon Sands | 54ae40840d | no-thinking env added | 2025-05-14 11:28:39 -07:00
Shannon Sands | 21cc528b85 | move best-of-n selection to util | 2025-05-14 10:35:12 -07:00
Shannon Sands | 4c00e2b209 | move message history out to utils | 2025-05-14 10:13:56 -07:00
Shannon Sands | 8cd9e4d776 | made private collect_trajectory re changes | 2025-05-13 07:58:48 +10:00
Shannon Sands | 36f6822d71 | Merge branch 'main' into blackjack2-env | 2025-05-13 07:54:04 +10:00
Shannon Sands | e480c30b8b | removed new fn | 2025-05-13 07:49:28 +10:00
Shannon Sands | 220b92be47 | Linting and cleanup | 2025-05-10 21:15:00 +10:00
Shannon Sands | 6617d402b3 | Doing exact V* calc | 2025-05-10 20:24:31 +10:00
Shannon Sands | a049dde6b1 | Adding thinking reward | 2025-05-10 19:50:30 +10:00
Shannon Sands | 840ff20921 | Fixed typo, revising reward function | 2025-05-10 19:45:06 +10:00
dmahan93 | 92428fec8f | add gym taxi env | 2025-05-09 19:05:01 -05:00
Shannon Sands | 7fe1a40368 | readd multistep masking | 2025-05-10 09:24:55 +10:00
Shannon Sands | 9efd8c1529 | linting | 2025-05-10 08:44:35 +10:00
Shannon Sands | 06c4a9e65c | linting | 2025-05-10 08:43:03 +10:00
Shannon Sands | 0248cc1227 | Removed old code, added comments | 2025-05-10 08:39:52 +10:00
Shannon Sands | ba604d44f9 | update local server | 2025-05-10 08:18:41 +10:00
Shannon Sands | c506bb147e | simplified config and reward | 2025-05-10 08:04:39 +10:00
Shannon Sands | 7e95c0b67d | moving test server | 2025-05-10 07:47:44 +10:00
Shannon Sands | a7dfd377da | moving env to clean branch | 2025-05-10 07:44:29 +10:00
dmahan93 | 40b12dae60 | run pre-commit on all files | 2025-05-09 09:54:20 -05:00
dmahan93 | b959c30ebf | Merge pull request #31 from NousResearch/fix-math-evals-due-to-updated-dataset: fix olympiadbench due to upstream changes | 2025-05-09 09:42:06 -05:00
dmahan93 | e09ae8d3d3 | fix olympiadbench due to upstream changes | 2025-05-09 09:41:10 -05:00
hjc-puro | 629d8c1731 | Merge pull request #14 from NousResearch/2025-05-02-server-cli | 2025-05-09 13:37:54 +08:00
Artem Yatsenko | 0f15be68a2 | fix multimodal envs. add view_run_multimodal | 2025-05-07 21:53:01 +00:00
edmund | 2cb1ff0087 | Removed mentions of NousResearch/DeepHermes-3-Llama-3-1B-Preview and swapped it for NousResearch/DeepHermes-3-Llama-3-3B-Preview. I don't think there is a NousResearch/DeepHermes-3-Llama-3-1B-Preview | 2025-05-07 18:03:17 +01:00
teknium1 | d2dbab7d22 | Add additional completions table info: metric, magnitude, and direction for ground truth | 2025-05-04 03:30:50 -07:00
teknium1 | c3b80832e9 | lowering the defaults for fundamental finance env | 2025-05-04 03:05:25 -07:00
hjc-puro | 4348dd2ec1 | hide complicated openai config override behavior somewhere else | 2025-05-03 14:18:50 -07:00
teknium1 | a2e36227aa | add metric logging | 2025-05-02 02:34:17 -07:00
Dakota Nous | 621d00dd80 | first commit | 2025-04-29 12:10:10 -07:00