Commit graph

65 commits

Author SHA1 Message Date
Cavit Erginsoy
6c564b3dd9 lint 2025-02-03 11:35:30 +00:00
Cavit Erginsoy
1e27021e11 Merge remote-tracking branch 'upstream/main' 2025-02-03 07:44:32 +00:00
Cavit Erginsoy
86c246ff5e Refactor word ladder generation with improved validation and graph-based path finding
- Enhanced configuration validation with size and length constraints
- Implemented graph-based neighbor computation and caching
- Simplified path finding algorithm with more robust length checking
- Added more flexible word set loading with configurable length ranges
- Improved error handling for dataset generation
2025-02-03 07:21:43 +00:00
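Note on 86c246ff5e: the log does not show the code itself, so the following is only a minimal sketch of what graph-based neighbor computation with a length-bounded path search could look like. The names `build_neighbors`, `find_path`, and `max_length` are hypothetical and are not taken from the repository.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set


def build_neighbors(words: Set[str]) -> Dict[str, Set[str]]:
    """Map each word to the words that differ from it in exactly one position.

    Hypothetical helper: words sharing a wildcard pattern such as "C_RD"
    are neighbors, so the graph is built once and can be cached.
    """
    buckets = defaultdict(list)
    for word in words:
        for i in range(len(word)):
            buckets[word[:i] + "_" + word[i + 1:]].append(word)
    neighbors: Dict[str, Set[str]] = {w: set() for w in words}
    for group in buckets.values():
        for w in group:
            neighbors[w].update(x for x in group if x != w)
    return neighbors


def find_path(
    start: str, end: str, neighbors: Dict[str, Set[str]], max_length: int
) -> Optional[List[str]]:
    """Breadth-first search for a shortest ladder of at most max_length words."""
    if start not in neighbors or end not in neighbors:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        if len(path) >= max_length:
            continue  # extending this path would exceed the length limit
        # sorted() keeps the traversal order repeatable across runs
        for nxt in sorted(neighbors[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


WORDS = {"COLD", "CORD", "CARD", "WARD", "WARM", "WORD", "WORM"}
GRAPH = build_neighbors(WORDS)
ladder = find_path("COLD", "WARM", GRAPH, max_length=6)
```

Because the search is breadth-first, the first ladder returned is a shortest one, and a pair with no ladder within `max_length` words yields `None` so the caller can pick another pair.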
Cavit Erginsoy
511425797f Improve efficiency and reduce plural bias in word ladder generation
- Precomputed sorted word lists for each word length (stored in self.words_lists) to avoid redundant sorting on every _generate_word_pair call.
- Updated _generate_word_pair to utilize the cached sorted list, significantly improving computational efficiency.
- Implemented weighted random sampling for 5-letter words, giving words ending with 'S' a lower weight (0.5) to reduce bias without completely filtering them out.
2025-02-01 14:37:21 +00:00
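Note on 511425797f: a rough sketch of the two ideas in this message, assuming the word lists are sorted once per length and that 5-letter words ending in 'S' are drawn with weight 0.5. Only the weight and the 'S' rule come from the commit message; `build_sorted_lists`, `sample_word`, and `plural_weight` are hypothetical names, and the actual `_generate_word_pair` may differ.

```python
from random import Random
from typing import Dict, List, Set


def build_sorted_lists(words: Set[str]) -> Dict[int, List[str]]:
    """Group words by length and sort each group once, up front."""
    by_length: Dict[int, List[str]] = {}
    for word in words:
        by_length.setdefault(len(word), []).append(word)
    return {n: sorted(group) for n, group in by_length.items()}


def sample_word(
    rng: Random,
    sorted_words: Dict[int, List[str]],
    length: int,
    plural_weight: float = 0.5,
) -> str:
    """Draw one word; 5-letter words ending in 'S' get a reduced weight."""
    candidates = sorted_words[length]
    if length == 5:
        weights = [plural_weight if w.endswith("S") else 1.0 for w in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]
    return rng.choice(candidates)


LISTS = build_sorted_lists({"TREES", "APPLE", "STONE", "BOOKS", "COLD", "WARM"})
first = [sample_word(Random(42), LISTS, 5) for _ in range(3)]
```

Sorting once and reusing the cached list keeps sampling reproducible for a given seed without paying for a sort on every call, and the reduced weight makes plural-looking words less frequent while still allowing them.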
Cavit Erginsoy
fce0c4fa3f refactor: Clarify word ladder question 2025-02-01 14:27:06 +00:00
Andreas Koepf
d4706c7128 lint 2025-01-31 12:16:08 +01:00
Andreas Koepf (aider)
aa39c6441a fix: Improve base conversion logic for non-standard bases 2025-01-31 12:09:32 +01:00
Andreas Koepf
9b33522b41 use sorted() for repeatable generation outputs (e.g. GALLERY.md) 2025-01-30 23:33:43 +01:00
Andreas Koepf
bf62f631dd lint 2025-01-30 23:14:32 +01:00
Cavit Erginsoy
d57a7947a4 INIT 2025-01-30 21:32:46 +00:00
Cavit Erginsoy
4f14a20725 INIT 2025-01-30 19:42:58 +00:00
Andreas Koepf
7b3cc45bbd add newline to word sorting template 2025-01-27 16:57:49 +01:00
Andreas Koepf (aider)
b7029fc5df feat: Clarify word sorting instructions with ASCII/Unicode ordering and output format 2025-01-26 22:29:57 +01:00
Andreas Koepf (aider)
557fac66bf refactor: Change word sorting answer format from list string to comma-separated string 2025-01-26 22:23:18 +01:00
Andreas Koepf
c3b6af35f0 min python 3.11 to support StrEnum 2025-01-26 22:17:43 +01:00
Andreas Koepf
ad9f0d265c fix unit tests, lower python dependency to 3.9 2025-01-26 16:55:17 +01:00
Andreas Koepf (aider)
56f422d74e fix: Import missing 're' module for regex word extraction 2025-01-26 16:14:23 +01:00
Andreas Koepf (aider)
7df974a753 feat: Add word sorting task generation with text transformations 2025-01-26 16:14:10 +01:00
Andreas Koepf (aider)
187df2bf7b feat: Add word sorting dataset with configurable text transformations 2025-01-26 16:11:32 +01:00
Andreas Koepf (aider)
8e92025cf7 refactor: Update default sentence length constraints to 3-20 words 2025-01-26 15:57:02 +01:00
Andreas Koepf (aider)
65c60b2afa refactor: Update sentence extraction regex to preserve ending punctuation 2025-01-26 15:56:03 +01:00
Andreas Koepf (aider)
028d5ccf96 refactor: Rename num_of_words_in_sentence and add max_words_in_sentence config 2025-01-26 15:46:21 +01:00
Andreas Koepf
d3fe900889 refactor: Update sentence reordering prompt to be more descriptive 2025-01-26 15:46:19 +01:00
Andreas Köpf
add29c2fcb Merge branch 'main' into koko/scramble 2025-01-26 15:41:25 +01:00
Andreas Koepf
cdf08d9d5b rename word_reversal.py -> word_sequence_reversal.py 2025-01-26 11:57:50 +01:00
Andreas Koepf (aider)
e9ac50a6fc refactor: Update import for word sequence reversal module 2025-01-26 11:53:48 +01:00
Andreas Koepf (aider)
be00e0bab2 fix: Correct WordReversalConfig references to WordSequenceReversalConfig 2025-01-26 11:52:25 +01:00
Andreas Koepf (aider)
c641b25508 refactor: Rename WordReversalDataset to WordSequenceReversalDataset 2025-01-26 11:52:15 +01:00
Andreas Koepf (aider)
4d582387de feat: Add SpellBackward imports and exports to algorithmic package 2025-01-26 11:48:18 +01:00
Andreas Koepf (aider)
9b93ac2a1f feat: Add spell_backward.py module for word reversal task generation 2025-01-26 11:46:07 +01:00
Andreas Koepf (aider)
696d60dd2b refactor: Move SpellBackwardDataset to separate file 2025-01-26 11:44:27 +01:00
Andreas Koepf (aider)
b18bede2bf feat: Add SpellBackwardDataset with word reversal and length filtering 2025-01-26 11:40:47 +01:00
abdulhakeem
3fabb319ab Make more tiny correction 2025-01-25 23:25:55 -06:00
abdulhakeem
b13d0762d6 Correct logic for number of words in sentence 2025-01-25 23:22:16 -06:00
abdulhakeem
4d50cfd514 Add parameters to _generate_sentence_dataset 2025-01-25 23:17:39 -06:00
abdulhakeem
384a00ec71 Ensure only words are considered 2025-01-25 23:08:41 -06:00
abdulhakeem
c7c12269ad Add assertion to ensure number of words in sentence is positive 2025-01-25 23:02:17 -06:00
abdulhakeem
a72629c28f Add sentence reordering and unit tests to validate it 2025-01-25 22:52:35 -06:00
Andreas Koepf
6af2231d84 formatting 2025-01-25 18:51:28 +01:00
Andreas Koepf
66666cf2bf remove old files 2025-01-25 18:51:07 +01:00
Andreas Koepf (aider)
f185fba00e refactor: Rename UnscrambleWordsDataset to LetterJumbleDataset 2025-01-25 18:37:42 +01:00
Andreas Koepf (aider)
7ec77bf619 feat: Add consecutive words option and ensure minimum word swap in UnscrambleWords 2025-01-25 18:29:02 +01:00
Andreas Koepf (aider)
d57a88a5b3 feat: Add unscramble_words dataset with configurable word scrambling 2025-01-25 18:21:31 +01:00
Andreas Koepf (aider)
2108a2c3e9 feat: Add Caesar cipher import to algorithmic module 2025-01-25 18:05:06 +01:00
Andreas Koepf (aider)
a11c6bfb26 feat: Add Caesar cipher dataset generator with encryption and decryption tasks 2025-01-25 18:00:13 +01:00
Andreas Koepf
519e411fa5 add reasoning_gym.create_dataset({name}, ...) global factory function 2025-01-25 00:58:34 +01:00
Andreas Koepf
0d2d8ba6a0 pass config to ProceduralDataset base 2025-01-25 00:23:05 +01:00
Andreas Koepf
5c5d46b4bd formatting, cleanup 2025-01-24 17:12:42 +01:00
Andreas Koepf (aider)
d413717eff refactor: Inherit NumberSortingDataset from ProceduralDataset 2025-01-24 11:18:10 +01:00
Andreas Koepf (aider)
190c2aafa4 refactor: Inherit WordReversalDataset from ProceduralDataset 2025-01-24 11:16:15 +01:00