Generate words of varying length given n number of characters
I need to generate all possible words of lengths Ki from n given characters, for example:
I can't come up with a word of length 5, but this is the idea.
So basically, all words of length 2, 4, and 5, but without reusing characters. What would be the best way to do it?
My dictionary structure looks like this:
And my modified code looks like this:
2 Answers
The code below uses a recursive generator to build solutions. To store the target letters we use a collections.Counter, which acts like a set that permits repeated items (a multiset).
To simplify searching we create a dictionary for each word length that we want, storing each of those dictionaries in a dictionary called all_words, with the word length as the key. Each sub-dictionary stores lists of words containing the same letters, with the sorted letters as the key, e.g. 'aet': ['ate', 'eat', 'tea'].
I use the standard Unix '/usr/share/dict/words' word file. If you use a file with a different format you may need to modify the code that puts words into all_words.
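As a rough illustration of that structure, here is a minimal sketch of how all_words could be built. The function name build_all_words and the tiny sample list are assumptions made for the example; a real run would read the words from a file instead.

```python
from collections import defaultdict

def build_all_words(words, lengths):
    """Group words by length, then by their sorted letters (hypothetical helper)."""
    all_words = {n: defaultdict(list) for n in lengths}
    for word in words:
        word = word.strip().lower()
        # Keep only purely alphabetic words of a wanted length
        if len(word) in all_words and word.isalpha():
            key = "".join(sorted(word))
            all_words[len(word)][key].append(word)
    return all_words

# Tiny made-up word list standing in for a real word file
sample = ["ate", "eat", "tea", "tan", "ant", "stone", "notes"]
all_words = build_all_words(sample, [3, 5])
print(all_words[3]["aet"])    # ['ate', 'eat', 'tea']
print(all_words[5]["enost"])  # ['stone', 'notes']
```

Keying on the sorted letters means every anagram of a word lands in the same list, so one lookup returns the whole group.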
The solve function starts its search with the smallest word length, and works up to the largest. That’s probably the most efficient order if the set containing the longest words is the biggest, since the final search is performed by doing a simple dict lookup, which is very fast. The previous searches have to test each word in the sub-dictionary of that length, looking for keys that are still in the target bag.
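The search order described above might look roughly like the following. This is a sketch rather than the exact code from this answer: the signature of solve is an assumption, the small all_words literal is made up for the demo, and it assumes the requested lengths add up to the number of target letters, so that the last step can be a plain dict lookup.

```python
from collections import Counter

def solve(all_words, lengths, bag, found=()):
    """Yield one tuple of anagram groups per way of splitting bag.

    lengths must be sorted ascending; bag is a Counter of the letters left.
    """
    n, rest = lengths[0], lengths[1:]
    if not rest:
        # Final length: the leftover letters must match a key exactly
        key = "".join(sorted(bag.elements()))
        if key in all_words[n]:
            yield found + (all_words[n][key],)
        return
    # Earlier lengths: test every key of that length against the bag
    for key, group in all_words[n].items():
        need = Counter(key)
        if all(bag[c] >= k for c, k in need.items()):
            yield from solve(all_words, rest, bag - need, found + (group,))

# Made-up data in the shape described above
all_words = {
    3: {"aet": ["ate", "eat", "tea"], "ant": ["tan", "ant"]},
    5: {"enost": ["stone", "notes"]},
}
results = list(solve(all_words, [3, 5], Counter("teastone")))
print(results)  # [(['ate', 'eat', 'tea'], ['stone', 'notes'])]
```

Each result holds one group of anagrams per length; any combination that picks one word from each group is a valid answer.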
FWIW, here are the results for
This code was written for Python 3. You can use it on Python 2.7 but you will need to change
The first thing is to standardize the words so that two words that are anagrams of each other are handled exactly the same. We can do this by converting to lowercase and sorting the letters of the word. The next step is to distinguish between multiple occurrences of a given letter. To do this, we'll map each letter to a symbol containing the letter and a number representing its occurrence in the string.
Now that we have a standard representation for each word, we need a quick way to check if any permutations will match them. For this we use the trie data structure. Here is some starter code for one:
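For instance, the standardization step could be sketched like this. The name standardize and the choice of (letter, occurrence) tuples as symbols are assumptions for the example; any hashable symbol would do.

```python
def standardize(word):
    """Return the sorted letters of word as (letter, occurrence) symbols."""
    seen = {}
    symbols = []
    for ch in sorted(word.lower()):
        seen[ch] = seen.get(ch, 0) + 1
        symbols.append((ch, seen[ch]))  # e.g. the second 'e' becomes ('e', 2)
    return symbols

print(standardize("Tea"))  # [('a', 1), ('e', 1), ('t', 1)]
print(standardize("tee"))  # [('e', 1), ('e', 2), ('t', 1)]
```

Anagrams such as "tea" and "eat" produce the same symbol list, while a repeated letter gets a distinct symbol for each occurrence.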
Now you need to make an empty trie as the root, with anything as a symbol, dedicated to holding all top-level tries. Then iterate over each word we transformed earlier, and for the first symbol we produced, check if the root trie has a child with that symbol. If it does not, create a trie for it and add it. If it does, proceed to the next symbol, and check if a trie with that symbol is in the previous trie. Proceed in this fashion until you’ve exhausted all symbols, in which case the current trie node represents the standardized form for that word we transformed. Store the original word in this trie, and proceed to the next word.
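The insertion procedure above could be sketched as follows. The Trie class here is a minimal assumed version (a symbol, a dict of children, and a list of stored words), and standardize is a compact helper included so the example runs on its own.

```python
class Trie:
    """Minimal trie node: one symbol, child nodes, and the words ending here."""
    def __init__(self, symbol=None):
        self.symbol = symbol
        self.children = {}
        self.words = []

def standardize(word):
    """Sorted letters of word as (letter, occurrence) symbols."""
    seen = {}
    symbols = []
    for ch in sorted(word.lower()):
        seen[ch] = seen.get(ch, 0) + 1
        symbols.append((ch, seen[ch]))
    return symbols

def build_trie(words):
    root = Trie()  # the root's own symbol is never used
    for word in words:
        node = root
        for sym in standardize(word):
            # Create the child on first use, then descend into it
            if sym not in node.children:
                node.children[sym] = Trie(sym)
            node = node.children[sym]
        node.words.append(word)  # this node is the word's standardized form
    return root

root = build_trie(["ate", "eat", "tea", "tee"])
node = root.children[("a", 1)].children[("e", 1)].children[("t", 1)]
print(node.words)  # ['ate', 'eat', 'tea']
```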
Once that’s done, your entire word list will be contained in this trie data structure. Then, you can just do something like:
To print all words that can be composed from the symbols for the target word.
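One way such a lookup might go is sketched below: walk the trie and only follow children whose symbol is available in the target. The names find_words, Trie, standardize and build_trie are all assumptions for this example, with the helpers included so it runs on its own.

```python
class Trie:
    def __init__(self, symbol=None):
        self.symbol = symbol
        self.children = {}
        self.words = []

def standardize(word):
    """Sorted letters of word as (letter, occurrence) symbols."""
    seen = {}
    symbols = []
    for ch in sorted(word.lower()):
        seen[ch] = seen.get(ch, 0) + 1
        symbols.append((ch, seen[ch]))
    return symbols

def build_trie(words):
    root = Trie()
    for word in words:
        node = root
        for sym in standardize(word):
            node = node.children.setdefault(sym, Trie(sym))
        node.words.append(word)
    return root

def find_words(node, available):
    """Yield every stored word reachable using only symbols in available."""
    yield from node.words
    for sym, child in node.children.items():
        if sym in available:
            yield from find_words(child, available)

root = build_trie(["ate", "eat", "tea", "tee", "at", "stone"])
target = set(standardize("teak"))
for word in sorted(find_words(root, target)):
    print(word)  # at, ate, eat, tea (one per line)
```

Because the second 'e' of "tee" is the symbol ('e', 2), that word is only reached when the target itself contains at least two e's, which is how reuse of a character is prevented. This walk returns words of every length, so a length filter would still be needed to keep only the wanted sizes.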