I'm writing a recursive backtracking search to find anagrams for a phrase. As a first step, I'm trying to filter out every dictionary word that cannot possibly be part of an anagram before I feed the dictionary to the recursive algorithm.
The dictionary file looks like this:
aback
abacus
abalone
abandon
abase
...
[40,000 more words]

The regex I want to construct must filter out words that contain characters that the phrase does not contain, and also words that contain more occurrences of a character than exist in the phrase.
For example, given the phrase "clint eastwood", the word "noodle" matches, but the word "stonewall" does not, since "stonewall" contains more "l" characters than "clint eastwood" does.
Simply using "[clint eastwood]+" as the regex almost does what I want, but it places no limit on how often each character is used, so it also accepts words such as "stonewall" that repeat a character more often than the phrase does.
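To make the requirement concrete, here is a minimal Python sketch of one way such a pattern could be built. It is only an illustration under my own assumptions, not a known-good answer: for each character that occurs n times in the phrase, it adds a negative lookahead that rejects any word containing that character n + 1 times, and then requires the whole word to consist of characters from the phrase. The function names `build_filter` and `fits` are hypothetical, and spaces in the phrase are assumed to be ignored.

```python
import re
from collections import Counter


def build_filter(phrase):
    """Compile a regex accepting only words that can be spelled from phrase.

    Hypothetical helper: spaces in the phrase are ignored (an assumption).
    """
    counts = Counter(phrase.replace(" ", ""))
    # One negative lookahead per character: fail if the word contains
    # that character more often than the phrase does.
    lookaheads = "".join(
        "(?!(?:.*" + re.escape(ch) + "){" + str(n + 1) + "})"
        for ch, n in sorted(counts.items())
    )
    # The word itself may only use characters found in the phrase.
    allowed = "".join(re.escape(ch) for ch in sorted(counts))
    return re.compile(lookaheads + "[" + allowed + "]+")


def fits(word, phrase):
    """Regex-free cross-check using multiset subtraction."""
    # Counter subtraction drops non-positive counts, so the result is
    # empty exactly when the phrase covers every letter of the word.
    return not (Counter(word) - Counter(phrase.replace(" ", "")))


pattern = build_filter("clint eastwood")
print(pattern.fullmatch("noodle") is not None)     # True
print(pattern.fullmatch("stonewall") is not None)  # False
```

The `fits` function does the same check with `collections.Counter` and no regex at all, which may be easier to maintain if a regex is not strictly required.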