Wednesday, March 17, 2021

Regex match if all characters in a dictionary word are present in the phrase, with no character occurring more often in the word than in the phrase

I'm writing a recursive backtracking search to find anagrams for a phrase. For the first step, I'm trying to filter out all the wrong words from a dictionary before I feed it to the recursive algorithm.

The dictionary file looks like this:

aback  abacus  abalone  abandon  abase  ...   [40,000 more words]  

The regex I want to construct must filter out words that contain characters the phrase does not contain, and also words that contain more occurrences of a character than exist in the phrase.

For example, given the phrase "clint eastwood", the word "noodle" matches, but the word "stonewall" does not, since "stonewall" contains more "l" characters than "clint eastwood" does.

Simply using "[clint eastwood]+" as the regex almost does what I want, but it also matches words that use a character more times than it occurs in the phrase.
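One possible way to add the count limit to the character class is to put a negative lookahead in front of it for each distinct character, rejecting any word that contains that character one more time than the phrase does. The sketch below builds such a pattern with Python's `re` module. It is an illustration under stated assumptions, not a confirmed solution to the question: `build_filter` is a hypothetical name, spaces in the phrase are dropped, and each dictionary word is assumed to be tested on its own with `fullmatch`.

```python
import re
from collections import Counter


def build_filter(phrase):
    """Compile a regex that accepts words spellable from the phrase's letters.

    Assumption: whitespace in the phrase is ignored.
    """
    counts = Counter(ch for ch in phrase if not ch.isspace())
    # For a character that occurs n times in the phrase, reject any word
    # that contains it n + 1 times, e.g. (?!(?:.*l){2}) for a single "l".
    lookaheads = "".join(
        "(?!(?:.*" + re.escape(ch) + "){" + str(n + 1) + "})"
        for ch, n in sorted(counts.items())
    )
    # After the lookaheads pass, only characters from the phrase may appear.
    char_class = "[" + "".join(re.escape(ch) for ch in sorted(counts)) + "]+"
    return re.compile(lookaheads + char_class)


if __name__ == "__main__":
    pattern = build_filter("clint eastwood")
    for candidate in ["noodle", "stonewall"]:
        print(candidate, bool(pattern.fullmatch(candidate)))
```

The pattern grows by one lookahead per distinct character in the phrase, so for a 40,000-word dictionary it may be worth comparing its speed against a plain character-count check.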

https://stackoverflow.com/questions/66682063/regex-match-if-all-characters-in-a-dictionary-word-are-present-in-the-phrase-th March 18, 2021 at 06:18AM
