I have a long list of words in an array, about 70k of them. I'm building a graph that links each word to every other word within an edit distance of 1.

    graph = dict()
    for word in words:
        for target in words:
            if distance(word, target) == 1:
                if graph.get(word, 0) == 0:
                    graph[word] = [target]
                else:
                    graph[word].append(target)

In other words:

    graph['sword'] = ['swore','sworn','word','swords',...]

etc.
We can assume that the function distance is optimized.
This loop takes a long time to run. Hours if I'm lucky, days if I'm not. When I'm done, I obviously save and pickle the file for later use, but if I want to update the graph with another word list, I'm stuck running this loop again.
Is there a package suitable for handling a case like this? Some way of speeding up the loop, or having some package magically turn this into C code so that it runs lickety split? I'm new to this world, so please inform me if there is a way, or if I'll have to take up knitting.
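For illustration, here is a rough sketch of one way the all-pairs comparison could be avoided in plain Python. It assumes that distance is Levenshtein distance (single-character substitution, insertion, or deletion), which the post does not state but which matches the 'sword' example. The function name build_graph and the bucket layout are hypothetical, not part of the original code. The idea is that two words at distance 1 either have equal length and agree once one position is removed, or one is the other with a single character deleted, so each word only needs to be checked against a handful of dictionary lookups rather than against all 70k words.

```python
from collections import defaultdict


def build_graph(words):
    """Map each word to the sorted list of words at Levenshtein distance 1.

    Assumes `distance` in the original loop is Levenshtein distance.
    Words without any neighbour are left out, as in the original loop.
    """
    vocab = set(words)
    graph = defaultdict(set)

    # Substitutions: two distinct words of equal length that differ in exactly
    # one position i become identical once position i is removed, so they end
    # up in the same (i, remainder) bucket.
    buckets = defaultdict(set)
    for word in vocab:
        for i in range(len(word)):
            buckets[(i, word[:i] + word[i + 1:])].add(word)
    for group in buckets.values():
        if len(group) > 1:
            for word in group:
                graph[word] |= group - {word}

    # Insertions/deletions: removing one character from the longer word
    # must give the shorter word exactly.
    for word in vocab:
        for i in range(len(word)):
            shorter = word[:i] + word[i + 1:]
            if shorter in vocab:
                graph[word].add(shorter)
                graph[shorter].add(word)

    return {word: sorted(neighbours) for word, neighbours in graph.items()}


words = ['sword', 'swore', 'sworn', 'word', 'swords', 'cat']
graph = build_graph(words)
print(graph['sword'])  # ['swords', 'swore', 'sworn', 'word']
```

The work here grows with the number of words times their length instead of with the square of the number of words. If distance allows other operations (for example transpositions), this sketch would miss those pairs and would need an extra pass.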
https://stackoverflow.com/questions/66717846/more-efficient-double-for-loop-over-70k-variables March 20, 2021 at 11:04AM