Friday, March 19, 2021

More efficient double For loop over 70k variables?

I have a long list of words in an array, about 70k of them. I'm building a graph that links each word to every other word within a given edit distance of it (distance 1 in the code below).

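```python
graph = dict()
for word in words:
    for target in words:
        # distance() is the asker's (already optimized) edit-distance function;
        # this inner comparison runs len(words) ** 2 times.
        if distance(word, target) == 1:
            # Create the neighbour list on first match, then append to it.
            graph.setdefault(word, []).append(target)
```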
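One common way to avoid the all-pairs comparison altogether is to generate every string exactly one edit away from each word and keep only those that appear in a set of the vocabulary. This is a swapped-in technique, not the asker's code, and it assumes that distance is plain Levenshtein distance (insert, delete, substitute; no transpositions) and that words use only lowercase ASCII letters. The names `edits1`, `build_graph` and `ALPHABET` are hypothetical.

```python
import string
from collections import defaultdict

# Assumption: words contain only lowercase ASCII letters; widen this if not.
ALPHABET = string.ascii_lowercase


def edits1(word):
    """Return every string at Levenshtein distance exactly 1 from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutes = {
        left + c + right[1:]
        for left, right in splits if right
        for c in ALPHABET if c != right[0]
    }
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | substitutes | inserts


def build_graph(words):
    """Map each word to the words in the list that are one edit away."""
    vocab = set(words)
    graph = defaultdict(list)
    for word in vocab:
        # Set membership is O(1) on average, so no pairwise distance() calls.
        for candidate in edits1(word):
            if candidate in vocab:
                graph[word].append(candidate)
    return dict(graph)


words = ["sword", "swore", "sworn", "word", "swords", "apple"]
graph = build_graph(words)
print(sorted(graph["sword"]))  # ['swords', 'swore', 'sworn', 'word']
```

For a word of length L and a 26-letter alphabet this checks at most 52L + 26 candidates, so the work grows roughly linearly with the number of words (tens of millions of set lookups for 70k words) instead of quadratically. As in the original loop, words with no neighbours get no entry in the graph.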

In other words:

```python
graph['sword'] = ['swore','sworn','word','swords',...]
```

etc.

We can assume that the distance function itself is already optimized.
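Even with an optimized distance, the nested loop makes len(words)² calls: 70,000² = 4.9 billion. At an assumed 1 µs per call that is roughly 80 minutes; at 10 µs it is more than 13 hours. If the distance function has to stay, two cheap filters cut the number of calls, assuming it is Levenshtein distance and the target is exactly 1: edit distance is symmetric, so each unordered pair needs only one call, and two words whose lengths differ by more than 1 can never be at distance 1. The sketch below applies both; `levenshtein` is a hypothetical stand-in for the asker's distance, and `build_graph_pruned` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a, b):
    """Stand-in for distance(): classic two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def build_graph_pruned(words, distance=levenshtein):
    """Call distance() once per unordered pair whose lengths differ by <= 1."""
    by_length = defaultdict(list)
    for word in set(words):
        by_length[len(word)].append(word)

    graph = defaultdict(list)

    def link(a, b):
        # Edit distance is symmetric, so one call fills in both directions.
        if distance(a, b) == 1:
            graph[a].append(b)
            graph[b].append(a)

    for length, group in by_length.items():
        for a, b in combinations(group, 2):      # same length
            link(a, b)
        for a in group:                          # exactly one letter longer
            for b in by_length.get(length + 1, []):
                link(a, b)
    return dict(graph)
```

How much this saves depends on how word lengths are distributed in the list: it is a constant-factor gain, not a change in complexity, because words of similar length are still compared pairwise.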

This loop takes a long time to run: hours if I'm lucky, days if I'm not. When it finishes, I save the graph by pickling it to a file for later use, but if I want to update the graph with another word list, I'm stuck running the whole loop again.
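For the update problem, one option (again a swapped-in technique, not something from the question) is to pickle an index alongside the graph, so that new words are only looked up in existing buckets instead of re-running the full double loop. This sketch assumes Levenshtein distance exactly 1 and words that never contain '*'; `WordGraph`, `add_words`, `save` and `load` are hypothetical names. Same-length neighbours share a wildcard pattern such as 's*ord', and neighbours one letter shorter or longer are found through one-character deletions.

```python
import pickle
from collections import defaultdict


class WordGraph:
    """Hypothetical incremental graph of words at Levenshtein distance 1.

    Assumes words never contain '*', which is used as the wildcard.
    """

    def __init__(self):
        self.graph = defaultdict(set)     # word -> set of neighbours
        self.words = set()
        self.patterns = defaultdict(set)  # 's*ord' -> {'sword', 'swore', ...}
        self.longer = defaultdict(set)    # 'word' -> {'sword', 'words', ...}

    @staticmethod
    def _patterns(word):
        return {word[:i] + "*" + word[i + 1:] for i in range(len(word))}

    @staticmethod
    def _deletions(word):
        return {word[:i] + word[i + 1:] for i in range(len(word))}

    def add_words(self, new_words):
        for word in new_words:
            if word in self.words:
                continue
            neighbours = set()
            # Same length, one substitution: shares a wildcard bucket.
            for p in self._patterns(word):
                neighbours.update(self.patterns.get(p, ()))
            # One letter shorter: a deletion of word is already known.
            neighbours |= self._deletions(word) & self.words
            # One letter longer: word is a deletion of a known word.
            neighbours.update(self.longer.get(word, ()))
            for other in neighbours:
                self.graph[word].add(other)
                self.graph[other].add(word)
            # Register word so that later additions can find it.
            self.words.add(word)
            for p in self._patterns(word):
                self.patterns[p].add(word)
            for d in self._deletions(word):
                self.longer[d].add(word)

    def save(self, path):
        # Pickle the indexes together with the graph so updates stay cheap.
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(path):
        with open(path, "rb") as f:
            return pickle.load(f)
```

A workflow might look like `wg = WordGraph(); wg.add_words(words); wg.save("graph.pkl")`, and later `wg = WordGraph.load("graph.pkl"); wg.add_words(more_words)` (the file name is only an example). Each pair is discovered once, when the second of its two words is added. Neighbours are stored as sets rather than lists, and the pickled file is larger than the bare graph because it also carries the indexes.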

Is there a package suited to a case like this? Some way of speeding up the loop, or a package that can magically turn this into C code so that it runs lickety-split? I'm new to this world, so please let me know if there is a way, or if I'll have to take up knitting.

https://stackoverflow.com/questions/66717846/more-efficient-double-for-loop-over-70k-variables March 20, 2021 at 11:04AM
