Sunday, December 20, 2020

Fastest python algorithm to divide a list of numbers into N groups that maximizes the total mean differences of every two closest groups

Say I have a list of integers

[1, 107, 1010, 3, 121, 5, 1024, 118, 101, 1, 123, 1357]  

I want to divide them into N=3 groups (not necessarily equal in length) while maximizing the total difference between the mean values of every two closest groups. This is an extreme example; the expected partition is obviously

res = [[1, 3, 5, 1],
       [107, 121, 118, 101, 123],
       [1010, 1024, 1357]]

I use the following function to calculate the total difference between the mean values of every two closest groups:

import numpy as np

def calc_mean_diff_sum(list_of_lists):
    # Sort the groups by their means, then sum the differences between
    # the means of consecutive (i.e. closest) groups.
    list_of_lists.sort(key=lambda x: np.mean(x))
    return np.sum([np.mean(list_of_lists[i+1]) - np.mean(group)
                   for i, group in enumerate(list_of_lists[:-1])])

print(calc_mean_diff_sum(res))
>>> 1127.8333333333333

However, how should I divide a list of 12 random numbers into 3 groups?

a = np.random.rand(12,)  

Right now I am using brute-force enumeration of all possible partitions and choosing the one that maximizes calc_mean_diff_sum, but I think there must be some faster algorithms out there. Any suggestions?
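For reference, here is a minimal sketch of what I mean by brute force. The helper name brute_force_partition and the label-enumeration approach are just for illustration (they assume each value gets a group label and labellings with an empty group are skipped); it reuses calc_mean_diff_sum defined above.

from itertools import product
import numpy as np

def brute_force_partition(values, n_groups):
    # Try every assignment of a group label (0 .. n_groups-1) to each value,
    # skip assignments that leave some group empty, and keep the grouping
    # with the largest calc_mean_diff_sum.  Note that each partition is
    # visited n_groups! times (label permutations), which is wasteful but
    # keeps the sketch simple.
    best_score, best_groups = -np.inf, None
    for labels in product(range(n_groups), repeat=len(values)):
        if len(set(labels)) < n_groups:
            continue
        groups = [[v for v, g in zip(values, labels) if g == k]
                  for k in range(n_groups)]
        score = calc_mean_diff_sum(groups)
        if score > best_score:
            best_score, best_groups = score, groups
    return best_groups, best_score

groups, score = brute_force_partition(list(a), 3)
print(groups, score)

For 12 values and 3 groups this is 3**12 = 531441 labellings, which already takes noticeable time, and it grows exponentially with the list length.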

I should mention that this question has further implications for clustering problems based on similarity.

https://stackoverflow.com/questions/65386233/fastest-python-algorithm-to-divide-a-list-of-numbers-into-n-groups-that-maximize December 21, 2020 at 08:38AM
