Sunday, January 3, 2021

How To Speed Up KNN Algorithm

I have a simple KNN algorithm that predicts the "yield" from a row of data. The data is a pandas DataFrame with around 27k rows and 37 columns. I have been trying to optimize the hyperparameter (the number of nearest neighbours), but a single run with one value already takes a very long time. How can I improve the code below to make it run faster?

I have looked at reducing the number of for loops, but I have no clue where to start:

    #importing modules
    from math import sqrt

    train_data = df_KNN[:23498]
    test_data = df_KNN[23498:]

    true_test = pd.DataFrame(df_KNN)
    true_test = true_test.iloc[23498:, -1]
    true_test = true_test.to_numpy()

    #calculating "distance" between rows
    def euclidean_distance(row1, row2):
        distance = 0.0
        for i in range(len(row1)-1):
            distance += ((row1[i] - row2[i])**2)
        return sqrt(distance)

    def get_neighbours(train, test_row, num_neighbours):
        distances = list()
        for train_row in train:
            dist = euclidean_distance(test_row, train_row)
            distances.append((train_row, dist))
        distances.sort(key=lambda dis: dis[1])
        neighbours = list()
        for i in range(num_neighbours):
            neighbours.append(distances[i][0])
        return neighbours

    def predict_classification(train, test_row, num_neighbours):
        prediction_list = []
        for row in test_row:
            neighbours = get_neighbours(train, test_row, num_neighbours)
            output_values = [row[-1] for row in neighbours]
            prediction_list.append(output_values)
        prediction = np.mean(prediction_list)
        return prediction

    def k_nearest_neighbours(train, test, num_neighbours):
        predictions = list()
        for row in test:
            output = predict_classification(train, row, num_neighbours)
            predictions.append(output)
        return (predictions)

    test_pred = k_nearest_neighbours(train_data, test_data, 3)

    from sklearn.metrics import r2_score
    print(r2_score(true_test, test_pred))
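Two things in the code above look like the main costs. First, the `for row in test_row` loop in `predict_classification` repeats the identical neighbour search once per element of the row; since every pass appends the same `output_values`, dropping the loop leaves the mean unchanged. Second, `euclidean_distance` is called in pure Python once per training row. The sketch below shows one way to replace both with a single NumPy expression per test row. It assumes the data has first been converted to a NumPy array (for example with `df_KNN.to_numpy()`) and split into features and target; `predict_row`, `train_X`, `train_y`, and the toy data are hypothetical names and values chosen for illustration.

```python
import numpy as np

def predict_row(train_X, train_y, test_x, num_neighbours):
    """Mean target of the num_neighbours training rows closest to test_x."""
    # Squared Euclidean distance from test_x to every training row at once.
    # The square root is skipped because it does not change the ordering.
    sq_dist = ((train_X - test_x) ** 2).sum(axis=1)
    # argpartition moves the k smallest distances to the front
    # without sorting the whole array.
    nearest = np.argpartition(sq_dist, num_neighbours - 1)[:num_neighbours]
    return train_y[nearest].mean()

# Tiny made-up dataset; the last column plays the role of the target.
data = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 3.0],
    [5.0, 5.0, 10.0],
])
train_X, train_y = data[:, :-1], data[:, -1]

prediction = predict_row(train_X, train_y, np.array([0.1, 0.1]), 3)
print(prediction)
```

When several training rows are exactly equidistant, `argpartition` may pick a different one than the stable `sort` in the original code, so predictions can differ slightly in that case.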

I know I could use other modules but for this purpose I want to implement it from scratch. Cheers!
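Staying with a from-scratch approach, a possible next step is to compute distances for many test rows at once and to reuse one neighbour search for every candidate number of neighbours, which is where the hyperparameter search spends its time. The sketch below is one way to do that under some assumptions: the features and target are already NumPy arrays, the largest candidate value does not exceed the number of training rows, and `knn_sweep`, `chunk_size`, and the random data are hypothetical names and values. It uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b so the bulk of the work becomes a matrix product.

```python
import numpy as np

def knn_sweep(train_X, train_y, test_X, k_values, chunk_size=512):
    """Return {k: predictions} for each k, searching for neighbours only once."""
    k_max = max(k_values)
    preds = {k: np.empty(len(test_X)) for k in k_values}
    train_sq = (train_X ** 2).sum(axis=1)  # ||b||^2 for every training row

    # Process test rows in chunks so the distance matrix stays small in memory.
    for start in range(0, len(test_X), chunk_size):
        chunk = test_X[start:start + chunk_size]
        chunk_sq = (chunk ** 2).sum(axis=1)  # ||a||^2 for every row in the chunk
        # Squared distances, shape (rows in chunk, training rows).
        sq_dist = chunk_sq[:, None] + train_sq[None, :] - 2.0 * (chunk @ train_X.T)

        # Pick the k_max nearest per row, then order just those by distance.
        idx = np.argpartition(sq_dist, k_max - 1, axis=1)[:, :k_max]
        order = np.argsort(np.take_along_axis(sq_dist, idx, axis=1), axis=1)
        idx = np.take_along_axis(idx, order, axis=1)

        neighbour_y = train_y[idx]  # targets of the neighbours, nearest first
        for k in k_values:
            preds[k][start:start + len(chunk)] = neighbour_y[:, :k].mean(axis=1)
    return preds

# Synthetic stand-in data, not the original dataset.
rng = np.random.default_rng(0)
train_X = rng.normal(size=(200, 5))
train_y = rng.normal(size=200)
test_X = rng.normal(size=(30, 5))

preds = knn_sweep(train_X, train_y, test_X, k_values=[1, 3, 5], chunk_size=8)
print({k: v[:3] for k, v in preds.items()})
```

The expanded formula can produce tiny negative values from rounding, which is harmless here because only the ranking of distances is used. With roughly 23,498 training rows, a chunk of 512 test rows needs a distance matrix of about 96 MB in float64, so `chunk_size` is the knob to turn if memory is tight.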

https://stackoverflow.com/questions/65557200/how-to-speed-up-knn-algorithm January 04, 2021 at 11:06AM
