I have a simple KNN algorithm that predicts the "yield" from a row of data. The data is a pandas DataFrame with around 27k rows and 37 columns. I have been trying to optimize the hyper-parameter (the number of nearest neighbours), but a run with even a single value already takes a very long time. How could I improve the code below to make it run faster?
I have looked into reducing the number of for loops, but I have no clue where to start:
    #importing modules
    from math import sqrt
    import numpy as np
    import pandas as pd

    train_data = df_KNN[:23498]
    test_data = df_KNN[23498:]

    true_test = pd.DataFrame(df_KNN)
    true_test = true_test.iloc[23498:, -1]
    true_test = true_test.to_numpy()

    #calculating "distance" between rows
    def euclidean_distance(row1, row2):
        distance = 0.0
        # the last column is the target, so it is skipped
        for i in range(len(row1)-1):
            distance += ((row1[i] - row2[i])**2)
        return sqrt(distance)

    def get_neighbours(train, test_row, num_neighbours):
        distances = list()
        for train_row in train:
            dist = euclidean_distance(test_row, train_row)
            distances.append((train_row, dist))
        distances.sort(key=lambda dis: dis[1])
        neighbours = list()
        for i in range(num_neighbours):
            neighbours.append(distances[i][0])
        return neighbours

    def predict_classification(train, test_row, num_neighbours):
        prediction_list = []
        for row in test_row:
            neighbours = get_neighbours(train, test_row, num_neighbours)
            output_values = [row[-1] for row in neighbours]
            prediction_list.append(output_values)
        prediction = np.mean(prediction_list)
        return prediction

    def k_nearest_neighbours(train, test, num_neighbours):
        predictions = list()
        for row in test:
            output = predict_classification(train, row, num_neighbours)
            predictions.append(output)
        return (predictions)

    test_pred = k_nearest_neighbours(train_data, test_data, 3)

    from sklearn.metrics import r2_score
    print(r2_score(true_test, test_pred))

I know I could use other modules, but for this purpose I want to implement it from scratch. Cheers!
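One possible direction, offered only as a sketch and not as the accepted answer to the question: the loop in `predict_classification` appears to call `get_neighbours` once per value in `test_row`, so the same neighbour search is repeated for every column, and each search sorts all training rows in pure Python. The version below replaces those loops with NumPy array operations. It assumes that every column is numeric, that the last column is the target, and that the inputs are NumPy arrays (for example `df_KNN.to_numpy()`). The names `knn_regress` and `chunk_size` are made up for this illustration.

```python
import numpy as np


def knn_regress(train, test, num_neighbours, chunk_size=256):
    """Predict the last-column target of each test row as the mean
    target of its num_neighbours nearest training rows."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)

    # Assumption: features are all columns except the last, which is the target.
    X_train, y_train = train[:, :-1], train[:, -1]
    X_test = test[:, :-1]

    # Squared norms of the training rows, computed once and reused.
    train_sq = (X_train ** 2).sum(axis=1)

    predictions = np.empty(len(X_test))
    # Process the test rows in chunks so the distance matrix stays small.
    for start in range(0, len(X_test), chunk_size):
        block = X_test[start:start + chunk_size]

        # Squared Euclidean distances via |a - b|^2 = |a|^2 - 2 a.b + |b|^2.
        # The square root is skipped because it does not change the ranking.
        d2 = (
            (block ** 2).sum(axis=1)[:, None]
            - 2.0 * (block @ X_train.T)
            + train_sq[None, :]
        )

        # argpartition places the k smallest distances in the first k
        # columns without fully sorting each row.
        nearest = np.argpartition(d2, num_neighbours - 1, axis=1)[:, :num_neighbours]

        predictions[start:start + chunk_size] = y_train[nearest].mean(axis=1)

    return predictions
```

With the variables from the question, a call might look like `test_pred = knn_regress(train_data.to_numpy(), test_data.to_numpy(), 3)`, assuming `train_data` and `test_data` are DataFrames. Because the distances for a whole chunk are computed in one matrix product, trying several values of `num_neighbours` should become much cheaper, although the actual speed-up depends on the data and the machine.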
https://stackoverflow.com/questions/65557200/how-to-speed-up-knn-algorithm January 04, 2021 at 11:06AM