I found a simple linear regression dataset on Kaggle. Without feature scaling the model scores about 98% (measured with r2_score), but when feature scaling is applied to the training set the score drops to about 72%.
Can someone explain why this is happening? The code is below, followed by the training and test set graphs.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the Kaggle train/test splits and drop rows with missing values
train_dataset = pd.read_csv('train.csv')
test_dataset = pd.read_csv('test.csv')
train_dataset.dropna(inplace = True)

X_train = train_dataset.iloc[:, :-1].values
y_train = train_dataset.iloc[:, -1].values
X_test = test_dataset.iloc[:, :-1].values
y_test = test_dataset.iloc[:, -1].values

# Standardize the features: fit on the training set, apply the same transform to the test set
sc = StandardScaler()
X_train[:, 0:] = sc.fit_transform(X_train[:, 0:])
X_test[:, 0:] = sc.transform(X_test[:, 0:])

# Fit a simple linear regression and predict on the test set
regression = LinearRegression()
regression.fit(X_train, y_train)
y_pred = regression.predict(X_test)

# Training set plot
plt.figure(figsize=(32, 16))
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regression.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Experience')
plt.ylabel('Salary', rotation = 0)
plt.show()

# Test set plot
plt.figure(figsize=(32, 16))
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_test, regression.predict(X_test), color = 'blue')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Experience')
plt.ylabel('Salary', rotation = 0)
plt.show()

# R² score on the test set
r2_score(y_test, regression.predict(X_test))
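To make the problem easier to see, here is a small check that could be run right after the scaling step (just a diagnostic sketch, not part of the original script) to see what the feature arrays actually hold:

import numpy as np

# Inspect the dtype and the distinct values the arrays hold after scaling.
print("X_train dtype:", X_train.dtype)
print("distinct values in X_train:", np.unique(X_train))
print("distinct values in X_test:", np.unique(X_test))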
Edit: Let me rephrase the question: why did the distribution of my X_test change to just -1, 0, and 1 after applying feature scaling fitted on the training set?
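For comparison, here is a minimal sketch (using made-up numbers, not the Kaggle data) of one way standardized values can collapse to just -1, 0, and 1: if the feature array has an integer dtype, slice-assigning the scaler's float output back into it silently truncates the values. I am not certain this is what happens with my data, but the symptom looks similar:

import numpy as np
from sklearn.preprocessing import StandardScaler

# A single feature column with an integer dtype (hypothetical values).
X = np.array([[1], [3], [5], [7], [9]])
print(X.dtype)                      # an integer dtype, e.g. int64

sc = StandardScaler()
scaled = sc.fit_transform(X)        # floats centred on 0: [-1.414, -0.707, 0., 0.707, 1.414]
print(scaled.ravel())

# Slice-assigning the floats back into the integer array casts them to int,
# so every value ends up as -1, 0 or 1.
X[:, 0:] = scaled
print(X.ravel())                    # [-1  0  0  0  1]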
https://stackoverflow.com/questions/66873458/why-did-the-test-data-distribution-change-on-applying-feature-scaling-to-the-tra