I am working on a logistic regression model. I started out with two separate CSV files, one for training data and one for testing data. I created two separate data frames, one for each data set. I am able to fit and train the model just fine but am getting an error when I try to make predictions using the test data.
I am not sure if I am setting my y_train variable properly or if there is another issue going on. I get the following error messages when I run the prediction.
Here is the setup and code for the model:
#Setting x and y values
X_train = clean_df_train[['account_length','total_day_charge','total_eve_charge', 'total_night_charge', 'number_customer_service_calls']]
y_train = clean_df_train['churn']
X_test = clean_df_test[['account_length','total_day_charge','total_eve_charge', 'total_night_charge', 'number_customer_service_calls']]
y_test = clean_df_test['churn']

#Fitting / Training the Logistic Regression Model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2', random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False)
#Make Predictions with Logit Model
predictions = logreg.predict(X_test)

#Measure Performance of the model
from sklearn.metrics import classification_report
classification_report(y_test, predictions)
   1522     """
   1523
-> 1524     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1525
   1526     labels_given = True

E:\Users\davidwool\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
     79     if len(y_type) > 1:
     80         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 81                          "and {1} targets".format(type_true, type_pred))
     82
     83     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of continuous and binary targets
Here is the head of the data that I am working with. The churn column is completely blank as it is what I am trying to predict.
clean_df_test.head()

   account_length  total_day_charge  total_eve_charge  total_night_charge  number_customer_service_calls  churn
0              74             31.91             13.89                8.82                              0    NaN
1              57             30.06             16.58                9.61                              0    NaN
2             111             36.43             17.72                8.21                              1    NaN
3              77             42.81             17.48               12.38                              2    NaN
4              36             47.84             17.19                8.42                              2    NaN
Here are the dtypes as well.
clean_df_test.dtypes

account_length                     int64
total_day_charge                 float64
total_eve_charge                 float64
total_night_charge               float64
number_customer_service_calls      int64
churn                            float64
dtype: object
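A quick check can confirm that the test labels are entirely missing. This is a minimal sketch: `sample_test` is a hypothetical frame typed in by hand from the five rows of `head()` output, and it assumes the rest of `clean_df_test` looks the same. An all-NaN `churn` column stored as `float64` is likely what the error message is calling a "continuous" target, while the model's predictions are binary.

```python
import numpy as np
import pandas as pd

# Hypothetical frame rebuilt by hand from the head() output in the post;
# the real clean_df_test is assumed to have the same shape of data.
sample_test = pd.DataFrame({
    "account_length": [74, 57, 111, 77, 36],
    "total_day_charge": [31.91, 30.06, 36.43, 42.81, 47.84],
    "total_eve_charge": [13.89, 16.58, 17.72, 17.48, 17.19],
    "total_night_charge": [8.82, 9.61, 8.21, 12.38, 8.42],
    "number_customer_service_calls": [0, 0, 1, 2, 2],
    "churn": [np.nan] * 5,
})

# True when every label is missing, meaning there is nothing
# to compare the predictions against.
all_labels_missing = bool(sample_test["churn"].isna().all())
print(all_labels_missing)

# A column holding only NaN is stored as a float column.
print(sample_test["churn"].dtype)
```

Running the same `isna().all()` check on the real `clean_df_test['churn']` would show whether any true labels exist to score against.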
The main problem is that I am used to using sklearn's train_test_split() function on one dataset, whereas here I have two separate datasets, so I am not sure what to set y_test to.
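One common way to handle a test file with no labels, offered as a sketch rather than a definitive answer, is to hold out part of the labeled training data for evaluation and use the unlabeled file only for predictions. The column names below come from the post. The synthetic `labeled` frame, its made-up `churn` rule, and the `unlabeled` frame are assumptions standing in for `clean_df_train` and `clean_df_test`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

features = ["account_length", "total_day_charge", "total_eve_charge",
            "total_night_charge", "number_customer_service_calls"]

# Synthetic stand-in for clean_df_train (assumption: the real frame
# comes from the training CSV and has a populated churn column).
rng = np.random.default_rng(0)
n = 200
labeled = pd.DataFrame({
    "account_length": rng.integers(1, 200, n),
    "total_day_charge": rng.uniform(0, 60, n),
    "total_eve_charge": rng.uniform(0, 30, n),
    "total_night_charge": rng.uniform(0, 20, n),
    "number_customer_service_calls": rng.integers(0, 6, n),
})
# Made-up labeling rule, used only so the example has two classes.
labeled["churn"] = (labeled["number_customer_service_calls"] >= 3).astype(int)

# Hold out 25% of the labeled rows as a validation set with known labels.
X_train, X_val, y_train, y_val = train_test_split(
    labeled[features], labeled["churn"],
    test_size=0.25, random_state=0, stratify=labeled["churn"],
)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Score against the validation labels, which actually exist.
val_predictions = logreg.predict(X_val)
print(classification_report(y_val, val_predictions))

# Stand-in for clean_df_test[features]: no labels, so predict only.
unlabeled = labeled[features].head(5)
test_predictions = logreg.predict(unlabeled)
print(test_predictions)
```

With this layout, `classification_report` is never given the blank `churn` column; the predictions for the unlabeled file are the final output rather than something to score.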