Friday, March 26, 2021

How to make predictions on a logistic regression model with a separate df for train and test data

I am working on a logistic regression model. I started out with two separate CSV files, one for training data and one for testing data. I created two separate data frames, one for each data set. I am able to fit and train the model just fine but am getting an error when I try to make predictions using the test data.

I am not sure if I am setting my y_train variable properly or if there is another issue going on. I get the following error messages when I run the prediction.

Here is the setup and code for the model:

#Setting x and y values
X_train = clean_df_train[['account_length', 'total_day_charge', 'total_eve_charge',
                          'total_night_charge', 'number_customer_service_calls']]
y_train = clean_df_train['churn']

X_test = clean_df_test[['account_length', 'total_day_charge', 'total_eve_charge',
                        'total_night_charge', 'number_customer_service_calls']]
y_test = clean_df_test['churn']
#Fitting / Training the Logistic Regression Model
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
logreg.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, max_iter=100, multi_class='warn',
                   n_jobs=None, penalty='l2', random_state=None, solver='warn',
                   tol=0.0001, verbose=0, warm_start=False)
#Make Predictions with Logit Model
predictions = logreg.predict(X_test)

#Measure Performance of the model
from sklearn.metrics import classification_report

classification_report(y_test, predictions)
   1522     """
   1523
-> 1524     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1525
   1526     labels_given = True

E:\Users\davidwool\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
     79     if len(y_type) > 1:
     80         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 81                          "and {1} targets".format(type_true, type_pred))
     82
     83     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of continuous and binary targets

Here is the head of the data that I am working with. The churn column is completely blank as it is what I am trying to predict.

clean_df_test.head()

   account_length  total_day_charge  total_eve_charge  total_night_charge  number_customer_service_calls  churn
0              74             31.91             13.89                8.82                              0    NaN
1              57             30.06             16.58                9.61                              0    NaN
2             111             36.43             17.72                8.21                              1    NaN
3              77             42.81             17.48               12.38                              2    NaN
4              36             47.84             17.19                8.42                              2    NaN

Here are the dtypes as well.

clean_df_test.dtypes

account_length                     int64
total_day_charge                 float64
total_eve_charge                 float64
total_night_charge               float64
number_customer_service_calls      int64
churn                            float64
dtype: object
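
The head and dtypes above suggest that every churn value in the test frame is missing, which would explain why the column is stored as float64 rather than as a 0/1 label. A quick check along these lines should confirm that. This is only a sketch: the small frame below is made up to mirror the first rows shown above and stands in for the real clean_df_test.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for clean_df_test (assumption: mirrors the head shown above).
clean_df_test = pd.DataFrame({
    "account_length": [74, 57, 111],
    "total_day_charge": [31.91, 30.06, 36.43],
    "churn": [np.nan, np.nan, np.nan],
})

# True when every label in the test frame is missing.
all_missing = bool(clean_df_test["churn"].isna().all())

# A column that holds only NaN is stored as a float column by pandas.
churn_dtype = str(clean_df_test["churn"].dtype)

print(all_missing, churn_dtype)
```

If the check comes back true on the real data, there are no true labels in the test file to compare predictions against.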

The main problem is that I am used to calling sklearn's train_test_split() function on a single dataset, whereas here I have two separate datasets, so I am not sure what to set y_test to.
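
One possible way to handle this, assuming the test file really has no labels, is to hold out part of the labeled training data for scoring and to use the test frame only for generating predictions. The sketch below is not taken from the original code: the two frames are synthetic stand-ins, only two of the five feature columns are used to keep it short, and the names X_fit, X_val, y_fit and y_val are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = ["account_length", "number_customer_service_calls"]

# Synthetic stand-in for clean_df_train (assumption: labeled 0/1 churn column).
train = pd.DataFrame({
    "account_length": rng.integers(1, 200, 200),
    "number_customer_service_calls": rng.integers(0, 8, 200),
})
train["churn"] = (train["number_customer_service_calls"] > 3).astype(int)

# Synthetic stand-in for clean_df_test (assumption: churn is entirely NaN).
test = pd.DataFrame({
    "account_length": rng.integers(1, 200, 20),
    "number_customer_service_calls": rng.integers(0, 8, 20),
})
test["churn"] = np.nan

# Hold out a labeled validation split from the training data for scoring.
X_fit, X_val, y_fit, y_val = train_test_split(
    train[features], train["churn"],
    test_size=0.25, random_state=0, stratify=train["churn"],
)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_fit, y_fit)

# Metrics are computed only where true labels exist.
report = classification_report(y_val, logreg.predict(X_val))
print(report)

# The unlabeled test frame only receives predictions; there is no y_test to score.
test["churn"] = logreg.predict(test[features])
```

With this layout, classification_report never sees the NaN column, so the mixed-target error should not come up. The trade-off is that the reported metrics describe the held-out slice of the training data, not the separate test file.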

https://stackoverflow.com/questions/66805004/how-to-make-predictions-on-a-logistic-regression-model-with-a-separate-df-for-tr March 26, 2021 at 01:50AM
