I have a training set and a test set for machine learning, but the training set contains too many rows and the test set too few. I calculated that I need to move 245 rows from the training set to the test set to get a better split. How can I do this? The training set has 5116 rows in total.
First I randomized the rows of the training set using this
train_df = train_df.sample(n = len(train_df)).reset_index(drop=True)
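For reference, a minimal sketch of the same shuffle on a small stand-in frame. `sample(frac=1)` is equivalent to `n=len(train_df)`; the `random_state` seed and the toy data are assumptions added here so the order is repeatable.

```python
import pandas as pd

# Hypothetical stand-in for the real training set (10 rows instead of 5116).
train_df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# frac=1 samples every row without replacement, i.e. a full shuffle.
# random_state is an assumed seed, used only so the order is repeatable.
shuffled_df = train_df.sample(frac=1, random_state=0).reset_index(drop=True)

print(len(shuffled_df))                 # same number of rows as before
print(sorted(shuffled_df["feature"]))   # same values, only the order changed
```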
And then I wanted to grab the last 245 rows and move them to test_df
I found these two solutions here
Pandas dataframe - move rows from one dataframe to another
and
Pandas move rows from 1 DF to another DF
However, they select the rows based on a condition, which I don't have. I'd like to do it the way you would in plain Python with slicing, if that's possible.
Maybe like this (the last 245 of the 5116 rows, all columns):
transferdata_df = train_df.iloc[-245:]
Then append that to the test set, assigning the result back since it returns a new DataFrame:
test_df = pd.concat([test_df, transferdata_df], ignore_index=True)
I'm not sure if this is the correct way or not.
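A minimal sketch of the whole move, assuming both frames have the same columns. The helper name `move_last_rows` and the toy data are hypothetical. It uses `pd.concat` because `DataFrame.append` returns a new frame rather than changing `test_df` in place, and was removed in pandas 2.0. The moved rows also have to be dropped from the training set, otherwise they end up in both.

```python
import pandas as pd

def move_last_rows(train_df, test_df, n):
    """Return (new_train, new_test) with the last n rows of train_df moved to test_df."""
    if not 0 < n <= len(train_df):
        raise ValueError("n must be between 1 and len(train_df)")
    moved = train_df.iloc[-n:]                              # last n rows, all columns
    new_train = train_df.iloc[:-n].reset_index(drop=True)   # everything except those rows
    new_test = pd.concat([test_df, moved], ignore_index=True)
    return new_train, new_test

# Toy data standing in for the real frames (hypothetical values).
train_df = pd.DataFrame({"x": range(10), "y": range(10, 20)})
test_df = pd.DataFrame({"x": [100, 101], "y": [200, 201]})

train_df, test_df = move_last_rows(train_df, test_df, 3)
print(len(train_df), len(test_df))  # 7 5
```

With the real data the call would be `move_last_rows(train_df, test_df, 245)`, which leaves 4871 rows in the training set.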
https://stackoverflow.com/questions/65433114/pandas-dataframe-move-n-number-of-rows-from-one-dataframe-to-another December 24, 2020 at 10:00AM