Friday, March 12, 2021

How to select all the rows in a pandas data set with NaN and then convert them into a new column?

I have a data set of book titles that previous students read over the past year and rated personally. The person who recorded the data last year did it their own way. All I want to change is to add a Date column holding the date each book was read, but the NaN values are getting in the way.

                Tittle   Page Author  Rating
0       Monday, Mar. 1    NaN    NaN     NaN
1             Tittle 1    5.0    JHK    1.50
2             Tittle 2   13.0    ABB    0.03
3             Tittle 3  100.0    ACC    3.50
4             Tittle 4    9.0     NN    5.40
5      Tuesday, Jan. 2    NaN    NaN     NaN
6             Tittle 5    6.0    BBB    6.50
7             Tittle 7   14.0    CCC   10.00
8             Tittle 8   10.0    CNN    2.50
9    Wednesday, Dec. 3    NaN    NaN     NaN
10           Tittle 10    5.0    CBS    1.00
11            Title 20    5.0    ABC    1.00
12            Title 21   25.0    JJJ    3.50
13            Title 22    1.0    NNN    7.50
14    Thursday, Mar. 4    NaN    NaN     NaN
15            Title 25  100.0    VVV    9.00
16            Title 30    6.0   YYYY    9.00
17            Title 35    2.0    QQQ    9.00

I have tried using dropna(), but it just drops every row that contains a NaN, so the dates end up deleted.

import pandas as pd

dfs = pd.read_csv('Book2.csv')
df = dfs.dropna()  # drops every row that contains at least one NaN
display(df)

        Tittle   Page Author  Rating
1     Tittle 1    5.0    JHK    1.50
2     Tittle 2   13.0    ABB    0.03
4     Tittle 4    9.0    tvN    5.40
6     Tittle 5    6.0    BBB    6.50
7     Tittle 7   14.0    CCC   10.00
8     Tittle 8   10.0    CNN    2.50
10   Tittle 10    5.0    CBS    1.00
11    Title 20    5.0    ABC    1.00
12    Title 21   25.0    JJJ    3.50
13    Title 22    1.0    NNN    7.50
15    Title 25  100.0    VVV    9.00
16    Title 30    6.0   YYYY    9.00
17    Title 35    2.0    QQQ    9.00

I also tried using pd.isna() to build a new DataFrame, but it is not turning out how I want it to look. It just shows what is True or False. Otherwise, it shows only the dates and all the rest of the data is gone.

import pandas as pd

dfs = pd.read_csv('Book2.csv')
df = dfs[dfs.isnull().any(axis=1)]  # keeps only the rows that contain a NaN
display(df)

               Tittle  Page Author  Rating
0      Monday, Mar. 1   NaN    NaN     NaN
5     Tuesday, Jan. 2   NaN    NaN     NaN
9   Wednesday, Dec. 3   NaN    NaN     NaN
14   Thursday, Mar. 4   NaN    NaN     NaN

In the end I just edited it manually in Excel to make it look the way I want. I want it to look like this:

       Tittle  Page Author  Rating               Date
0    Tittle 1     5    JHK    1.50     Monday, Mar. 1
1    Tittle 2    13    ABB    0.03     Monday, Mar. 1
2    Tittle 3   100    ACC    4.50     Monday, Mar. 1
3    Tittle 4     9    tvN    5.40     Monday, Mar. 1
4    Tittle 5     6    BBB    6.50    Tuesday, Jan. 2
5    Tittle 7    14    CCC   10.00    Tuesday, Jan. 2
6    Tittle 8    10    CNN    2.50    Tuesday, Jan. 2
7   Tittle 10     5    CBS    1.00  Wednesday, Dec. 3
8    Title 20     5    ABC    1.00  Wednesday, Dec. 3
9    Title 21    25    JJJ    3.50  Wednesday, Dec. 3
10   Title 22     1    NNN    7.50  Wednesday, Dec. 3
11   Title 25   100    VVV    9.00   Thursday, Mar. 4
12   Title 30     6   YYYY    9.00   Thursday, Mar. 4
13   Title 35     2    QQQ    9.00   Thursday, Mar. 4
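One way to get from the original layout to this one might be to flag the date rows, copy their text into a new column, and fill it downward. The following is only a sketch: it assumes a date row is any row where Page, Author and Rating are all NaN, and it uses a small inline stand-in (csv_text, made up from a few of the rows above) instead of the real Book2.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for Book2.csv, built from a few of the rows shown above.
csv_text = """Tittle,Page,Author,Rating
"Monday, Mar. 1",,,
Tittle 1,5,JHK,1.50
Tittle 2,13,ABB,0.03
"Tuesday, Jan. 2",,,
Tittle 5,6,BBB,6.50
"""
dfs = pd.read_csv(io.StringIO(csv_text))

# Assumption: a date row is one where every column except Tittle is NaN.
is_date_row = dfs[["Page", "Author", "Rating"]].isna().all(axis=1)

# Keep the Tittle text only on date rows (NaN elsewhere),
# then forward-fill so each book row inherits the date above it.
dfs["Date"] = dfs["Tittle"].where(is_date_row).ffill()

# Drop the date rows themselves, restore integer page counts, renumber the index.
result = (
    dfs[~is_date_row]
    .astype({"Page": int})
    .reset_index(drop=True)
)
print(result)
```

With the real file, the inline text would be replaced by pd.read_csv('Book2.csv'). If some book rows are also missing Page, Author and Rating, the mask above would mistake them for dates, so the rule for spotting a date row may need to be stricter.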

That way I could sort or group the data by date later on if I want to see whether there is a trend in the students' reading habits. For now, though, I just want to add a Date column and fill it with the dates that are already in the data set, if possible. I have to comb through a total of 181 pages of this, and I am hoping pandas can cut down the hours I would otherwise spend editing it manually in Excel with copy and paste.
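For the later grouping step, something like the following might work once a Date column exists. It is a rough sketch on a hypothetical four-row frame taken from the desired output; the summary column names (books, pages, mean_rating) are made up for illustration.

```python
import pandas as pd

# Hypothetical frame that already has the Date column, using rows from the desired output.
df = pd.DataFrame({
    "Tittle": ["Tittle 1", "Tittle 2", "Tittle 5", "Tittle 7"],
    "Page": [5, 13, 6, 14],
    "Rating": [1.50, 0.03, 6.50, 10.00],
    "Date": ["Monday, Mar. 1", "Monday, Mar. 1",
             "Tuesday, Jan. 2", "Tuesday, Jan. 2"],
})

# sort=False keeps the dates in the order they first appear
# instead of sorting the date strings alphabetically.
summary = df.groupby("Date", sort=False).agg(
    books=("Tittle", "count"),       # number of books read that day
    pages=("Page", "sum"),           # total pages read that day
    mean_rating=("Rating", "mean"),  # average rating that day
)
print(summary)
```

The dates here are still plain strings with no year, so sorting them chronologically would need an extra parsing step that depends on information not shown in the data.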

If you have any other recommendations on how to wrangle this data set efficiently, so that the dates are not mixed into the title column and the NaN values are taken care of, it would be greatly appreciated.

https://stackoverflow.com/questions/66607141/how-to-select-all-the-rows-in-pandas-data-set-with-nan-and-then-convert-it-into March 13, 2021 at 04:59AM
