The CSV file looks like this:
Actual data format
id,account,company,total_emp,country,state
1,12345,ABC,100,US,CA
2,32345,CCD,inc,100,US,USA,NA
3,42345,ABC,LLC,100,US,WA
4,52345,DDM,100,CA,US,OR
5,62345,TSL,100,US,UT
....
Easy to view
id, account, company, total_emp, country, state
1, 12345, ABC, 100, US, CA
2, 32345, CCD,inc 100, US,USA,
3, 42345, ABC,LLC 100, US, WA
4, 52345, DDM, 100, CA, US, OR
5, 62345, TSL, 100, US, UT
....
Things to note:
- The values in the company column are not quoted.
- There are other columns after 'company'.
- There are 2 columns with a comma issue.
Here, for ids 2 and 3, we can see that the values are not quoted but contain multiple commas. I am getting the following error: ValueError: 5 columns passed, passed data had 6 columns.
What did I do?
- unique issue: I tried to offset the values using readlines(), but since there are columns after the company column, it did not work.
Could someone please help me solve this problem?
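One possible line-by-line repair is sketched below. It is not a confirmed fix and relies on assumptions not stated in the question: id and account are always the first two fields, state is always the last field and never contains a comma, total_emp is always purely numeric, and no comma-separated piece of a company name after the first is purely numeric. The function names repair_row and repair_csv are hypothetical. The idea is to locate total_emp as the first all-digit field after the company starts, join everything before it into company and everything after it (except the last field) into country, then write the rows back out with proper quoting.

```python
import csv
import io


def repair_row(line):
    """Split a raw line and re-join the pieces of company and country.

    Assumptions (not from the question): the first two fields are id and
    account, the last field is state, and total_emp is the first all-digit
    field found after at least one company piece.
    """
    parts = [p.strip() for p in line.split(",")]
    head, middle, state = parts[:2], parts[2:-1], parts[-1]
    # Start at index 1 so the company always keeps at least one piece.
    emp_idx = next(i for i in range(1, len(middle)) if middle[i].isdigit())
    company = ",".join(middle[:emp_idx])
    country = ",".join(middle[emp_idx + 1:])
    return head + [company, middle[emp_idx], country, state]


def repair_csv(raw_text):
    """Return the CSV text with comma-containing fields properly quoted."""
    lines = [ln for ln in raw_text.splitlines() if ln.strip()]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(lines[0].split(","))  # header is assumed to be clean
    for ln in lines[1:]:
        writer.writerow(repair_row(ln))  # csv.writer quotes fields with commas
    return out.getvalue()


sample = """id,account,company,total_emp,country,state
1,12345,ABC,100,US,CA
2,32345,CCD,inc,100,US,USA,NA
3,42345,ABC,LLC,100,US,WA
4,52345,DDM,100,CA,US,OR
5,62345,TSL,100,US,UT
"""

fixed = repair_csv(sample)
print(fixed)
```

Because every repaired row has exactly six fields and the ambiguous ones are quoted, the resulting text can be passed to pandas.read_csv (for example via io.StringIO(fixed)) without further special handling. If any of the assumptions above does not hold for the real file, the row-splitting rule would need to change.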
https://stackoverflow.com/questions/66575061/how-to-fix-extra-comma-column-offset-csv-using-pandas March 11, 2021 at 09:01AM