Wednesday, March 10, 2021

How to fix extra comma column offset CSV using Pandas?

The CSV file looks like this:

Actual data format

    id,account,company,total_emp,country,state
    1,12345,ABC,100,US,CA
    2,32345,CCD,inc,100,US,USA,NA
    3,42345,ABC,LLC,100,US,WA
    4,52345,DDM,100,CA,US,OR
    5,62345,TSL,100,US,UT
    ....

Easy to view

    id, account, company, total_emp, country, state
    1,  12345,   ABC,     100,       US,      CA
    2,  32345,   CCD,inc  100,       US,USA,
    3,  42345,   ABC,LLC  100,       US,      WA
    4,  52345,   DDM,     100,       CA, US,  OR
    5,  62345,   TSL,     100,       US,      UT
    ....

Things to note:

  • The values in the company column are not quoted.
  • There are other columns after 'company'.
  • Two columns have the comma issue (company and country in the sample above).

In the rows with id 2 and 3, the company values are not quoted, yet they contain commas. As a result, I am getting the following error: ValueError: 5 columns passed, passed data had 6 columns.
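To see which rows are over-split before attempting a fix, one option is to count fields per line with the standard csv module and compare against the header width. The sketch below is only an illustration: it uses the sample rows from the question, and the names RAW and find_bad_rows are hypothetical.

```python
import csv
import io

# Sample rows copied from the question; the header defines the expected width.
RAW = """id,account,company,total_emp,country,state
1,12345,ABC,100,US,CA
2,32345,CCD,inc,100,US,USA,NA
3,42345,ABC,LLC,100,US,WA
4,52345,DDM,100,CA,US,OR
5,62345,TSL,100,US,UT
"""


def find_bad_rows(text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    expected = len(next(reader))  # the header row has 6 fields
    bad = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != expected:
            bad.append((line_no, len(row)))
    return bad


bad_rows = find_bad_rows(RAW)
print(bad_rows)  # [(3, 8), (4, 7), (5, 7)]
```

On this sample the rows with id 2, 3, and 4 are flagged: id 2 has extra commas in both company and country, id 3 only in company, and id 4 only in country.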

What did I do?

  • Unique issue: I tried to offset the values using readlines(), but because there are more columns after the company column, it did not work.
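Because columns follow company, a fixed offset cannot work, but the rows can be re-anchored on a field whose shape is known. The following is a minimal sketch under explicit assumptions, not a general solution: it assumes total_emp is always purely numeric, that no fragment of a company name is purely numeric, and that state is always the last field. The names COLUMNS, repair_fields, and repair_csv are hypothetical.

```python
import csv
import io

COLUMNS = ["id", "account", "company", "total_emp", "country", "state"]

# Sample rows copied from the question.
RAW = """id,account,company,total_emp,country,state
1,12345,ABC,100,US,CA
2,32345,CCD,inc,100,US,USA,NA
3,42345,ABC,LLC,100,US,WA
4,52345,DDM,100,CA,US,OR
5,62345,TSL,100,US,UT
"""


def repair_fields(fields):
    """Collapse an over-split row back into six columns.

    Assumption: total_emp is the first all-digit field after the company
    name, and state is the last field. Everything between account and
    total_emp is company; everything between total_emp and state is country.
    """
    n = len(fields)
    if n < len(COLUMNS):
        raise ValueError(f"too few fields: {fields!r}")
    # company needs at least index 2, country and state need the last two
    # slots, so total_emp can only sit at indexes 3 .. n-3.
    pivot = next(
        (i for i in range(3, n - 2) if fields[i].strip().isdigit()), None
    )
    if pivot is None:
        raise ValueError(f"no numeric total_emp found: {fields!r}")
    company = ",".join(fields[2:pivot])
    country = ",".join(fields[pivot + 1 : n - 1])
    return [fields[0], fields[1], company, fields[pivot], country, fields[-1]]


def repair_csv(text):
    """Split each data line on commas, then repair it to six columns."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [repair_fields(row) for row in reader if row]


rows = repair_csv(RAW)
for row in rows:
    print(row)
```

The repaired rows all have six fields, so they can then be handed to pandas, for example with pd.DataFrame(rows, columns=COLUMNS). If a company name can contain a purely numeric fragment, the numeric-pivot assumption breaks and a different anchor would be needed.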

Could someone please help me solve this problem?

https://stackoverflow.com/questions/66575061/how-to-fix-extra-comma-column-offset-csv-using-pandas March 11, 2021 at 09:01AM
