Wednesday, March 24, 2021

How to convert column values to str when reading a multi-sheet xlsx using pd.read_excel?

I have a multi-sheet xlsx file. I want to process selected sheets from it and finally save them as CSV.

This is a snapshot of a few rows from one sheet:

[image: screenshot of a few rows of the sheet]

I use this code to load all pages and process each one-by-one:

import pandas as pd

def load_raw_excel_file(file_full_name):
    # sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
    df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0)
    sheets_name = list(df.keys())
    return df, sheets_name
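For context, the last step described above (saving selected sheets as CSV) could look roughly like the sketch below. It is only an illustration under assumptions: the helper name `save_selected_sheets`, its parameters, and the one-file-per-sheet naming are hypothetical and not part of the original code.

```python
from pathlib import Path

import pandas as pd


def save_selected_sheets(sheets, selected, out_dir):
    """Write each selected sheet (a DataFrame) to <out_dir>/<sheet name>.csv.

    `sheets` is a dict of {sheet name: DataFrame}, e.g. the result of
    pd.read_excel(..., sheet_name=None). Hypothetical helper for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in selected:
        if name not in sheets:
            # Skip names that are not present in the workbook.
            continue
        target = out_dir / f"{name}.csv"
        # index=False keeps the DataFrame's row index out of the CSV.
        sheets[name].to_csv(target, index=False)
        written.append(target)
    return written
```

With the loader above, this would be called as `save_selected_sheets(dfs, ["myselectedsheetname"], "csv_out")`, where `"csv_out"` is a placeholder directory name.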

The output of the code (from the same page) looks like this:

dfs, shs = load_raw_excel_file("myexelfile.xlsx")
dfs['myselectedsheetname']

[image: screenshot of the resulting DataFrame]

As you can see, some values in the Contract column have been converted to dates, but I want the values left exactly as they appear in the sheet. I've tried using converters and dtype in pd.read_excel, but neither worked:

df = pd.read_excel(file_full_name, sheet_name=None, engine="openpyxl", header=0, dtype=str)  

or

df = pd.read_excel("myexelfile.xlsx", sheet_name='selectedsheetname', header=0, converters={'Contract':str})  
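One way to narrow this down is to check how the workbook itself stores those cells, independently of pandas. The sketch below uses openpyxl directly; the helper name `describe_column_cells` and its parameters are hypothetical, and it assumes the header sits in the first row. If `is_date` comes back `True` for the affected rows, the values are probably already stored as dates in the file, in which case `dtype` or `converters` would presumably only turn an already-parsed datetime into a string.

```python
from openpyxl import load_workbook


def describe_column_cells(path, sheet_name, header, max_rows=20):
    """Return (value, data_type, number_format, is_date) for cells under `header`.

    Hypothetical diagnostic helper; assumes the header is in row 1.
    """
    ws = load_workbook(path)[sheet_name]
    # Locate the column whose first-row cell matches the header text.
    first_row = next(ws.iter_rows(min_row=1, max_row=1))
    col_idx = next((c.column for c in first_row if c.value == header), None)
    if col_idx is None:
        raise KeyError(f"no column named {header!r} in sheet {sheet_name!r}")
    # Inspect at most `max_rows` data rows below the header.
    last_row = min(ws.max_row, max_rows + 1)
    rows = ws.iter_rows(min_row=2, max_row=last_row, min_col=col_idx, max_col=col_idx)
    return [
        (cell.value, cell.data_type, cell.number_format, cell.is_date)
        for (cell,) in rows
    ]
```

For the file in question this would be called as `describe_column_cells("myexelfile.xlsx", "myselectedsheetname", "Contract")`.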

Any ideas?

https://stackoverflow.com/questions/66791588/how-to-convert-column-values-to-str-when-reading-multi-sheet-xlsx-using-pd-read March 25, 2021 at 09:00AM
