Wednesday, December 23, 2020

Failed to download full rows using Pandas read_excel() for xlsx file

The file should contain thousands of rows, but the code below returns only the first few rows in the DataFrame.

File: https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx

Failed example

import pandas as pd

url = 'https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx'
df = pd.read_excel(url, engine='openpyxl', header=2, usecols='A:D', verbose=True)
print(df.shape)

# output - only 5 rows
Reading sheet 0
(5, 4)
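One possible explanation, not confirmed in the question, is that the worksheet XML inside the .xlsx declares a too-small used range in its `<dimension ref="...">` element, and a reader that trusts that metadata stops early. The minimal stdlib sketch below compares the declared last row with the last row actually present. The function name `declared_vs_actual_rows` is hypothetical, and it assumes the first sheet is stored at `xl/worksheets/sheet1.xml`, which is common but not guaranteed.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet XML parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def declared_vs_actual_rows(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return (declared_ref, declared_last_row, actual_last_row) for one sheet.

    Assumption: sheet_part is where the sheet lives; the workbook's
    relationship files may point to a different part name.
    """
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet_part))

    # The <dimension> element is the sheet's self-reported used range, e.g. "A1:D5".
    dim = root.find("m:dimension", NS)
    declared_ref = dim.get("ref") if dim is not None else None
    declared_last_row = None
    if declared_ref:
        # Take the row number from the bottom-right cell of the range.
        digits = re.findall(r"\d+", declared_ref.split(":")[-1])
        declared_last_row = int(digits[-1]) if digits else None

    # Highest row actually stored; the "r" attribute is optional, so fall
    # back to the element's 1-based position when it is missing.
    last_row = 0
    for i, row in enumerate(root.iterfind("m:sheetData/m:row", NS), start=1):
        last_row = max(last_row, int(row.get("r", i)))
    return declared_ref, declared_last_row, last_row
```

The function only reads the file. If the declared last row is far below the actual last row for the downloaded HKEX file, that would support this explanation.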

Working example

Same file. I downloaded it first, opened it in Excel, modified some text and saved it (without changing the format, so it stayed .xlsx), and then used read_excel() to open the local file.

import os

import pandas as pd
import wget

url = 'https://www.hkex.com.hk/eng/services/trading/securities/securitieslists/ListOfSecurities.xlsx'
path = os.path.join(os.path.dirname(__file__), 'download')
# wget.download only saves into `out` as a directory if that directory already exists
os.makedirs(path, exist_ok=True)
wget.download(url, out=path)
file = os.path.join(path, 'ListOfSecurities.xlsx')

# (manual step) open the file in Excel, edit a cell, and save it as .xlsx

df = pd.read_excel(file, engine='openpyxl', header=2, usecols='A:D', verbose=True)
print(df.shape)

# output
Reading sheet 0
(17490, 4)
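Re-saving in Excel may have worked because Excel rewrote the sheet metadata, including the `<dimension>` range; that is an assumption, not something the question verifies. The minimal stdlib sketch below automates that step under the same assumption: it copies the workbook and rewrites one sheet's `<dimension ref>` to the range actually covered by cells. The names `fix_dimension`, `col_to_num`, and `num_to_col` are hypothetical, and the default `xl/worksheets/sheet1.xml` part name is an assumption.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
CELL_REF = re.compile(r"([A-Z]+)(\d+)")


def col_to_num(letters):
    """Convert column letters to a 1-based index: "A" -> 1, "AA" -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column index back to letters: 27 -> "AA"."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def fix_dimension(src, dst, sheet_part="xl/worksheets/sheet1.xml"):
    """Copy workbook src to dst, resetting <dimension ref> to the used range.

    Assumption: sheet_part is the sheet to patch and its cells carry "r"
    attributes such as "D17490"; cells without one are ignored.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == sheet_part:
                root = ET.fromstring(data)
                max_row = max_col = 0
                for cell in root.iterfind("m:sheetData/m:row/m:c", NS):
                    m = CELL_REF.fullmatch(cell.get("r", ""))
                    if m:
                        max_col = max(max_col, col_to_num(m.group(1)))
                        max_row = max(max_row, int(m.group(2)))
                if max_row:
                    new_ref = f"A1:{num_to_col(max_col)}{max_row}".encode()
                    # Edit the raw bytes so the rest of the XML (namespaces,
                    # prefixes, formatting) is copied through unchanged.
                    data = re.sub(
                        rb'(<(?:\w+:)?dimension\s+ref=")[^"]*(")',
                        lambda mo: mo.group(1) + new_ref + mo.group(2),
                        data,
                        count=1,
                    )
            # Reuse the original entry metadata (name, compression type).
            zout.writestr(item, data)
    return dst
```

The patched copy can then be passed to the same `pd.read_excel(..., engine='openpyxl', header=2, usecols='A:D')` call used above, leaving the original download untouched. When working with openpyxl directly, its read-only worksheets also provide `reset_dimensions()`, which the openpyxl documentation suggests for files whose stored dimensions are wrong.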
https://stackoverflow.com/questions/65432992/failed-to-download-full-rows-using-pandas-read-excel-for-xlsx-file December 24, 2020 at 09:40AM
