Friday, March 5, 2021

Read only specific timestamp (multiple rows) from Parquet file in Python Pandas?

I have a Pandas dataframe that looks similar to this:

datetime                 data1 data2
2021-01-23 00:00:31.140     a1    a2
2021-01-23 00:00:31.140     b1    b2
2021-01-23 00:00:31.140     c1    c2
2021-01-23 00:01:29.021     d1    d2
2021-01-23 00:02:10.540     e1    e2
2021-01-23 00:02:10.540     f1    f2
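For experimenting, the sample above can be rebuilt as a DataFrame. This is a minimal sketch that assumes `datetime` is a regular `datetime64` column and that `data1`/`data2` hold strings; in the real data `datetime` may be the index instead.

```python
import pandas as pd

# Sample rows copied from the table above; several rows share one timestamp
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2021-01-23 00:00:31.140",
                "2021-01-23 00:00:31.140",
                "2021-01-23 00:00:31.140",
                "2021-01-23 00:01:29.021",
                "2021-01-23 00:02:10.540",
                "2021-01-23 00:02:10.540",
            ]
        ),
        "data1": ["a1", "b1", "c1", "d1", "e1", "f1"],
        "data2": ["a2", "b2", "c2", "d2", "e2", "f2"],
    }
)

# Number of rows per unique timestamp
print(df.groupby("datetime").size())
```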

The real dataframe is very large and for each unique timestamp, there are a few thousand rows.

I want to save this dataframe to a Parquet file so that I can quickly read all the rows that have a specific datetime index, without loading the whole file or looping through it. How do I save it correctly in Python and how do I quickly read only the rows for one specific datetime?

After reading, I would like to have a new dataframe that contains all the rows for that specific datetime. For example, I want to read only the rows for datetime "2021-01-23 00:00:31.140" from the Parquet file and receive this dataframe:

datetime                 data1 data2
2021-01-23 00:00:31.140     a1    a2
2021-01-23 00:00:31.140     b1    b2
2021-01-23 00:00:31.140     c1    c2

I appreciate any help, thank you very much in advance!

https://stackoverflow.com/questions/66502174/read-only-specific-timestamp-multiple-rows-from-parquet-file-in-python-pandas March 06, 2021 at 11:47AM
