I have the following code, which I use to loop through the row groups in a Parquet metadata file to find the maximum values of columns i, j and k across the whole file. As far as I know, I have to find the maximum value in each row group. I'd like to know whether there is a better way to write this code in fewer lines, without it being nested five levels deep, so that it is more readable.
I tried to use a dictionary-and-lambda combination as a switch statement in place of some of the if statements, to eliminate at least two levels of nesting, but I couldn't figure out how to do the greater-than comparison without nesting further.
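One way to get the "switch" behaviour without lambdas is a dictionary keyed by column name, with the built-in `max()` doing the greater-than comparison, so neither adds a level of nesting. The sketch below runs on a hypothetical list of `(column name, row-group max)` pairs (`observations` is made up for illustration) rather than on real Parquet metadata, and it assumes numeric statistics so that `float("-inf")` is a safe starting value.

```python
# Hypothetical (column name, row-group max) pairs standing in for the
# path_in_schema / statistics.max values read from the metadata.
observations = [("i", 4), ("x", 99), ("j", -2), ("i", 11), ("k", 0), ("j", -7)]

# One dictionary entry per tracked column replaces max_i, max_j and max_k.
# Starting at -inf (assumes numeric values) means any real value is larger.
maxima = {"i": float("-inf"), "j": float("-inf"), "k": float("-inf")}

for name, value in observations:
    if name in maxima:  # the dict lookup replaces the three if branches
        # max() performs the greater-than check, so no further if is needed.
        maxima[name] = max(maxima[name], value)

print(maxima)  # {'i': 11, 'j': -2, 'k': 0}
```

Starting from `float("-inf")` rather than `0` also means an all-negative column such as `j` still reports its true maximum; with `0` as the starting value, as in the original code, `max_j` would stay at `0`.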
*Update: It's been pointed out that code style is subjective, which is a fair point. The main things I was looking for were a) how to write it with fewer levels of nesting and b) how to write it in fewer lines. I was hoping those two things would make the code clearer. I initially thought a switch statement would make the code clearer, but that may not be the case.
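For the nesting question specifically, one common approach (not the only one) is to merge the two `for` loops into one with `itertools.product` and to replace nested `if` blocks with a guard clause that `continue`s early. In this sketch, `grid` is a hypothetical table where `grid[grp][col]` stands in for the max statistic of column `col` in row group `grp`; it is not how pyarrow exposes the metadata.

```python
from itertools import product

# Hypothetical stand-in for the per-row-group max statistics:
# grid[grp][col] is the max of column names[col] in row group grp.
grid = [
    [3, 10, 7],
    [5, 2, 9],
]
names = ["i", "j", "k"]

maxima = {}
# product() yields every (grp, col) pair, so one loop replaces two nested ones.
for grp, col in product(range(len(grid)), range(len(names))):
    name = names[col]
    value = grid[grp][col]
    # Guard clause: skip early instead of nesting another if level.
    if name in maxima and maxima[name] >= value:
        continue
    maxima[name] = value

print(maxima)  # {'i': 5, 'j': 10, 'k': 9}
```

The body of the loop now sits only two levels deep (the `for` and one `if`), regardless of how many columns are tracked.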
    import pyarrow.parquet as pq

    def main():
        metafile = r'D:\my_parquet_meta_file.metadata'
        meta = pq.read_metadata(metafile)
        max_i = 0
        max_j = 0
        max_k = 0
        for grp in range(0, meta.num_row_groups):
            for col in range(0, meta.num_columns):
                # locate columns i,j,k
                if meta.row_group(grp).column(col).path_in_schema in ['i', 'j', 'k']:
                    if meta.row_group(grp).column(col).path_in_schema == 'i':
                        if meta.row_group(grp).column(col).statistics.max > max_i:
                            max_i = meta.row_group(grp).column(col).statistics.max
                    if meta.row_group(grp).column(col).path_in_schema == 'j':
                        if meta.row_group(grp).column(col).statistics.max > max_j:
                            max_j = meta.row_group(grp).column(col).statistics.max
                    if meta.row_group(grp).column(col).path_in_schema == 'k':
                        if meta.row_group(grp).column(col).statistics.max > max_k:
                            max_k = meta.row_group(grp).column(col).statistics.max
        print('max i: ' + str(max_i), 'max j: ' + str(max_j), 'max k: ' + str(max_k))

    if __name__ == '__main__':
        main()

All suggestions would be greatly appreciated. I'm sure there's a better way to do this.
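Putting those ideas together against the pyarrow metadata objects, a possible rewrite is sketched below. It is not a tested drop-in replacement: `column_maxima` is a hypothetical helper name, and it assumes `meta` behaves like the `pyarrow.parquet.FileMetaData` returned by `pq.read_metadata`, i.e. it provides `num_row_groups`, `row_group(i)`, `num_columns`, `column(j)`, `path_in_schema`, `statistics.max` and `statistics.has_min_max`, with `statistics` being `None` when a column chunk has no statistics. Looking up each row group and column once also removes the repeated `meta.row_group(grp).column(col)` chains.

```python
def column_maxima(meta, names=("i", "j", "k")):
    """Return {column name: largest statistics.max across all row groups}.

    Assumes `meta` behaves like pyarrow.parquet.FileMetaData (for example
    the result of pq.read_metadata(path)). Column chunks without min/max
    statistics are skipped; a column with no statistics at all is absent
    from the result.
    """
    maxima = {}
    for grp in range(meta.num_row_groups):
        row_group = meta.row_group(grp)  # look the row group up once
        for col in range(row_group.num_columns):
            column = row_group.column(col)  # ... and each column once
            name = column.path_in_schema
            stats = column.statistics
            # Guard clause: skip untracked columns and missing statistics.
            if name not in names or stats is None or not stats.has_min_max:
                continue
            # Keep the first value seen, then only replace it with larger ones.
            if name not in maxima or stats.max > maxima[name]:
                maxima[name] = stats.max
    return maxima
```

With pyarrow installed, it would be called as `column_maxima(pq.read_metadata(metafile))`, returning a dictionary that maps each tracked column name to its maximum. Because the function only uses attribute access, it can also be exercised with stand-in objects instead of a real metadata file.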
https://stackoverflow.com/questions/65587980/how-do-i-write-this-nested-python-code-more-efficiently-in-the-absence-of-switch January 06, 2021 at 07:22AM