Wednesday, January 6, 2021

How do I write this nested Python code more efficiently in the absence of switch statements?

I have the following code, which loops through the row groups in a parquet metadata file to find the maximum values of columns i, j, and k across the whole file. As far as I know, I have to find the max value in each row group. Is there a better way to write this in fewer lines, without it being nested five levels deep, so that the code is more readable?

I tried using a dictionary/lambda combination as a switch statement in place of some of the if statements, to eliminate at least two levels of nesting, but I couldn't figure out how to do the greater-than comparison without nesting further.

*Update: it's been pointed out that code style is subjective, which is a fair point. The main things I was looking for were: a) how to write it with fewer levels of nesting and b) how to write it in fewer lines. I was hoping those two things would make the code clearer. I initially thought a switch statement would make the code clearer, but that may not be the case.

    import pyarrow.parquet as pq

    def main():
        metafile = r'D:\my_parquet_meta_file.metadata'
        meta = pq.read_metadata(metafile)

        max_i = 0
        max_j = 0
        max_k = 0

        for grp in range(0, meta.num_row_groups):
            for col in range(0, meta.num_columns):
                # locate columns i,j,k
                if meta.row_group(grp).column(col).path_in_schema in ['i', 'j', 'k']:
                    if meta.row_group(grp).column(col).path_in_schema == 'i':
                        if meta.row_group(grp).column(col).statistics.max > max_i:
                            max_i = meta.row_group(grp).column(col).statistics.max
                    if meta.row_group(grp).column(col).path_in_schema == 'j':
                        if meta.row_group(grp).column(col).statistics.max > max_j:
                            max_j = meta.row_group(grp).column(col).statistics.max
                    if meta.row_group(grp).column(col).path_in_schema == 'k':
                        if meta.row_group(grp).column(col).statistics.max > max_k:
                            max_k = meta.row_group(grp).column(col).statistics.max

        print('max i: ' + str(max_i), 'max j: ' + str(max_j), 'max k: ' + str(max_k))

    if __name__ == '__main__':
        main()

All suggestions would be greatly appreciated. I'm sure there's a better way to do this.
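
For reference, here is a minimal sketch of the dictionary idea described in the question, not a definitive answer. It keeps the running maxima in a dictionary keyed by column name, which removes the per-column if blocks, and lets the built-in max() do the greater-than comparison. The helper name column_maxima and its initial parameter are hypothetical. The check for missing statistics is an addition that the original code does not have. The starting value of 0 is carried over from the original, so it assumes the column values are non-negative.

```python
def column_maxima(meta, names, initial=0):
    """Return {column name: largest statistics.max seen across all row groups}.

    `meta` is expected to behave like the object returned by
    pyarrow.parquet.read_metadata(); `names` lists the columns of interest.
    (Hypothetical helper, written for illustration only.)
    """
    # One running maximum per requested column, replacing max_i / max_j / max_k.
    maxima = {name: initial for name in names}

    for grp in range(meta.num_row_groups):
        row_group = meta.row_group(grp)
        for col in range(meta.num_columns):
            column = row_group.column(col)
            name = column.path_in_schema
            stats = column.statistics
            # Skip columns we were not asked about, and column chunks that
            # carry no usable min/max statistics (an added safety check).
            if name not in maxima or stats is None or not stats.has_min_max:
                continue
            # max() replaces the nested "if value > current: current = value".
            maxima[name] = max(maxima[name], stats.max)

    return maxima
```

With the file from the question, this would be called as column_maxima(pq.read_metadata(metafile), ['i', 'j', 'k']), and the result could be printed by looping over the returned dictionary. The deepest nesting is the body of the inner for loop, and adding a fourth column only means adding a name to the list.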

https://stackoverflow.com/questions/65587980/how-do-i-write-this-nested-python-code-more-efficiently-in-the-absence-of-switch January 06, 2021 at 07:22AM
