Friday, April 23, 2021

Multiprocessing not working for same function

I have a heavy computation that I want to split up and run in parallel using multiprocessing. However, when I run it with multiprocessing, the processes are spawned but they do not use any system resources. I looked up a multiprocessing example shown on GeeksForGeeks.

Expected result: the dataset is processed in parallel by the multiprocess_dataset() function given below.

Code:

import Extractor
import DatasetReader

import gc
import pandas as pd

def batched_extraction(sub_data_df, file_batch_name, COL_NAME):

    print('File {} started processing!'.format(file_batch_name))

    result = [Extractor(row[COL_NAME]).get_all_phrases() for idx, row in sub_data_df.iterrows()]
    sub_data_df['result'] = result

    sub_data_df.to_csv('Outputs/file_batch_name', index=False, encoding='utf-8-sig')

    del phrases, sub_data_df
    gc.collect()


def multiprocess_dataset(formatted_df):

    batched_num_reviews = len(formatted_df)//4

    print('Multicore process 4 batches of {} reviews each'.format(batched_num_reviews))

    sub_df_1, sub_df_2 = formatted_df[:batched_num_reviews], formatted_df[batched_num_reviews:2*(batched_num_reviews)]
    sub_df_3, sub_df_4 = formatted_df[2*(batched_num_reviews):3*(batched_num_reviews)], formatted_df[3*(batched_num_reviews):]

    file_batch_name_1 = 'output_multiprocess_input_file_b1_{}_reviews.csv'.format(len(sub_df_1))
    file_batch_name_2 = 'output_multiprocess_input_file_b2_{}_reviews.csv'.format(len(sub_df_2))
    file_batch_name_3 = 'output_multiprocess_input_file_b3_{}_reviews.csv'.format(len(sub_df_3))
    file_batch_name_4 = 'output_multiprocess_input_file_b4_{}_reviews.csv'.format(len(sub_df_4))

    p1 = multiprocessing.Process(name="process1", target=batched_extraction, args = (sub_df_1, file_batch_name_1, COL_NAME))
    p2 = multiprocessing.Process(name="process2", target=batched_extraction, args = (sub_df_2, file_batch_name_2, COL_NAME))
    p3 = multiprocessing.Process(name="process3", target=batched_extraction, args = (sub_df_3, file_batch_name_3, COL_NAME))
    p4 = multiprocessing.Process(name="process4", target=batched_extraction, args = (sub_df_4, file_batch_name_4, COL_NAME))

    p1.start()
    p2.start()
    p3.start()
    p4.start()

    p1.join()
    p2.join()
    p3.join()
    p4.join()

def main():

    INPUT_FILENAME = input('Enter input filename : ')
    COL_NAME = input('Enter column on which you want to process on : ')

    df = DatasetReader('Datasets/{}'.format(INPUT_FILENAME)).read_dataset()
    df[COL_NAME].replace(np.nan, "EMPTY")

    #    num_reviews = len(df)
    #    print('Running process for {} reviews'.format(num_reviews))

    multiprocess_dataset(formatted_df = df)

if __name__ == '__main__':
    main()
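
A few things in the code as posted would fail independently of the idle-core problem: `multiprocessing` and `np` are never imported, `COL_NAME` is local to `main()` but is read inside `multiprocess_dataset()`, `phrases` is deleted without ever being defined, and `'Outputs/file_batch_name'` is a literal string, so every batch would write to the same file. The following is only a minimal sketch of the same batching idea with those points addressed, not the poster's actual program. It uses only the standard library, so plain lists of strings stand in for the DataFrame, and `extract_phrases`, `process_batch`, `split_into_batches` and `run_in_parallel` are hypothetical names introduced for illustration.

```python
import csv
import multiprocessing
import os
import tempfile


def split_into_batches(rows, num_batches):
    """Split rows into contiguous slices; the last slice takes the remainder."""
    batch_size = len(rows) // num_batches
    batches = []
    for i in range(num_batches):
        start = i * batch_size
        end = (i + 1) * batch_size if i < num_batches - 1 else len(rows)
        batches.append(rows[start:end])
    return batches


def extract_phrases(text):
    """Hypothetical stand-in for Extractor(text).get_all_phrases()."""
    return text.split()


def process_batch(task):
    """Worker: process one batch and write it to its own output file."""
    rows, output_path = task
    with open(output_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        for text in rows:
            writer.writerow([text, "; ".join(extract_phrases(text))])
    return output_path, len(rows)


def run_in_parallel(rows, output_dir, num_batches=4):
    """Build one (batch, path) task per batch and map them over a process pool."""
    os.makedirs(output_dir, exist_ok=True)
    tasks = []
    for index, batch in enumerate(split_into_batches(rows, num_batches), start=1):
        name = "output_multiprocess_input_file_b{}_{}_reviews.csv".format(index, len(batch))
        # The path is built from the batch name, so batches do not overwrite each other.
        tasks.append((batch, os.path.join(output_dir, name)))
    with multiprocessing.Pool(processes=num_batches) as pool:
        return pool.map(process_batch, tasks)


if __name__ == "__main__":
    reviews = ["the room was clean", "great location", "noisy at night",
               "friendly staff", "would stay again"]
    with tempfile.TemporaryDirectory() as out_dir:
        for path, count in run_in_parallel(reviews, out_dir, num_batches=2):
            print("{} rows written to {}".format(count, os.path.basename(path)))
```

Everything a worker needs is passed in through its task tuple, so nothing depends on variables that exist only inside `main()`. Using a pool instead of four hand-written `Process` objects is a design choice of this sketch; it keeps the number of batches a parameter.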

Actual result / console output when running the program:

(phrase) viole@viole-X510UNR:~/Documents$ python3.6 program.py
/home/viole/Documents/phrase/lib/python3.6/site-packages/allennlp/service/predictors/__init__.py:23: FutureWarning: allennlp.service.predictors.* has been depreciated. Please use allennlp.predictors.*
  "Please use allennlp.predictors.*", FutureWarning)
/home/viole/Documents/phrase/lib/python3.6/site-packages/torch/nn/modules/container.py:434: UserWarning: Setting attributes on ParameterList is not supported.
  warnings.warn("Setting attributes on ParameterList is not supported.")
Enter input filename : accomm_dataset.csv
Enter column on which you want to process on : comments
Multicore process 4 batches of 6386 reviews each
File output_multiprocess_input_file_b1_6386_reviews.csv started processing!
<---Expanding Contraction--->
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10205.12it/s]
  0%|                                                                                                | 0/1 [00:00<?, ?it/s]File output_multiprocess_input_file_b2_6386_reviews.csv started processing!
<---Expanding Contraction--->
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10255.02it/s]
  0%|                                                                                                | 0/1 [00:00<?, ?it/s]File output_multiprocess_input_file_b3_6386_reviews.csv started processing!
<---Expanding Contraction--->
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 60.89it/s]
<---Expanding Contraction--->
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 25115.59it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3446.43it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 13706.88it/s]
File output_multiprocess_input_file_b4_6386_reviews.csv started processing!
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 59.78it/s]
<---Expanding Contraction--->
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 26546.23it/s]
  0%|                                                                                                | 0/1 [00:00<?, ?it/s]<---Expanding Contraction--->
100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3440.77it/s]
  0%|                                                                                                | 0/1 [00:00<?, ?it/s]Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.  See documentation for `non_padded_namespaces` parameter in Vocabulary.
  0%|                                                                                                | 0/1 [00:00<?, ?it/s]Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.  See documentation for `non_padded_namespaces` parameter in Vocabulary.
100%|███████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 171.89it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 44.97it/s]
<---Expanding Contraction--->
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27962.03it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2983.15it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 60.82it/s]
<---Expanding Contraction--->
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 19508.39it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2906.66it/s]
  0%|                                                                                                | 0/1 [00:00<?, ?it/s]Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.  See documentation for `non_padded_namespaces` parameter in Vocabulary.
Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.  See documentation for `non_padded_namespaces` parameter in Vocabulary.
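
The output above shows each batch starting, but it does not say whether any child process finished, crashed, or is still waiting. One way to tell these cases apart is to inspect `exitcode` on each `Process` after `join()`. The sketch below is an illustration only: `ok_worker`, `failing_worker` and `failed_processes` are hypothetical names, and the `NameError` merely imitates the kind of failure an undefined variable would cause in a child.

```python
import multiprocessing
import sys


def ok_worker():
    """Child that finishes normally."""
    pass


def failing_worker():
    """Child that dies with an uncaught exception (simulated)."""
    raise NameError("name 'phrases' is not defined")


def failed_processes(processes):
    """Return (name, exitcode) for every joined process that did not exit with 0."""
    return [(p.name, p.exitcode) for p in processes if p.exitcode != 0]


if __name__ == "__main__":
    procs = [
        multiprocessing.Process(name="process1", target=ok_worker),
        multiprocessing.Process(name="process2", target=failing_worker),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The child prints its own traceback to stderr; the parent only sees the exit code.
    for name, code in failed_processes(procs):
        print("{} exited with code {}".format(name, code), file=sys.stderr)
```

If every process reports exit code 0 the work completed. If `join()` never returns at all, the children are blocked rather than crashed, which points to a different kind of problem.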

I checked System Monitor to see which resources are being used: all cores are idle. Running Extractor without multiprocessing seems to work fine. Is there anything I am missing here?

Any help would be appreciated!

https://stackoverflow.com/questions/67238443/multiprocessing-not-working-for-same-function April 24, 2021 at 09:04AM
