2021年3月30日星期二

How to join two big dataframes based on multiple conditions in an effective quickly way?

I need to add df1 columns's ['inicio_vigencia'] and ['fim_vigencia'] to df2, considering the following conditions:

  • df1['id_vei'] == df2['id_vei'] and
  • df1['inicio_vigencia'] <= df2['Hora_do_evento'] and
  • df1['Hora_do_evento'] >= df2['Hora_do_evento']
    Dataframes size:
  • df1 has 25.196.167 lines (aprox. 2gb)
  • df2 has 113.787 lines

df1df2result

I prepared the following script, but it takes lots of time.

df2['inicio_vigencia'] = 0  df2['fim_vigencia'] = 0    for i in range(len(df2)):      id_vei = (df2.iloc[i, 1])      Hora_do_evento = (df2.iloc[i, 3])      df_vigencia = df1.loc[(df1['inicio_vigencia'] <= Hora_do_evento) &                                                   (df1['fim_vigencia'] >= Hora_do_evento) &                                                   (df1['id_vei'] == id_vei)]      if (len(df_vigencia) > 0):          index = int(df_vigencia[df_vigencia['id_vei'] == id_vei].index.values)          #len_df_vigencia.append(len(df_vigencia))          inicio_vigencia = df_vigencia['inicio_vigencia'][index]          fim_vigencia = df_vigencia['fim_vigencia'][index]          df2.loc[i, 'inicio_vigencia'] = inicio_vigencia          df2.loc[i, 'fim_vigencia'] = fim_vigencia    

How could I perform this task efficiently?

https://stackoverflow.com/questions/66881493/how-to-join-two-big-dataframes-based-on-multiple-conditions-in-an-effective-quic March 31, 2021 at 12:04PM

没有评论:

发表评论