2021年4月22日星期四

From specific row in a df_a, to count its occrueances in past a year in df_b

2 dataframe as below, I want to return, how many Success (Yes) in a year (of the specific person) 1 year before his/her specific date, i.e. each entry in "to check" to define the range in "history".

For example, in "to_check", Mike 20200602, I want to know how many Success (Yes) in Mike's history (1 year before, until 20200602).

enter image description here

By using the "to_check" as a list, I came up with a clumsy way:

import pandas as pd  import datetime  import numpy as np  from io import StringIO  import time  from datetime import datetime, date, time, timedelta      csvfile = StringIO("""  Name Check  Mike 20200602  David 20210415  Kate 20201109""")    csvfile_1 = StringIO("""  Name History Success  David 20180312 Yes  David 20180811 Yes  David 20191223 Yes  David 20210311 Yes  Kate 20180906 Yes  Kate 20180912 Yes  Kate 20191204 Yes  Kate 20200505 Yes  Mike 20180912 Yes  Mike 20190312 Yes  Mike 20190806 Yes  Mike 20191204 Yes""")      df_check = pd.read_csv(csvfile, sep = ' ', engine='python')  df_history = pd.read_csv(csvfile_1, sep = ' ', engine='python')    df_history['Date'] = pd.to_datetime(df_history['History'], format='%Y%m%d')    to_check = [Mike 20200602"",""David 20210415"",""Kate 20201109""]    for t in to_check:      name, d = t.split("" "")      date_obj = datetime.strptime(d, '%Y%m%d')      delta = timedelta(days = 365)      day_before = date_obj - delta      m1 = df_history['Name'] == name      m2 = df_history['Date'] >= day_before        df_history['OP'] = np.where(m1 & m2, ""fit"", '')        how_many = df_history['OP'].value_counts().tolist()[1]        print (t, how_many)  

Output:

Mike 20200602 2  David 20210415 1  Kate 20201109 2  

What's the better and smarter way to achieve it? Thank you.

https://stackoverflow.com/questions/67223048/from-specific-row-in-a-df-a-to-count-its-occrueances-in-past-a-year-in-df-b April 23, 2021 at 10:05AM

没有评论:

发表评论