有些事如何做: groupby daily yields based on their months and find the middle observation of each group?

This is the question that I am trying to answer: 'Download monthly corporate bond yield data during Jan 2015 to Dec 2017 for all bonds in the spreadsheet from TRACE on WRDS. Note that for each bond issue that is traded at least once within a given month, the daily yield is the last yield that occurs on the trading day. The monthly yield is then calculated as the MEDIAN of daily yields in the month for the bond issue.'

I wasn't sure how to groupby by months, as the trd_exctn_dt is given in the format 'YYYY-MM-DD'. Once I have groupedby the months, I would need to select the middle observation within each group, however I'm not sure how I would achieve this?

I would be looking for the following output:

Input:

cusip_id	yld_pt	trd_exctn_dt	trd_exctn_tm
00077TAA2	6.1345	2015-01-05	57718.0
00077TAA2	5.4783	2015-01-12	56321.0
00077TAA2	4.6732	2015-01-25	65784.0
00077TAA2	6.1345	2015-02-06	54518.0
00077TAA2	4.6732	2015-03-31	63794.0

Output:

I would group these input by month and then select the middle observation within a particular month and my desired output would be as follows:

cusip_id	yld_pt	trd_exctn_dt	trd_exctn_tm
00077TAA2	5.4783	2015-01-12	56321.0
00077TAA2	6.1345	2015-02-06	54518.0
00077TAA2	4.6732	2015-03-31	63794.0

... having grouped them, the middle observation of 3 that occurred in Jan, happened on 2015-01-12, therefore, this is the monthly yield. As Feb and Mar, only had one daily yield occur within their respective months, this means that they are the monthly yields for this specific 'cusip_id'. Continue this process for each of the months where a daily yield occurs.

So far I have used the following code:

stmt = """    SELECT cusip_id, yld_pt, TRD_EXCTN_DT,TRD_EXCTN_TM    FROM trace.trace    WHERE cusip_id IN {}    AND (TRD_EXCTN_DT BETWEEN '2015-01-01' AND '2017-12-31')    """.format(tuple(df.cusip_id))    conn.raw_sql(stmt)      df2 = conn.raw_sql(stmt)    df2.sort_values(by=['cusip_id','trd_exctn_dt','trd_exctn_tm'], ascending = [True, True, False], inplace = True)  df2.drop_duplicates(subset = ['cusip_id','trd_exctn_dt','trd_exctn_tm'], keep = 'first', inplace = True)    df2.sort_values(by=['trd_exctn_dt'], ascending = True, inplace = True)

https://stackoverflow.com/questions/65417106/groupby-daily-yields-based-on-their-months-and-find-the-middle-observation-of-ea December 23, 2020 at 07:22AM

有些事如何做

2020年12月22日星期二

groupby daily yields based on their months and find the middle observation of each group?

没有评论:

发表评论