This is the question that I am trying to answer: 'Download monthly corporate bond yield data during Jan 2015 to Dec 2017 for all bonds in the spreadsheet from TRACE on WRDS. Note that for each bond issue that is traded at least once within a given month, the daily yield is the last yield that occurs on the trading day. The monthly yield is then calculated as the MEDIAN of daily yields in the month for the bond issue.'
I wasn't sure how to groupby by months, as the trd_exctn_dt is given in the format 'YYYY-MM-DD'. Once I have groupedby the months, I would need to select the middle observation within each group, however I'm not sure how I would achieve this?
I would be looking for the following output:
Input:
cusip_id | yld_pt | trd_exctn_dt | trd_exctn_tm |
---|---|---|---|
00077TAA2 | 6.1345 | 2015-01-05 | 57718.0 |
00077TAA2 | 5.4783 | 2015-01-12 | 56321.0 |
00077TAA2 | 4.6732 | 2015-01-25 | 65784.0 |
00077TAA2 | 6.1345 | 2015-02-06 | 54518.0 |
00077TAA2 | 4.6732 | 2015-03-31 | 63794.0 |
Output:
I would group these input by month and then select the middle observation within a particular month and my desired output would be as follows:
cusip_id | yld_pt | trd_exctn_dt | trd_exctn_tm |
---|---|---|---|
00077TAA2 | 5.4783 | 2015-01-12 | 56321.0 |
00077TAA2 | 6.1345 | 2015-02-06 | 54518.0 |
00077TAA2 | 4.6732 | 2015-03-31 | 63794.0 |
... having grouped them, the middle observation of 3 that occurred in Jan, happened on 2015-01-12, therefore, this is the monthly yield. As Feb and Mar, only had one daily yield occur within their respective months, this means that they are the monthly yields for this specific 'cusip_id'. Continue this process for each of the months where a daily yield occurs.
So far I have used the following code:
stmt = """ SELECT cusip_id, yld_pt, TRD_EXCTN_DT,TRD_EXCTN_TM FROM trace.trace WHERE cusip_id IN {} AND (TRD_EXCTN_DT BETWEEN '2015-01-01' AND '2017-12-31') """.format(tuple(df.cusip_id)) conn.raw_sql(stmt) df2 = conn.raw_sql(stmt) df2.sort_values(by=['cusip_id','trd_exctn_dt','trd_exctn_tm'], ascending = [True, True, False], inplace = True) df2.drop_duplicates(subset = ['cusip_id','trd_exctn_dt','trd_exctn_tm'], keep = 'first', inplace = True) df2.sort_values(by=['trd_exctn_dt'], ascending = True, inplace = True)
https://stackoverflow.com/questions/65417106/groupby-daily-yields-based-on-their-months-and-find-the-middle-observation-of-ea December 23, 2020 at 07:22AM
没有评论:
发表评论