2021年3月30日星期二

How to create a look-up table from a dataframe of changes?

I want to create a look-up table from a dataframe of changes. Each row of the original dataframe signifies a change in the coding for a given district. That dataset covers some time period, 2009 to 2019. While a district may experience several changes during that time period, I want the 2009 and 2019 codings of each district. That is, the first and latest coding.

The dataframe covers hundreds of districts. Some district may go through just a single change, while others go through multiple. A district can be merged or split into multiple others.

The ideal look-up table would look as follows:

coding_2009 coding_2019
00QR S12000047
00QR S12000048
00RB S12000047
00RB S12000048

coding_2009 is the district's coding as of 2009, and coding_2019 is its latest coding as of 2019.

The original dataframe (a subset), where each row shows a change, looks like:

past new date
00QR S12000015 2009-01-01
S12000015 S12000047 2018-02-02
S12000015 S12000048 2018-02-02
00RB S12000015 2009-01-01
S12000024 S12000047 2018-02-02
S12000024 S12000048 2018-02-02

For each row, past is the code that is recoded into new as of date.

For example, district 00QR is turned into S12000015, which is later split into S12000047 and S12000048.

I've been dealing with this problem for weeks now, trying out different ad hoc versions but none seem to consistently work out. Please note that the code needs to consider that some districts experience just one change, whereas others may experience two or more changes. Districts can also be split or merged, as the example shows.

The ideal answer would use tidyverse.

For a reprex, I have subset a selection of districts below.

Thanks for your help! It would be appreciated immensely.

Reprex data: (you can also go above and beyond and use the original dataset, Changes.csv. See link below)

# Library tibble (a part of tidyverse) is needed to copy paste reprex data  #install.packages("tibble") # if you need to install it  library(tibble)    data <- tibble::tribble(          ~past,        ~new,        ~date,         "00RJ", "S12000013", "2009-01-01",         "00QR", "S12000015", "2009-01-01",         "00RB", "S12000024", "2009-01-01",         "13UD", "E07000015", "2009-01-01",         "15UH", "E07000025", "2009-01-01",         "00HC", "E06000024", "2009-01-01",         "00KG", "E06000034", "2009-01-01",         "19UD", "E07000049", "2009-01-01",         "19UE", "E07000050", "2009-01-01",         "19UG", "E07000051", "2009-01-01",         "19UH", "E07000052", "2009-01-01",         "19UJ", "E07000053", "2009-01-01",    "E07000017", "E06000049", "2009-04-01",    "E07000025", "E06000053", "2009-04-01",    "E07000014", "E06000049", "2009-04-01",    "E07000015", "E06000049", "2009-04-01",    "S12000013", "S12000013", "2015-06-16",    "S12000013", "S12000013", "2015-11-01",    "S12000015", "S12000047", "2018-02-02",    "S12000024", "S12000047", "2018-02-02",    "S12000015", "S12000048", "2018-02-02",    "S12000024", "S12000048", "2018-02-02",    "E07000049", "E06000059", "2019-04-01",    "E07000050", "E06000059", "2019-04-01",    "E07000053", "E06000059", "2019-04-01",    "E07000051", "E06000059", "2019-04-01",    "E07000052", "E06000059", "2019-04-01"    )    # Convert date to Date (after being copy pasted as tibble)  data$date <- as.Date(data$date)  

For anyone interested, this data is from the UK's Code History Database. You can download the zip from the link below. It's the file named Changes.csv: https://geoportal.statistics.gov.uk/datasets/code-history-database-december-2019-for-the-united-kingdom. Note, in Changes.csv, past is named geogcd_p, new is geogcd and dateis oper_date.

https://stackoverflow.com/questions/66879816/how-to-create-a-look-up-table-from-a-dataframe-of-changes March 31, 2021 at 07:43AM

没有评论:

发表评论