Sunday, February 7, 2021

Group rows where row in variable x equals row in vari

I'm working with large administrative datasets in which patients are linked to an index number using personal identifiers. In some cases, the personal identifiers are matched to two indexes, with varying degrees of confidence. My dataset has over 1 million Index numbers, but I've created a small example dataset:

    data.frame(Index          = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
               Duplicate      = c(0, 1, 0, 1, 0, 0, 0, 0, 1, 1),
               Duplicate_with = c(NA, 10, NA, 9, NA, NA, NA, NA, 4, 2),
               Grade          = c("A", "A", "A", "B", "A", "A", "A", "A", "C", "B")) -> data

'Index' is the patient's Index number, 'Duplicate' flags whether the Index is a duplicate (0 = No; 1 = Yes), 'Duplicate_with' gives the Index it is a duplicate of, and 'Grade' is the confidence that the match made using the personal identifiers is correct.
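A minimal sketch in base R, assuming the column meanings described above: before grouping, it may be worth confirming that the Duplicate flag agrees with Duplicate_with, and that every pair is recorded in both directions (in the example, 2 and 10 point to each other, as do 4 and 9). A row-by-row grouping rule only works cleanly if both rows of a pair are flagged. The names `flag_ok`, `partner` and `pairs_ok` are hypothetical, chosen for illustration.

```r
# Example data from the question
data <- data.frame(
  Index          = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Duplicate      = c(0, 1, 0, 1, 0, 0, 0, 0, 1, 1),
  Duplicate_with = c(NA, 10, NA, 9, NA, NA, NA, NA, 4, 2),
  Grade          = c("A", "A", "A", "B", "A", "A", "A", "A", "C", "B")
)

# (a) Duplicate should be 1 exactly when Duplicate_with is filled in
flag_ok <- all((data$Duplicate == 1) == (!is.na(data$Duplicate_with)))

# (b) Pairs should be reciprocal: if the row with Index i points to j,
#     then the row with Index j should point back to i.
dups    <- data[!is.na(data$Duplicate_with), ]
partner <- match(dups$Duplicate_with, data$Index)  # row number of each partner (NA if absent)
pairs_ok <- !anyNA(partner) &&
  isTRUE(all(data$Duplicate_with[partner] == dups$Index))

cat("flag consistent:", flag_ok, "| pairs reciprocal:", pairs_ok, "\n")
```

`isTRUE()` is used so that a missing partner (an NA in the comparison) counts as a failed check rather than propagating NA.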

I would like to group together the rows where Index in row x equals Duplicate_with in row y, to end up with:

       Index Duplicate Duplicate_with Grade Group
    1      1         0             NA     A     1
    2      2         1             10     A     2
    3      3         0             NA     A     3
    4      4         1              9     B     4
    5      5         0             NA     A     5
    6      6         0             NA     A     6
    7      7         0             NA     A     7
    8      8         0             NA     A     8
    9      9         1              4     C     4
    10    10         1              2     B     2

So, for example, if Index == 1, search Duplicate_with for 1 and then group; if Index == 2, search Duplicate_with for 2 and then group.
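One possible approach, sketched in base R under the assumption that duplicates only ever come in pairs (each row points to at most one other Index, with no longer chains of linked indexes): non-duplicates keep their own Index as their Group, and both rows of a pair take the smaller of the two Index values, which is the pattern in the expected output above. Because `pmin()` is vectorised, this avoids an explicit search loop over the rows.

```r
# Example data from the question
data <- data.frame(
  Index          = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Duplicate      = c(0, 1, 0, 1, 0, 0, 0, 0, 1, 1),
  Duplicate_with = c(NA, 10, NA, 9, NA, NA, NA, NA, 4, 2),
  Grade          = c("A", "A", "A", "B", "A", "A", "A", "A", "C", "B")
)

# Group = the smaller of Index and Duplicate_with.
# With na.rm = TRUE, pmin() ignores the NA in Duplicate_with, so rows
# that are not duplicates simply keep their own Index as their Group.
data$Group <- pmin(data$Index, data$Duplicate_with, na.rm = TRUE)

print(data)
```

This relies on both rows of a pair being flagged; if only one side of a pair is recorded, the unflagged row keeps its own Index and the pair may end up split across two groups.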

Could anyone please give me some advice on coding this? I've run out of ideas.

My apologies if my question is unclear or could be improved; this is my first time posting, so I would also welcome any tips on improving the question.

https://stackoverflow.com/questions/66094486/group-rows-where-row-in-variable-x-equals-row-in-vari February 08, 2021 at 08:17AM
