Tuesday, March 2, 2021

How to create multiple new columns based on groups of columns that start with a certain prefix and also contain a certain string?

I have data that look like this:

    df <- data.frame(ID = c(1,2,3,4,5,6),
                     var1_unmod = c(1,0,0,1,0,1),
                     var1_me1 = c(0,1,0,0,0,0),
                     var1_me2 = c(1,1,1,0,1,0),
                     var1_me3 = c(0,0,1,0,0,0),
                     var1_ac1 = c(1,0,1,1,0,1),
                     var2_unmod = c(1,0,1,1,0,0),
                     var2_me1 = c(0,0,0,0,1,0),
                     var2_me2 = c(1,1,0,1,1,1),
                     var2_ac1 = c(1,1,0,1,0,0),
                     var2_me1ac1 = c(1,0,0,0,0,0),
                     var2_me2ac1 = c(1,0,0,1,1,1))

      ID var1_unmod var1_me1 var1_me2 var1_me3 var1_ac1 var2_unmod var2_me1 var2_me2 var2_ac1 var2_me1ac1 var2_me2ac1
    1  1          1        0        1        0        1          1        0        1        1           1           1
    2  2          0        1        1        0        0          0        0        1        1           0           0
    3  3          0        0        1        1        1          1        0        0        0           0           0
    4  4          1        0        0        0        1          1        0        1        1           0           1
    5  5          0        0        1        0        0          0        1        1        0           0           1
    6  6          1        0        0        0        1          0        0        1        0           0           1

In the actual dataset, however, the prefixes aren't sequential like var1 and var2; they are essentially random combinations of letters and numbers, and there are about 30 different ones.

For each of these prefixes (var1, var2, ...), I need to create a single variable that indicates whether any of the columns with that prefix that also contain me1, me2, or me3 (so for var2 this would be var2_me1, var2_me2, var2_me1ac1, var2_me2ac1) are nonzero. The output dataset would have additional columns like this:

      ID var1_unmod var1_me1 var1_me2 var1_me3 var1_ac1 var1_meX var2_unmod var2_me1 var2_me2 var2_ac1 var2_me1ac1 var2_me2ac1 var2_meX
    1  1          1        0        1        0        1        1          1        0        1        1           1           1        1
    2  2          0        1        1        0        0        1          0        0        1        1           0           0        1
    3  3          0        0        1        1        1        1          1        0        0        0           0           0        0
    4  4          1        0        0        0        1        0          1        0        1        1           0           1        1
    5  5          0        0        1        0        0        1          0        1        1        0           0           1        1
    6  6          1        0        0        0        1        0          0        0        1        0           0           1        1

First I need to identify the applicable columns for each prefix (because there is no pattern to the prefixes, I'm thinking I will have to hard code at least this part), and then maybe somehow write a loop that iterates through the columns (stored in a vector?) for each prefix. I tend to have trouble referencing varying column names within loops. Any help is appreciated!
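One possible approach, sketched here in base R, may avoid hard-coding the prefixes altogether. It rests on two assumptions not confirmed by the question: that each prefix is everything before the first underscore in a column name, and that the prefixes themselves contain no underscores. The helper names `me_cols`, `prefixes`, and `cols` are illustrative only.

```r
# Example data copied from the question.
df <- data.frame(ID = c(1,2,3,4,5,6),
                 var1_unmod = c(1,0,0,1,0,1),
                 var1_me1 = c(0,1,0,0,0,0),
                 var1_me2 = c(1,1,1,0,1,0),
                 var1_me3 = c(0,0,1,0,0,0),
                 var1_ac1 = c(1,0,1,1,0,1),
                 var2_unmod = c(1,0,1,1,0,0),
                 var2_me1 = c(0,0,0,0,1,0),
                 var2_me2 = c(1,1,0,1,1,1),
                 var2_ac1 = c(1,1,0,1,0,0),
                 var2_me1ac1 = c(1,0,0,0,0,0),
                 var2_me2ac1 = c(1,0,0,1,1,1))

# Columns whose suffix (the part after an underscore) contains me1, me2, or me3.
me_cols <- grep("_.*me[123]", names(df), value = TRUE)

# Assumption: the prefix is everything before the first underscore.
prefixes <- unique(sub("_.*$", "", me_cols))

for (p in prefixes) {
  # The matching columns that belong to this prefix.
  cols <- me_cols[startsWith(me_cols, paste0(p, "_"))]
  # 1 if any of those columns is nonzero in the row, otherwise 0.
  # df[[name]] <- value is how a column is assigned by a name built in a loop.
  df[[paste0(p, "_meX")]] <- as.integer(rowSums(df[cols] != 0) > 0)
}
```

This appends the new `*_meX` columns at the end of the data frame rather than next to each group, so the columns would need reordering to match the layout shown above. Missing values are not handled; if the real data contain `NA`, `rowSums(..., na.rm = TRUE)` may be needed.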

https://stackoverflow.com/questions/66441772/how-to-create-multiple-new-columns-based-of-off-groups-of-columns-that-start-wit March 02, 2021 at 10:59PM
