Saturday, May 1, 2021

tidyr separate(): speed up when many entries are the same?

I have a very large tibble with a column of concatenated variables. Because this is repeated-measures data, the same few combinations of concatenated values occur many times. As a result, this code:

df %>% group_by(var_col) %>% nest() %>% separate(var_col, into = c("var1", "var2"), sep = "_") %>% unnest(data)

is many times faster than this code:

df %>% separate(var_col, into = c("var1", "var2"), sep = "_")

This seems like a rather hackish way to get such a large speed-up. Is there a better way to take advantage of the fact that my data is repeated like this?
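
For illustration, the same idea (split each distinct value of var_col only once, then attach the result to every row) can be written as a lookup table plus a join. This is only a sketch, not a benchmarked recommendation: the toy df, its value column, and the names lookup and result are assumptions made up for the example, and whether it beats the nest() version on real data would need to be measured.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the large tibble `df`
df <- tibble(
  var_col = rep(c("a_x", "a_y", "b_x"), times = 4),
  value   = seq_len(12)
)

# Split each distinct key once; remove = FALSE keeps var_col for the join
lookup <- df %>%
  distinct(var_col) %>%
  separate(var_col, into = c("var1", "var2"), sep = "_", remove = FALSE)

# Attach the split columns to every row, then drop the original key
result <- df %>%
  left_join(lookup, by = "var_col") %>%
  select(-var_col)
```

Unlike the group_by() and nest() version, the join leaves the rows in their original order. The new columns end up at the right-hand side of the tibble rather than in the position of var_col.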

https://stackoverflow.com/questions/67352394/tidyr-separate-speed-up-when-many-entries-are-the-same May 02, 2021 at 10:05AM
