Hi I have a data frame which has ID and topic_number (top) as shown below:
structure(list(ID = c(1013515, 1013515, 1013515, 1013515, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446, 1018446), top = c(2L, 8L, 8L, 4L, 2L, 7L, 3L, 7L, 7L, 1L, 8L, 7L, 1L, 2L, 3L, 7L, 1L, 2L, 1L, 1L, 8L, 7L, 7L, 1L, 1L, 2L, 2L, 1L, 7L, 3L, 2L, 2L, 7L, 3L, 2L, 3L, 3L, 7L, 7L, 7L, 2L, 2L, 3L, 2L, 7L, 8L, 1L, 7L, 2L, 7L, 8L, 7L, 4L, 7L, 2L, 7L, 8L, 2L, 2L, 7L, 2L, 4L, 3L, 7L, 7L, 7L, 2L, 3L, 7L, 2L, 1L, 7L, 3L, 7L, 3L, 2L, 7L, 4L, 8L, 2L, 2L, 2L, 7L, 3L)), row.names = c(NA, -84L), class = c("tbl_df", "tbl", "data.frame"))
I know that I have 8 topics in general and what I want to do is find that for each ID, what percentage is for which topic and I want to show this in column format. So I want to see 8 new columns generated and in each column the ratio is shown, so for example:
ID total topic_1 topic_2 topic_3 topic_4 topic_5 topic_6 topic_7 topic_8 1013515 4 0 0.25 0 0.25 0 0 0 0.5 1018446 80 0.125 0.275 0.15 0.037 0 0 0.337 0.075
I started with:
data <- dd %>% group_by(ID) %>% count(top)
but don't know how to change this to ratio and generate the columns
https://stackoverflow.com/questions/66947451/finding-a-ratio-of-variable-using-group-by-and-showing-it-in-dataframe-format April 05, 2021 at 09:56AM
没有评论:
发表评论