有些事如何做: Using stat_function with a regression with a factor with multiple levels

I have a data.frame with observed success/failure outcomes per three groups along with expected probabilities:

library(dplyr)  observed.probability.df <- data.frame(group = c("A","B","C"), p = c(0.5,0.2,0.3))  expected.probability.df <- data.frame(group = c("A","B","C"), p = qlogis(c(0.4,0.3,0.3)))  observed.data.df <- do.call(rbind,lapply(c("A","B","C"), function(g)    data.frame(group = g, value = c(rep(0,1000*sum(dplyr::filter(observed.probability.df, group != g)$p)),rep(1,1000*dplyr::filter(observed.probability.df, group == g)$p)))  )) %>% dplyr::left_join(expected.probability.df)    observed.probability.df$group <- factor(observed.probability.df$group, levels = c("A","B","C"))  observed.data.df$group <- factor(observed.data.df$group, levels = c("A","B","C"))

I'm fitting a logistic regression (binomial glm with a logit link function) to these data with the offset term:

fit <- glm(value ~ group + offset(p), data = observed.data.df, family = binomial(link = 'logit'))

Now, I'd like to plot these data as a bar graph using ggplot2's geom_bar, color-coded by group, and to add to that the fit's estimated trend line (see post on how to also add the shaded standard error area for group with two levels).

If it wasn't for the offset term I'd just use stat_smooth, but sems that I have to resort to alternative ways.

I was thinking of using stat_function calling this function for getting the trend line:

trendLine <- function(x, ests) plogis(ests[1] + ests[2] * x)

But I don't know how make this work such that two trend lines are added, corresponding to each contrast:

Going from group A to group B, calling trendLine with coef(fit)[c(1,2)])
Going from group A to group C, calling trendLine with coef(fit)[c(1,3)])

I tried this:

library(ggplot2)  ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +    geom_bar(stat = 'identity') +    stat_function(fun = slope.est,args=list(ests=coef(fit)[c(1,2)]),size=2,color="black") +    stat_function(fun = slope.est,args=list(ests=coef(fit)[c(1,3)]),size=2,color="black") +    scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +    theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")

But it looks like the bottom line is a curve which also runs all the way to the bar of group C.

So my question is if theres a principled to add these contrast-wise trend lines to the bar plot with their respective x-axis starts and ends. It does not necessarily have to be with stat_function, but I'm not aware of an alternative.

https://stackoverflow.com/questions/66486186/using-stat-function-with-a-regression-with-a-factor-with-multiple-levels March 05, 2021 at 11:06AM

有些事如何做

2021年3月4日星期四

Using stat_function with a regression with a factor with multiple levels

没有评论:

发表评论