After fitting a model, I want to be able to index its data by variables that are not in the model formula. Say I have:
df <- data.frame(a = c(1,2,3), b = c(2,3,1000), country = c("Malawi", "USA","UK"))
Then, I run:
fit <- lm(a ~ b, data = df)
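A quick check of what lm() actually stores makes the problem concrete. This is a minimal sketch that reuses the example data (stringsAsFactors = FALSE is added only so the result does not depend on the R version). The model frame keeps only the formula variables, but it does carry over the row names of df:

```r
df <- data.frame(a = c(1, 2, 3), b = c(2, 3, 1000),
                 country = c("Malawi", "USA", "UK"),
                 stringsAsFactors = FALSE)
fit <- lm(a ~ b, data = df)

names(fit$model)      # only the formula variables: "a" "b"
rownames(fit$model)   # the row names of df survive: "1" "2" "3"
```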
The resulting fit$model no longer has the "country" variable, so it becomes hard to do things like:
- run a regression and then remove certain countries as robustness tests.
- run a regression and then find out which countries were outliers.
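One common workaround, sketched here under the assumption that every row can be given a unique label: if the row names of the data frame identify the observation, lm() copies them into fit$model, and diagnostics computed from the fit (such as Cook's distance) come back named by country. For panel data the label would have to combine unit and time, e.g. paste(country, year) with a hypothetical year column, because row names must be unique.

```r
df <- data.frame(a = c(1, 2, 3), b = c(2, 3, 1000),
                 country = c("Malawi", "USA", "UK"),
                 stringsAsFactors = FALSE)

# Label each row by country (row names must be unique)
rownames(df) <- df$country

fit <- lm(a ~ b, data = df)
rownames(fit$model)   # "Malawi" "USA" "UK"

# Diagnostics inherit the row names, so they are reported by country
cd <- cooks.distance(fit)
print(round(cd, 2))

# Robustness check: refit after dropping a country by name
fit_noUK <- lm(a ~ b, data = df[df$country != "UK", ])
```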
I know there are 'hacks' around this, such as using row indices, but I frequently find myself further subsetting the original dataset, and I worry about losing track of which row index refers to which observation.
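Row names (as opposed to row positions) may take care of most of that bookkeeping. A minimal sketch, assuming the row names are never reset along the way (e.g. with rownames(x) <- NULL): subsetting a data frame keeps the original row names, and lm() copies them into fit$model, so looking rows up by name stays aligned even after further subsetting or after rows are dropped for missing values.

```r
df <- data.frame(a = c(1, 2, 3), b = c(2, 3, 1000),
                 country = c("Malawi", "USA", "UK"),
                 stringsAsFactors = FALSE)

# Further subsetting keeps the original row names ("2" and "3" here)
sub <- df[df$country != "Malawi", ]
fit <- lm(a ~ b, data = sub)

# Re-attach a non-formula variable by looking rows up by name, not position
mf <- fit$model
mf$country <- sub[rownames(mf), "country"]
mf
```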
For example, from the data above I can see that the UK is an outlier.
So I have two options:
- lm(a~b,data=fit$model[-3,])
- lm(a~b,data=df[df$country!="UK",])
The second option is much clearer to me, but because diagnostics in R (such as Cook's distance) only give me row indices, I end up using the first option much more than I would like. This becomes especially tedious in large panel datasets, where I'm testing robustness to outliers or high-leverage points and would also like to know which countries (or values of other variables) those observations belong to.
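The indices that diagnostics return are actually names: cooks.distance() labels its result with the row names of the model frame, which come from the data frame. A sketch of mapping flagged observations back to countries without relabelling anything, assuming df still has its original row names. The 4/n cutoff is only one common rule of thumb and is an assumption here, not part of the question.

```r
df <- data.frame(a = c(1, 2, 3), b = c(2, 3, 1000),
                 country = c("Malawi", "USA", "UK"),
                 stringsAsFactors = FALSE)
fit <- lm(a ~ b, data = df)

cd <- cooks.distance(fit)
# Rule-of-thumb cutoff (assumed): Cook's distance above 4/n
flagged <- names(cd)[cd > 4 / length(cd)]

# names(cd) are row names of df, so look the countries up by name
bad <- df[flagged, "country"]
bad   # "UK" for this data

# Refit without the flagged countries, subsetting on the country variable
fit_robust <- lm(a ~ b, data = df[!(df$country %in% bad), ])
```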
Ideally, I'd like an option to do something like:
lm(a~b,data=fit$model[fit$model$country!="UK",])
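Something close to that may be possible without touching fit$model. update() re-evaluates the original lm() call, and lm()'s subset argument is evaluated inside the data frame passed as data, so it can refer to country even though country is not in the formula. A sketch, assuming df is still available under the same name when update() is called:

```r
df <- data.frame(a = c(1, 2, 3), b = c(2, 3, 1000),
                 country = c("Malawi", "USA", "UK"),
                 stringsAsFactors = FALSE)
fit <- lm(a ~ b, data = df)

# Re-run the same model, dropping the UK via a variable outside the formula
fit_noUK <- update(fit, subset = country != "UK")
nobs(fit_noUK)   # 2
```

Because the refit goes through the original data, fit_noUK$model keeps df's row names, so df[rownames(fit_noUK$model), "country"] still recovers which countries were used.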
Please help, and thank you so much!
https://stackoverflow.com/questions/65743637/how-to-keep-a-variable-in-fitmodel-for-lm-in-r-that-im-not-using-within-th January 16, 2021 at 05:16AM