Wednesday, January 20, 2021

Holdout results using lm in caret

I'm comparing different resampling methods in caret, using only lm. Across multiple datasets and seeds, I'm seeing much better model performance for k-fold CV, which makes me worry that I'm not pulling the correct information from the fit object. I want to know with certainty how to recover holdout model performance when using repeatedcv. How do you recover holdout fold model performance using lm with caret?
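For reference, my understanding from the caret documentation is that the per-resample holdout metrics are stored on the fit object itself; a minimal sketch of where I believe to find them (fit$resample, plus fit$pred when savePredictions is set in trainControl):

  library(caret)

  # keep the final model's holdout predictions so fold-level performance can be recomputed
  ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                       savePredictions = "final")
  fit <- train(Sepal.Width ~ ., method = "lm", data = iris, trControl = ctrl)

  fit$resample    # RMSE / Rsquared / MAE for each holdout fold of each repeat
  head(fit$pred)  # row-level holdout predictions: obs, pred, Resample, ...

  # recompute per-fold Rsquared directly from the stored holdout predictions
  by_fold <- split(fit$pred, fit$pred$Resample)
  sapply(by_fold, function(d) cor(d$obs, d$pred)^2)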

In the example below, both boot and LOOCV produce worse model performance on the iris dataset. Given that LOOCV uses more training data in each fit, this doesn't make sense to me:

  library(caret)

  fit <- train(Sepal.Width ~ ., method = "lm", data = iris,
               trControl = trainControl(method = "repeatedcv", number = 10, repeats = 10))
  fit

  fit <- train(Sepal.Width ~ ., method = "lm", data = iris,
               trControl = trainControl(method = "LOOCV"))
  fit

  fit <- train(Sepal.Width ~ ., method = "lm", data = iris,
               trControl = trainControl(method = "boot", number = 1000))
  fit
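Since seed differences could explain some run-to-run variation, here is a sketch of how I made the comparisons repeatable (the seed value is arbitrary; trainControl also accepts a seeds argument for finer control):

  set.seed(1)  # arbitrary seed, only to make the resampling repeatable
  fit_cv <- train(Sepal.Width ~ ., method = "lm", data = iris,
                  trControl = trainControl(method = "repeatedcv", number = 10, repeats = 10))

  set.seed(1)
  fit_boot <- train(Sepal.Width ~ ., method = "lm", data = iris,
                    trControl = trainControl(method = "boot", number = 1000))

  fit_cv$results    # aggregated holdout RMSE / Rsquared / MAE
  fit_boot$results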

Later, I ran a manual k-fold CV (non-repeated). It consistently results in worse performance than caret's k-fold, but similar to LOOCV and boot. I didn't set seeds, but if you re-run it a few times the R^2 will consistently be lower with the manual method. It is unclear to me why caret is different.

  library(psych)  # for corr.test()
  library(dplyr)

  # shuffle the rows and create 10 folds
  iris <- iris[sample(nrow(iris)), ]
  folds <- cut(seq(1, nrow(iris)), breaks = 10, labels = FALSE)

  results <- data.frame(matrix(NA, nrow = 0, ncol = 1))  # store per-fold correlations

  # perform 10-fold cross-validation
  for (i in 1:10) {
    testIndexes <- which(folds == i, arr.ind = TRUE)
    testData  <- iris[testIndexes, ]
    trainData <- iris[-testIndexes, ]
    print(nrow(trainData))
    print(nrow(testData))

    OLS <- lm(Sepal.Width ~ ., data = trainData)
    Predicted <- as.data.frame(predict(OLS, newdata = testData))
    results <- rbind(results,
                     corr.test(cbind(dplyr::select(testData, Sepal.Width), Predicted))$r[2, 1])
  }
  mean(results[, 1])^2
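One possible explanation I'm considering: if I read caret's postResample behavior correctly, caret computes Rsquared within each holdout fold as the squared correlation between observed and predicted values and then averages those per-fold values, whereas my loop averages the per-fold correlations first and only squares the mean at the end. Since mean(r)^2 <= mean(r^2), the manual aggregation comes out lower. A quick check against the per-fold correlations stored above:

  # caret-style aggregation: square each fold's correlation, then average
  mean(results[, 1]^2)

  # the aggregation used above: average the correlations, then square the mean
  mean(results[, 1])^2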
https://stackoverflow.com/questions/65815873/holdout-results-using-lm-in-caret January 21, 2021 at 02:57AM
