A similar question (Workflow for statistical analysis and report writing) has been asked on StackOverflow about structuring R code; however, it does not address the problem I am facing. I am sure someone must have solved this problem smartly. I have a large dataset on which I need to perform the following tasks. I have put the file name in front of each task, and I need a better way to organize my code:
1. 01_load_data.R - data loading
#------------------------------------------------------------------
2. 02_explore.R - data exploration with charts and tables
3. Run various classification and regression algorithms
   3a. 03_glm.R
   3b. 03_chisq.R
   3c. 03_rf.R
   3d. 03_knn.R
   and many more...
4. 04_clean_up - Slicing and dicing the data based on learnings in 3a, 3b, 3c etc... and it results in a subset of original data. On this subset, the loop begins again from point 2.
#------------------------------------------------------------------
5. 12_explore.R - Re-run the explore data with charts and tables
6. Run various classification and regression algorithms
   6a. 13_glm.R
   6b. 13_chisq.R
   6c. 13_rf.R
   6d. 13_knn.R
7. 14_clean_up - Slicing and dicing the data based on learnings in 6a, 6b, 6c etc... and it results in a subset of data in point 4. On this subset, the loop begins again.
#------------------------------------------------------------------
...
And the iterations go on...
...
#------------------------------------------------------------------

If I am running 20 different tests/models in point 3, point 6, and following iterations, then the whole project becomes unmanageable at one point.
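To make the repetition concrete, the loop above could be sketched as a single script in which each numbered file becomes a function. This is only a rough illustration under assumptions: every name here (`load_data`, `explore`, `models`, `run_iteration`, `clean_up`) is hypothetical, `mtcars` stands in for my real dataset, the two models are placeholders for the 20 or so I actually run, and the subsetting rule is invented.

```r
# Hypothetical sketch: none of these names come from an existing package.

load_data <- function() {
  # Stand-in for 01_load_data.R; mtcars is only a placeholder dataset.
  mtcars
}

explore <- function(df) {
  # Stand-in for 02_explore.R, 12_explore.R, ...: return summary tables.
  list(n_rows = nrow(df), summary = summary(df))
}

# Stand-in for 03_glm.R, 03_chisq.R, ...: one function per model or test.
models <- list(
  glm   = function(df) glm(mpg ~ wt, data = df),
  chisq = function(df) suppressWarnings(chisq.test(table(df$am, df$vs)))
)

run_iteration <- function(df) {
  # One pass of points 2 and 3: explore, then fit every registered model.
  list(
    exploration = explore(df),
    fits = lapply(models, function(fit) fit(df))
  )
}

clean_up <- function(df, results) {
  # Stand-in for 04_clean_up, 14_clean_up, ...
  # Invented rule: the real subsetting depends on what the models show.
  df[df$hp < quantile(df$hp, 0.9), , drop = FALSE]
}

df <- load_data()
history <- list()
for (i in 1:3) {
  history[[i]] <- run_iteration(df)   # results of iteration i
  df <- clean_up(df, history[[i]])    # subset that feeds iteration i + 1
}
```

In this form, adding a model means adding one entry to `models` rather than one new file per iteration, and `history[[i]]$fits$glm` retrieves the fit from iteration `i`. What the sketch does not capture is that my clean-up step differs in every iteration, which is the part I am unsure how to organize.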
Is there a better workflow for arranging this code?
Thanks!
https://stackoverflow.com/questions/66056540/is-there-a-better-way-to-structure-code-in-r February 05, 2021 at 09:56AM