Sunday, February 7, 2021

R function: update to handle more situations

I have a series of text files. Each of them has three similar rows; see the example below:

The probability of being a carrier is 0.07457166
an BRCA1 carrier 0.03181885
an BRCA2 carrier 0.04273394

I need to get the number that follows a specific string in each row. I have code with a function like this:

dir <- 'W:/project/_help/temp/'
files <- list.files(dir, pattern = '*.txt')
filepath <- list.files(dir, pattern = '*.txt', full.names = TRUE)

try <- function(file, xx){
  aa <- readLines(file)
  bb <- grep(xx, aa, value = TRUE)
  cc <- readr::parse_number(bb)
  return(cc)
}

overall <- lapply(filepath, try, "being a carrier is")
Brca1 <- lapply(filepath, try, "an BRCA1 carrier")
Brca2 <- lapply(filepath, try, "an BRCA2 carrier")

The code:

result <- lapply(filepath, try, "The probability of being a carrier is")  

works fine: I get the number from the 1st row. But I also want the numbers from the 2nd and 3rd rows, so I ran:

result <- lapply(filepath, try, "an BRCA1 carrier")

result <- lapply(filepath, try, "an BRCA2 carrier")

But these return 1 and 2. I guess the code picks up the 1 or 2 from the string BRCA1 or BRCA2. What I actually want is the number that comes after the entire string "an BRCA1 carrier" or "an BRCA2 carrier". How can I modify the function to do this? Additionally, some of the text files may have NA values after those strings, such as:

The probability of being a carrier is NA
an BRCA1 carrier NA
an BRCA2 carrier 0.04273394


I also need the function to handle those missing values. Thank you. TGG
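One possible way to approach both requests, offered as a minimal sketch rather than a definitive answer: `readr::parse_number()` returns the first number it finds in a string, which is why the digit in "BRCA1" or "BRCA2" comes back instead of the probability. The sketch below instead cuts each matching line at the end of the label and converts the first token that follows. The helper name `extract_after`, the labels vector, and the temporary demo file are illustrative assumptions, not part of the original code; it also assumes each label appears at most once per file.

```r
# Hypothetical helper (the name is illustrative): return the value that
# follows `label` in `file`, or NA if the label is missing or the value
# after it is not numeric (for example the literal text "NA").
extract_after <- function(file, label) {
  lines <- readLines(file, warn = FALSE)
  hit <- grep(label, lines, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_real_)

  # Locate the label literally (no regex) and keep only the text after it
  pos <- regexpr(label, hit[1], fixed = TRUE)
  rest <- substring(hit[1], pos + attr(pos, "match.length"))

  # Take the first whitespace-delimited token after the label
  token <- strsplit(trimws(rest), "\\s+")[[1]][1]

  # Non-numeric tokens become NA; warnings are silenced on purpose
  suppressWarnings(as.numeric(token))
}

# Demo on a temporary file that mimics the NA example in the question
demo_file <- tempfile(fileext = ".txt")
writeLines(c("The probability of being a carrier is NA  ",
             "an BRCA1 carrier NA  ",
             "an BRCA2 carrier 0.04273394  "),
           demo_file)

labels <- c(overall = "being a carrier is",
            brca1   = "an BRCA1 carrier",
            brca2   = "an BRCA2 carrier")

demo <- vapply(labels, function(lb) extract_after(demo_file, lb), numeric(1))
print(demo)
```

With the real files, something like `vapply(filepath, extract_after, numeric(1), label = "an BRCA1 carrier")` should give one value per file. The sketch also avoids the name `try`, since that masks the base R function `try()`.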

https://stackoverflow.com/questions/66094743/r-function-update-to-fit-more-situation February 08, 2021 at 09:07AM
