Suppose I have a vector ab containing A's and B's. I want to identify sequences (runs of identical consecutive values) and create a vector v of the same length as ab that holds the sequence length at the first and last position of each sequence, and NA everywhere else.
However, there is a restriction: a second vector x of 0s and 1s marks where a sequence ends, so a 1 at position i ends the current sequence at i even if the next value is the same letter.
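Reading the restriction from the examples that follow, position i closes a sequence if it is the last element, if ab[i] differs from ab[i + 1], or if x[i] == 1. The helper `run_ends()` below is a hypothetical name used only for illustration; it is a minimal sketch that assumes ab and x have the same length and x contains only 0/1.

```r
# Hypothetical helper: positions where a sequence ends.
# Assumes ab and x have the same length and x contains only 0/1.
run_ends <- function(ab, x) {
  n <- length(ab)
  stopifnot(length(x) == n)
  # TRUE where ab[i] != ab[i + 1]; the last element always ends a sequence
  changes <- c(ab[-1L] != ab[-n], TRUE)
  # A sequence also ends wherever x flags it
  which(changes | x == 1)
}

ab <- c(rep("A", 5), "B", rep("A", 3))
x  <- c(rep(0, 3), 1, 0, 1, rep(0, 3))
run_ends(ab, x)
# [1] 4 5 6 9
```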
So for example:

    ab <- rep("A", 6)           # "A" "A" "A" "A" "A" "A"
    x  <- c(0, 0, 1, 0, 0, 0)   # 0 0 1 0 0 0

should give

    v <- c(3, NA, 3, 3, NA, 3)

A second example is the following:
    ab <- c(rep("A", 5), "B", rep("A", 3))   # "A" "A" "A" "A" "A" "B" "A" "A" "A"
    x  <- c(rep(0, 3), 1, 0, 1, rep(0, 3))   # 0 0 0 1 0 1 0 0 0

Here the output should be:

    4 NA NA 4 1 1 3 NA 3

(without the restriction it would be 5 NA NA NA 5 1 3 NA 3).

So far, my code without the restriction looks like this:
    ab <- c(rep("A", 5), "B", rep("A", 3))
    x  <- c(rep(0, 3), 1, 0, 1, rep(0, 3))
    cng <- ab[-1L] != ab[-length(ab)]   # is there a change between A and B w.r.t. the previous value?
    idx <- which(cng)                   # where do the changes take place?
    idx <- c(idx, length(ab))           # include the last value
    seq_length <- diff(c(0, idx))       # how long are the sequences?
    # create v
    v <- rep(NA, length(ab))
    v[idx] <- seq_length                      # sequence end
    v[idx - (seq_length - 1)] <- seq_length   # sequence start
    v

Does anyone have an idea how I can implement the restriction? (And since my vector has 2 million observations, I wonder whether there is a more efficient way than my approach.) I would appreciate any comments! Many thanks in advance!
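One possible way to add the restriction, sketched here as an assumption rather than a tested answer: treat a position as a sequence end when the value changes after it, when it is the last element, or when x is 1 there, then reuse the same diff() logic as the code above. The function name `seq_marks()` is hypothetical.

```r
# Hypothetical sketch: sequence length at the start and end of each sequence,
# where a sequence also ends wherever x == 1. No explicit loop; each step is vectorized.
seq_marks <- function(ab, x) {
  n <- length(ab)
  stopifnot(length(x) == n)
  # A position ends a sequence if the value changes next, if x flags it, or if it is the last one
  ends   <- which(c(ab[-1L] != ab[-n], TRUE) | x == 1)
  len    <- diff(c(0L, ends))   # length of each sequence
  starts <- ends - len + 1L     # first position of each sequence
  v <- rep(NA_integer_, n)
  v[ends]   <- len
  v[starts] <- len              # for length-1 sequences start == end, same value written twice
  v
}

ab <- c(rep("A", 5), "B", rep("A", 3))
x  <- c(rep(0, 3), 1, 0, 1, rep(0, 3))
seq_marks(ab, x)
# [1]  4 NA NA  4  1  1  3 NA  3
seq_marks(rep("A", 6), c(0, 0, 1, 0, 0, 0))
# [1]  3 NA  3  3 NA  3
```

With x set to all zeros this reduces to the unrestricted version and returns 5 NA NA NA 5 1 3 NA 3. Compared with the original code, the only substantive change is the `| x == 1` term in the end detection; the remaining steps are single passes over the vector, so the work grows linearly with length(ab) (not benchmarked here).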
https://stackoverflow.com/questions/66443043/r-identify-sequences-in-a-vector March 03, 2021 at 12:15AM