Thursday, March 11, 2021

Preparing multi-sample time series data for an RNN/LSTM classification problem

I'm trying to use the time component of my data to improve a classification problem. The data I'm working with consists of labeled network data captures, or scenarios. The scenarios are labeled as either 0 or 1 ("good" or "bad" flows). I've read in the literature that LSTM neural networks should work quite well for this. The only problem is that there is not much detail on how data should be organized and fed into an LSTM. To describe the problem, I drew a quick sketch. As can be seen, the number of observations is not uniform across scenarios, and the time between scenarios is not continuous.

[Figure: quick sketch of the scenarios]

One thing I've thought about is taking a fixed number x of ordered observations from each scenario and placing them in either the training set or the test set.
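To make the idea concrete, here is a minimal Python sketch of one way to cut each scenario into fixed-length windows of consecutive observations. Everything in it is an assumption for illustration: the helper name `make_windows`, the toy scenarios, the window length of 3, and the choice to give each window the label of its last observation. It also assumes the rows of each scenario are already sorted in ascending time order.

```python
def make_windows(rows, labels, window, step=1):
    """Slice one scenario into fixed-length windows of consecutive rows.

    rows   : list of feature vectors, sorted by time (oldest first)
    labels : one label per row
    window : number of consecutive observations per sample
    step   : how far the window start moves between samples
    """
    windows, targets = [], []
    for start in range(0, len(rows) - window + 1, step):
        end = start + window
        windows.append(rows[start:end])
        # Assumption: a window takes the label of its last observation.
        targets.append(labels[end - 1])
    return windows, targets


# Hypothetical toy scenarios with different lengths and two features per row.
scenario_a = {
    "rows": [[float(i), float(i) * 2.0] for i in range(7)],
    "labels": [0, 0, 1, 1, 0, 1, 1],
}
scenario_b = {
    "rows": [[float(i), float(i) + 0.5] for i in range(5)],
    "labels": [1, 1, 0, 0, 1],
}

X, y = [], []
for scenario in (scenario_a, scenario_b):
    # Windows are built per scenario, so no window spans two scenarios.
    w, t = make_windows(scenario["rows"], scenario["labels"], window=3)
    X.extend(w)
    y.extend(t)

# X is a nested list shaped (samples, timesteps, features).
print(len(X), len(X[0]), len(X[0][0]))
```

Because every window has the same length, scenarios with different numbers of observations simply contribute different numbers of samples. The nested list can be converted to a 3D array (samples, timesteps, features), which is the layout recurrent layers commonly expect.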

My other question is how to handle the validation set in this case.
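One possible option, sketched below in Python, is to split each scenario chronologically, so that the validation and test portions always come later in time than the training portion. The function name `chronological_split` and the fractions are hypothetical; holding out entire scenarios for validation is another reasonable choice, and this sketch does not claim either is the right one for this data.

```python
def chronological_split(rows, train_frac, val_frac):
    """Split time-ordered rows into train, validation, and test parts.

    The rows must be sorted oldest first. No shuffling is done, so the
    validation and test parts contain only observations that come after
    the training part.
    """
    n = len(rows)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Hypothetical scenario: 20 observations, represented here by their index.
scenario = list(range(20))
train, val, test = chronological_split(scenario, train_frac=0.5, val_frac=0.25)

print(len(train), len(val), len(test))
```

If fixed-length windows are built afterwards, building them separately inside each part keeps any single window from straddling the train/validation boundary.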

To help the conversation, here is some dummy data that I created in R. I like R, but I have been doing most of my NN modeling in Python.

    data_gen <- function(n, start_date) {
      set.seed(123)
      u <- runif(n, 1, 10 * n)
      time <- as.POSIXlt(u, origin = start_date)
      time <- sort(time, decreasing = TRUE)
      col_1 <- u
      col_2 <- abs(rnorm(n, 4, 3))
      col_3 <- abs(rweibull(n, .5, .1))
      label <- sample(x = c(0, 1), size = n, replace = TRUE, prob = c(.20, .70))
      return(data.frame(time, col_1, col_2, col_3, label))
    }

    senerio_1 <- data_gen(1043, "2017-02-03 08:00:01")
    senerio_2 <- data_gen(2421, "2018-04-03 10:30:30")
    senerio_3 <- data_gen(834, "2017-04-21 11:00:03")
    senerio_4 <- data_gen(8834, "2017-04-21 16:02:02")

So how do I prepare my data for RNN/LSTM?

https://stackoverflow.com/questions/66592768/preparing-multi-sample-time-series-data-for-rnn-lstm-classification-problem March 12, 2021 at 09:06AM