Friday, February 5, 2021

Splitting datasets into train and test in Julia

I am trying to split a dataset into train and test subsets in Julia. So far, I have tried the MLDataUtils.jl package for this, but the results are not what I expected. Below are my findings and issues:

Code

```julia
using DataFrames, MLDataUtils

# the inputs are
a = DataFrame(A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              B = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              C = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

(x1, y1), (x2, y2) = stratifiedobs((a, b), p = 0.7)

# Output of this operation (which is not what I expected):
println("x1 is: $x1")
# x1 is:
# 10×3 DataFrame
# │ Row │ A     │ B     │ C     │
# │     │ Int64 │ Int64 │ Int64 │
# ├─────┼───────┼───────┼───────┤
# │ 1   │ 1     │ 1     │ 1     │
# │ 2   │ 2     │ 2     │ 2     │
# │ 3   │ 3     │ 3     │ 3     │
# │ 4   │ 4     │ 4     │ 4     │
# │ 5   │ 5     │ 5     │ 5     │
# │ 6   │ 6     │ 6     │ 6     │
# │ 7   │ 7     │ 7     │ 7     │
# │ 8   │ 8     │ 8     │ 8     │
# │ 9   │ 9     │ 9     │ 9     │
# │ 10  │ 10    │ 10    │ 10    │

println("y1 is: $y1")
# y1 is:
# 10-element Array{Int64,1}:
#   1
#   2
#   3
#   4
#   5
#   6
#   7
#   8
#   9
#  10

# but x2 is printed as
# (0×3 SubDataFrame, Float64[])
# while y2 as
# 0-element view(::Array{Float64,1}, Int64[]) with eltype Float64)
```
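One plausible reading of this output: every value in `b` is a distinct label, so each "class" holds exactly one observation. The sketch below assumes (this is not confirmed from the MLDataUtils source) that stratified splitting gives each class a train share of `p` times its size, rounded to the nearest integer. It only mimics that arithmetic and does not call MLDataUtils.

```julia
# Hypothetical illustration of per-class rounding in a stratified split.
# ASSUMPTION: each label class gets round(p * class_size) training rows.
labels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p = 0.7

# Count how many observations carry each distinct label.
counts = Dict{Int,Int}()
for l in labels
    counts[l] = get(counts, l, 0) + 1
end

# Per-class train size under the assumed rounding rule:
# round(Int, 0.7 * 1) == 1, so a one-element class goes entirely to train.
train_per_class = Dict(l => round(Int, p * n) for (l, n) in counts)

n_train = sum(values(train_per_class))
n_test = length(labels) - n_train
println("train: $n_train, test: $n_test")
```

Under that assumption all 10 observations land in the training part, which matches the empty `x2`/`y2` shown above.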

However, I would like this dataset to be split into two parts, with 70% of the data for training and 30% for testing. Please suggest a better approach to perform this operation in Julia. Thanks in advance.
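A minimal sketch of a plain (non-stratified) shuffled 70/30 split, using only the `Random` standard library: shuffle the row indices with `randperm`, take the first 70% for training and the rest for testing. The helper name `holdout_indices` is hypothetical, not part of MLDataUtils or any other package, and the toy matrix `X` stands in for the question's data.

```julia
using Random

# Hypothetical helper: returns (train_indices, test_indices) for n observations,
# with round(p * n) observations in the training part.
function holdout_indices(n::Integer, p::Real; rng::AbstractRNG = Random.default_rng())
    0 < p < 1 || throw(ArgumentError("p must be in (0, 1)"))
    perm = randperm(rng, n)              # random permutation of 1:n
    n_train = round(Int, p * n)          # 7 of 10 observations for p = 0.7
    return perm[1:n_train], perm[n_train+1:end]
end

# Toy data mirroring the question: 10 observations, 3 features, 1 target.
X = repeat(collect(1:10), 1, 3)          # 10×3 Matrix{Int}
y = collect(1:10)

rng = MersenneTwister(42)                # fixed seed for a reproducible split
train_idx, test_idx = holdout_indices(length(y), 0.7; rng = rng)

# Index features and target with the same row indices so they stay aligned.
X_train, y_train = X[train_idx, :], y[train_idx]
X_test,  y_test  = X[test_idx, :],  y[test_idx]

println("train rows: ", size(X_train, 1), ", test rows: ", size(X_test, 1))
```

The same index vectors should work for the DataFrame in the question, e.g. `a[train_idx, :]` and `b[train_idx]`. Stratification is skipped on purpose here: with one observation per label there is nothing meaningful to stratify on.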

https://stackoverflow.com/questions/66059128/splitting-datasets-into-train-and-test-in-julia February 05, 2021 at 03:18PM
