I am trying to split a dataset into train and test subsets in Julia. So far I have tried the MLDataUtils.jl package for this, but the results are not what I expected. Below are my code and findings:
Code:

    using DataFrames
    using MLDataUtils

    # the inputs are
    a = DataFrame(A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                  B = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                  C = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    (x1, y1), (x2, y2) = stratifiedobs((a, b), p=0.7)

    # Output of this operation (which is not what I expected):
    println("x1 is: $x1")
    x1 is: 10×3 DataFrame
    │ Row │ A     │ B     │ C     │
    │     │ Int64 │ Int64 │ Int64 │
    ├─────┼───────┼───────┼───────┤
    │ 1   │ 1     │ 1     │ 1     │
    │ 2   │ 2     │ 2     │ 2     │
    │ 3   │ 3     │ 3     │ 3     │
    │ 4   │ 4     │ 4     │ 4     │
    │ 5   │ 5     │ 5     │ 5     │
    │ 6   │ 6     │ 6     │ 6     │
    │ 7   │ 7     │ 7     │ 7     │
    │ 8   │ 8     │ 8     │ 8     │
    │ 9   │ 9     │ 9     │ 9     │
    │ 10  │ 10    │ 10    │ 10    │

    println("y1 is: $y1")
    y1 is: 10-element Array{Int64,1}:
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10

    # but x2 is printed as (0×3 SubDataFrame, Float64[])
    # while y2 as 0-element view(::Array{Float64,1}, Int64[]) with eltype Float64)
However, I would like this dataset to be split into two parts, with 70% of the data in train and 30% in test. Please suggest a better approach to perform this operation in Julia. Thanks in advance.
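One possible approach, offered as a minimal sketch rather than a definitive answer, is to skip stratification and split on shuffled row indices. A likely reason for the empty test set above is that every value in b is unique, so a stratified split has only one observation per class and nothing is left over for the second subset. The helper name split_train_test and its rng keyword are hypothetical (they are not part of MLDataUtils or DataFrames); the sketch assumes only DataFrames and the Random standard library.

```julia
using DataFrames
using Random

# Hypothetical helper (not from MLDataUtils): shuffle the row indices,
# then cut them at the requested fraction p.
function split_train_test(df::AbstractDataFrame, y::AbstractVector, p::Real;
                          rng::AbstractRNG = Random.default_rng())
    @assert 0 < p < 1
    @assert nrow(df) == length(y)
    idx = randperm(rng, nrow(df))        # random permutation of 1:nrow(df)
    ntrain = round(Int, p * nrow(df))    # number of rows in the training part
    train = idx[1:ntrain]
    test = idx[ntrain+1:end]
    # Index features and targets with the same indices so rows stay aligned.
    return (df[train, :], y[train]), (df[test, :], y[test])
end

a = DataFrame(A = 1:10, B = 1:10, C = 1:10)
b = collect(1:10)

# A seeded generator makes the split reproducible.
(x1, y1), (x2, y2) = split_train_test(a, b, 0.7; rng = MersenneTwister(42))

println("train rows: ", nrow(x1), ", test rows: ", nrow(x2))
```

With 10 rows and p = 0.7, this puts 7 rows in the train part and 3 in the test part. Which rows land where depends on the seed. The returned pieces are copies; if views are preferred, view(df, train, :) could be used in place of df[train, :].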
https://stackoverflow.com/questions/66059128/splitting-datasets-into-train-and-test-in-julia February 05, 2021 at 03:18PM