I'm generating JSON strings and storing them in a mutable.ListBuffer[String]. Sample code below:

    def generateEntry() = {
      s"""
         |{
         |  "memberId": ${java.util.UUID.randomUUID.toString},
         |  "first_name": ${nameRandomizer},
         |  "last_name": ${nameRandomizer}
         |}""".stripMargin
    }

    // Generate 3 JSON strings with fields: memberId, first_name, last_name
    val entryList = mutable.ListBuffer[String]()
    for (_ <- 1 to 3) {
      entryList += generateEntry()
    }
    val inputRDD: RDD[String] = sc.parallelize(entryList.result())

I also added a session.read.json call, but it still reads each row as a single field, not as 3 separate fields.

    implicit val stringEncoder: Encoder[String] = Encoders.STRING
    val dataSet = session.createDataset(inputRDD)
    val dsToRDD = session.read.option("multiLine", value = true).json(dataSet)
    dsToRDD.foreach(row => logger.info(s"row.length: ${row.length} row: ${row.mkString}\t"))

Actual Result:

    row.length: 1 row: { "memberId":"42b41102-de0f-4157-a4c3-dbe28ec073b3", "first_name": "Eunwoo", "last_name": "Cha" }

Expected Result:

    row.length: 3 row: 42b41102-de0f-4157-a4c3-dbe28ec073b3EunwooCha

https://stackoverflow.com/questions/66593615/how-to-convert-json-string-to-rddrow-and-each-json-field-should-be-1-row
March 12, 2021 at 11:07AM
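
One possible explanation, offered as a guess rather than a confirmed diagnosis: if the generated strings really follow the template in the sample code, the interpolated values are not wrapped in double quotes, so each string is not valid JSON. In Spark's default PERMISSIVE mode, records that fail to parse end up in a single `_corrupt_record` column, which would give `row.length: 1` with the raw string as its only value. The sketch below assumes a local `SparkSession` and uses a hypothetical `entry` helper with fixed names in place of `nameRandomizer`. It quotes every value and then selects the columns explicitly, because inferred JSON columns come back in alphabetical order.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object JsonRowsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration.
    val session = SparkSession.builder()
      .master("local[*]")
      .appName("json-rows-sketch")
      .getOrCreate()
    import session.implicits._

    // Hypothetical helper: note the double quotes around each interpolated value,
    // which make every generated string a valid JSON object.
    def entry(first: String, last: String): String =
      s"""{"memberId": "${java.util.UUID.randomUUID}", "first_name": "$first", "last_name": "$last"}"""

    // Each Dataset element is parsed as one JSON record.
    val jsonStrings: Dataset[String] = Seq(
      entry("Eunwoo", "Cha"),
      entry("Ada", "Lovelace"),
      entry("Alan", "Turing")
    ).toDS()

    val parsed = session.read.json(jsonStrings)

    // Select explicitly so the fields appear as memberId, first_name, last_name.
    val ordered = parsed.select("memberId", "first_name", "last_name")

    ordered.collect().foreach { row =>
      println(s"row.length: ${row.length} row: ${row.mkString}")
    }

    session.stop()
  }
}
```

With valid JSON input, each resulting row should carry three fields. Calling `parsed.printSchema()` on the original data is a quick way to check whether only `_corrupt_record` was inferred.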