Thursday, March 11, 2021

How to convert Json String to RDD[Row] and each json field should be 1 row?

I'm generating JSON strings and storing them in a mutable.ListBuffer[String]. Sample code below:

    import scala.collection.mutable
    import org.apache.spark.rdd.RDD

    // nameRandomizer (not shown here) returns a random name.
    def generateEntry(): String =
      s"""
         |{
         | "memberId": "${java.util.UUID.randomUUID.toString}",
         | "first_name": "${nameRandomizer}",
         | "last_name": "${nameRandomizer}"
         |}""".stripMargin

    // Generate 3 JSON strings with fields: memberId, first_name, last_name
    val entryList = mutable.ListBuffer[String]()
    for (_ <- 1 to 3) {
      entryList += generateEntry()
    }

    val inputRDD: RDD[String] = sc.parallelize(entryList.result())

I added a session.read.json call, but each record still comes back as a row with 1 field, not as a row with 3 separate fields.

    import org.apache.spark.sql.{Encoder, Encoders}

    implicit val stringEncoder: Encoder[String] = Encoders.STRING
    val dataSet = session.createDataset(inputRDD)
    val dsToRDD = session.read.option("multiLine", value = true).json(dataSet)
    dsToRDD.foreach(row =>
      logger.info(s"row.length: ${row.length}  row: ${row.mkString}\t"))

Actual Result:

    row.length: 1  row:
    {
    "memberId":"42b41102-de0f-4157-a4c3-dbe28ec073b3",
    "first_name": "Eunwoo",
    "last_name": "Cha"
    }

Expected Result:

    row.length: 3  row: 42b41102-de0f-4157-a4c3-dbe28ec073b3EunwooCha
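
For comparison, here is a minimal sketch of the shape I am after, not a confirmed fix for the code above. It assumes a local SparkSession, records that are valid JSON with quoted string values, and one complete JSON object per Dataset element. `JsonStringsToRows`, `toRows`, and the second sample record are hypothetical. The columns are selected explicitly because the order of inferred columns is not guaranteed to match the order of the fields in the JSON text.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object JsonStringsToRows {

  // Hypothetical helper: parses one JSON object per element and returns
  // a DataFrame with one column per JSON field.
  def toRows(spark: SparkSession, entries: Seq[String]): DataFrame = {
    import spark.implicits._ // supplies the Encoder[String] for createDataset

    val ds: Dataset[String] = spark.createDataset(entries)

    spark.read
      .json(ds) // each Dataset element is parsed as its own JSON record
      .select("memberId", "first_name", "last_name") // fix the column order
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-strings-to-rows")
      .getOrCreate()

    // First record is the sample from the question; the second is made up.
    val entries = Seq(
      """{"memberId": "42b41102-de0f-4157-a4c3-dbe28ec073b3", "first_name": "Eunwoo", "last_name": "Cha"}""",
      s"""{"memberId": "${java.util.UUID.randomUUID}", "first_name": "Jane", "last_name": "Doe"}"""
    )

    // collect() brings the rows to the driver so the output shows up in one place.
    toRows(spark, entries).collect().foreach { row =>
      println(s"row.length: ${row.length}  row: ${row.mkString}")
    }

    spark.stop()
  }
}
```

If a record is not valid JSON (for example, unquoted string values), the `select` in this sketch would fail because the expected columns would not be inferred, so it is worth printing one generated string and checking it first.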
https://stackoverflow.com/questions/66593615/how-to-convert-json-string-to-rddrow-and-each-json-field-should-be-1-row March 12, 2021 at 11:07AM
