Sunday, February 7, 2021

Spark: Transpose Rows to Columns with Multiple Fields

What is the best way to transpose rows into columns when each row carries multiple value fields?

I have a dataframe as below.

val inputDF = Seq(
    ("100", "A", 10, 200),
    ("100", "B", 20, 300),
    ("101", "A", 30, 100)
).toDF("ID", "Type", "Value", "Allocation")

I want to generate a dataframe as below.

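// Option[Int] stands in for nullable cells; each row has five values, so there are five column names.
val outputDF = Seq[(String, Option[Int], Option[Int], Option[Int], Option[Int])](
    ("100", Some(10), Some(200), Some(20), Some(300)),
    ("101", Some(30), Some(100), None, None)
).toDF("ID", "Value_A", "Allocation_A", "Value_B", "Allocation_B")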

I tried to use pivot as below.

val outputDF = inputDF.groupBy("ID", "Type").pivot("Type").agg(first("Value"), first("Allocation"))

It generated something as below, which is not what I wanted.

+---+----+--------------+-------------------+--------------+-------------------+
| ID|Type|A_first(Value)|A_first(Allocation)|B_first(Value)|B_first(Allocation)|
+---+----+--------------+-------------------+--------------+-------------------+
|100|   B|          null|               null|            20|                300|
|100|   A|            10|                200|          null|               null|
|101|   A|            30|                100|          null|               null|
+---+----+--------------+-------------------+--------------+-------------------+

Thank you very much!
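
One possible approach, offered as a sketch rather than a definitive answer: the attempt above groups by both "ID" and "Type", so each (ID, Type) pair keeps its own row. Grouping by "ID" alone and pivoting on "Type" should collapse them into one row per ID. The snippet assumes a local SparkSession (reuse your own `spark` if one already exists), assumes the only types are "A" and "B", and renames the pivoted columns from Spark's `<type>_<alias>` form to the `<field>_<type>` form used in the question. The name `pivoted` is introduced here for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Assumption: a local session for illustration; getOrCreate reuses an existing one.
val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
import spark.implicits._

val inputDF = Seq(
  ("100", "A", 10, 200),
  ("100", "B", 20, 300),
  ("101", "A", 30, 100)
).toDF("ID", "Type", "Value", "Allocation")

// Group by ID only. Listing the pivot values (an assumption about the data)
// fixes the column order and saves Spark a pass to discover them.
val pivoted = inputDF
  .groupBy("ID")
  .pivot("Type", Seq("A", "B"))
  .agg(first("Value").as("Value"), first("Allocation").as("Allocation"))

// With several aliased aggregations, pivot names columns <type>_<alias>, e.g. A_Value.
// Swap the two parts to get Value_A; "ID" has no underscore and is left alone.
val outputDF = pivoted.columns.foldLeft(pivoted) { (df, name) =>
  name.split("_", 2) match {
    case Array(t, field) => df.withColumnRenamed(name, s"${field}_$t")
    case _               => df
  }
}

outputDF.orderBy("ID").show()
```

The result should have the columns ID, Value_A, Allocation_A, Value_B, Allocation_B, with nulls in the B columns for ID 101. If several rows share the same ID and Type, `first` picks one of them without a guaranteed order, so a different aggregate (for example `sum` or `max`) may be more appropriate in that case.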

https://stackoverflow.com/questions/66095087/spark-transpose-rows-to-columns-with-multiple-fields February 08, 2021 at 10:06AM
