The original pandas DataFrame is below:

id                song_name
001MpsbI1FoQgs02  只想好好爱一回
000qq4Kk2WMPgU02  大森林的早晨
I am trying to convert this pandas DataFrame into a PySpark DataFrame in Python 2.7.
Code:
all_song_py = spark.createDataFrame(all_song[[u'id', u'song_name']], mySchema)
The result is:

| id | song_name |
| 001MpsbI1FoQgs02 | åªæƒ³å¥½å¥½çˆ±ä¸€å›ž |
| 000qq4Kk2WMPgU02 | 大森林的早晨 |
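The garbled first row looks like what you get when the UTF-8 bytes of the song name are read one byte per character as Latin-1. The sketch below reproduces that in plain Python, without pandas or Spark. That this is what happens inside createDataFrame under Python 2.7 is an assumption, not something the output above proves. The names original, garbled, and repaired are illustrative only.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the garbling by misreading UTF-8 bytes as Latin-1.
# Assumption: the pandas column holds UTF-8 encoded byte strings.

original = u"只想好好爱一回"

# The UTF-8 encoding uses three bytes per Chinese character here.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Latin-1 yields one character per byte,
# which is the kind of text shown in the garbled row.
garbled = utf8_bytes.decode("latin-1")

# The misreading is reversible: go back to bytes, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(original), len(garbled))
print(repaired == original)
```

If the assumption holds, the song_name column probably contains UTF-8 byte strings (str) rather than unicode objects, so decoding them to unicode before calling createDataFrame may be worth trying.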
I tried to decode the garbled characters using the code below, but it doesn't work.
decode_udf = udf(lambda val: urllib.unquote(val.encode('utf-8')).decode('gb18030'), StringType())
The result is:
氓聫陋忙聝鲁氓楼陆氓楼陆莽聢卤盲赂聙氓聸聻
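This second output is consistent with the string being garbled a second time rather than repaired: if each already-garbled character is encoded as UTF-8 and the resulting bytes are decoded as GB18030, every original Chinese character becomes three unrelated ones. The sketch below assumes the input to the UDF was the Latin-1 misreading of the UTF-8 bytes. It leaves out urllib.unquote, which only changes %-escaped sequences and so has no effect on this input. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: how encode('utf-8') followed by decode('gb18030') can
# garble an already-garbled string a second time.

original = u"只想好好爱一回"

# Assumed starting point: UTF-8 bytes misread as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

# Same two steps as the UDF (minus urllib.unquote): each Latin-1
# character becomes two UTF-8 bytes, and GB18030 reads each pair
# as a single Chinese character.
double_garbled = garbled.encode("utf-8").decode("gb18030")

# Undoing the steps in reverse order recovers the original text.
undone = double_garbled.encode("gb18030").decode("utf-8")
repaired = undone.encode("latin-1").decode("utf-8")

print(len(double_garbled))
print(repaired == original)
```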
Is there any solution?
https://stackoverflow.com/questions/65785788/get-garbled-character-when-i-turn-pandas-into-pyspark-dataframe-in-python2-7 January 19, 2021 at 01:13PM