
CSV file exported from Pyspark dataframe and downloaded from Hue UI shows text garbling (Japanese)


Japanese characters are garbled in a CSV file that I exported from PySpark with the write.save method (with an explicit encoding) and then downloaded from Hue, although the file shows no anomalies when I open it in Windows Notepad.


The code for exporting the CSV file is below (it runs without errors):

######## save as csv from Pyspark dataframe directly
encd = 'cp932'
(df.repartition(1)
   .write
   .save(path='data.csv',
         format='csv',
         mode='overwrite',
         header='true',
         encoding=encd))

 

I also tried the .toPandas() method, and the CSV exported from the resulting pandas DataFrame shows no such garbling.

dfp = df.limit(10)
pdf = dfp.toPandas()

pdf # displays no garbling

pdf.to_csv('data.csv', index=False, encoding='cp932')

 

How can I avoid this garbling when exporting a CSV file directly from a PySpark DataFrame?
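To help narrow this down, here is a minimal sketch for checking which encoding the bytes of a downloaded file actually decode under. The helper name detect_encoding and the candidate list are my own assumptions, not part of any library, and a successful decode only shows that the bytes are valid in that encoding, not that it is the one the writer intended. UTF-8 is tried first because strict UTF-8 decoding rejects most cp932 byte sequences, while the reverse is not reliably true.

```python
def detect_encoding(raw, candidates=("utf-8", "cp932")):
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper for illustration: returns None if no candidate fits.
    """
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


# Self-contained check with a short Japanese sample encoded both ways.
sample = "日本語,テスト\n"
print(detect_encoding(sample.encode("utf-8")))
print(detect_encoding(sample.encode("cp932")))

# For the real file (assuming it was saved locally as data.csv):
# with open("data.csv", "rb") as f:
#     print(detect_encoding(f.read()))
```

If the Spark-exported file reports a different encoding than the pandas-exported one, that would suggest the encoding option was not applied on write, rather than a problem in Hue or in the viewer.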

 

 
