CSV file exported from Pyspark dataframe and downloaded from Hue UI shows text garbling (Japanese)


Japanese characters are garbled in a CSV file that I exported from a PySpark dataframe with the write.save method (with an explicit encoding) and then downloaded from Hue, although the file shows no anomalies when I open it in Windows Notepad.


The code I use to export the CSV file is below (it runs without errors):

# Save the PySpark dataframe directly as CSV
encd = 'cp932'
(df.repartition(1)
   .write
   .save(path='data.csv',
         format='csv',
         mode='overwrite',
         header='true',
         encoding=encd))

 

I also tried the .toPandas() method and found no such garbling in the CSV exported from the resulting pandas dataframe.

dfp = df.limit(10)
pdf = dfp.toPandas()

pdf # displays no garbling

pdf.to_csv('data.csv', index=False, encoding='cp932')

 

How can I avoid this garbling when exporting a CSV file directly from a PySpark dataframe?
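
One way to narrow this down, assuming the garbling comes from an encoding mismatch, is to check which encoding the downloaded bytes are actually in and, if needed, re-encode them. The sketch below is only illustrative: the helper names detect_encoding and reencode are hypothetical, the candidate list is limited to UTF-8 and cp932, and a short in-memory sample stands in for the real file.

```python
def detect_encoding(data, candidates=("utf-8", "cp932")):
    """Return the candidate encodings that decode `data` without error."""
    decodable = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        decodable.append(enc)
    return decodable


def reencode(data, src, dst):
    """Decode `data` from `src` and encode it again as `dst`."""
    return data.decode(src).encode(dst)


# Stand-in for the bytes of the exported CSV (hypothetical sample content).
sample = "日本語,テスト\n"
utf8_bytes = sample.encode("utf-8")
cp932_bytes = sample.encode("cp932")

print(detect_encoding(utf8_bytes))
print(detect_encoding(cp932_bytes))

# If the export turns out to be UTF-8, convert it to cp932 afterwards.
converted = reencode(utf8_bytes, "utf-8", "cp932")
print(converted == cp932_bytes)
```

To run the check on the real export, read the downloaded file in binary mode (for example open(path, 'rb').read(), where path is wherever the file was saved) and pass the bytes to detect_encoding. If the result shows UTF-8 rather than cp932, the encoding option did not take effect on write, and re-encoding after export is one possible workaround.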