08-13-2018 09:52 PM - last edited on 08-14-2018 05:40 AM by cjervis
Japanese characters appear garbled in a CSV file that I downloaded from Hue. The file was encoded and exported from PySpark with the `write.save` method, yet it shows no anomalies when I open it in Windows Notepad.
The code for exporting the CSV file is below (it runs without errors):
```python
######## save as csv from Pyspark dataframe directly
encd = 'cp932'
df.repartition(1)\
    .write\
    .save(path='data.csv', format='csv', mode='overwrite', header='true', encoding=encd)
```
I also tried the `.toPandas()` method, and the CSV exported from the pandas DataFrame shows no such garbling.
```python
dfp = df.limit(10)
pdf = dfp.toPandas()
pdf  # displays no garbling
pdf.to_csv('data.csv', index=False, encoding='cp932')
```
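One possible explanation (an assumption, not confirmed here) is that the Spark version in use ignores `encoding` when writing CSV and emits UTF-8, which Notepad detects correctly but a cp932-expecting viewer garbles. Under that assumption, a workaround that avoids collecting the whole DataFrame with `toPandas()` is to let Spark write UTF-8 and re-encode the output afterwards. The sketch assumes the `data.csv/` output directory has already been copied to local disk (for example with `hdfs dfs -get`); `transcode_spark_csv` is a hypothetical helper name.

```python
import glob
import os


def transcode_spark_csv(output_dir: str, dest_path: str,
                        src_encoding: str = "utf-8",
                        dst_encoding: str = "cp932") -> int:
    """Concatenate Spark's part-* files into one file re-encoded as dst_encoding.

    Hypothetical helper. Assumes the part files are UTF-8 text. With
    repartition(1) there is a single part file; with several parts and
    header='true', each part carries its own header line.
    Returns the number of part files processed.
    """
    # "part-*" skips _SUCCESS and the hidden .part-*.crc checksum files
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    if not parts:
        raise FileNotFoundError(f"no part-* files in {output_dir!r}")
    # newline="" keeps the original line endings untouched; strict encoding
    # raises UnicodeEncodeError for characters cp932 cannot represent
    with open(dest_path, "w", encoding=dst_encoding, newline="") as out:
        for part in parts:
            with open(part, "r", encoding=src_encoding, newline="") as src:
                out.write(src.read())
    return len(parts)
```

Raising on unencodable characters is deliberate here: silently replacing them would reintroduce a different kind of garbling.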
How can I avoid this garbling when exporting a CSV file directly from a PySpark DataFrame?