How to save dataframe as text file

How do I save the data inside a DataFrame to a text file in CSV format in HDFS?

I tried the following, but csv doesn't seem to be a supported format:

df.write.format("csv").save("/filepath")
1 ACCEPTED SOLUTION

Master Collaborator

The best way to save a DataFrame to a CSV file is to use spark-csv, the library provided by Databricks.

It supports almost all the features you are likely to need when working with CSV files. Start the shell with the package loaded:

spark-shell --packages com.databricks:spark-csv_2.10:1.4.0

Then use the library API to save to CSV files:

df.write.format("com.databricks.spark.csv").option("header", "true").save("file.csv")

It also supports reading from CSV files with a similar API:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file.csv")

You could also write some custom code that builds each output line with mkString, but this is not safe if the data contains special characters: it does not quote fields or escape delimiters and quotes.

df.rdd.map(x => x.mkString("|")).saveAsTextFile("file.csv")
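
To illustrate what "handling quotes" would involve if you go the custom route, here is a minimal sketch in plain Scala (no Spark needed to try it). The names CsvLine, escape and toLine are made up for this example and are not part of Spark or spark-csv. It assumes the common RFC 4180 style convention: a field is wrapped in double quotes when it contains the delimiter, a quote or a line break, and embedded quotes are doubled.

```scala
// Hypothetical helper, not a Spark or spark-csv API.
object CsvLine {
  // Quote a field only when it contains the delimiter, a double quote,
  // or a line break; embedded double quotes are doubled.
  def escape(field: String, delimiter: Char = ','): String = {
    val needsQuoting =
      field.exists(c => c == delimiter || c == '"' || c == '\n' || c == '\r')
    if (needsQuoting) "\"" + field.replace("\"", "\"\"") + "\"" else field
  }

  // Render one row as a single line; null values become empty fields.
  def toLine(values: Seq[Any], delimiter: Char = ','): String =
    values
      .map(v => escape(if (v == null) "" else v.toString, delimiter))
      .mkString(delimiter.toString)
}
```

With a DataFrame this would presumably be used as df.rdd.map(row => CsvLine.toLine(row.toSeq)).saveAsTextFile("file.csv"), but I have not tested that on a cluster, and it still writes no header line.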

2 REPLIES

Contributor

@Qi Wang I think we do not have the Databricks CSV library available in the exam.

Your approach with mkString() works well if no header is required in the output CSV file. Can I assume that is the case in the exam tasks?
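
In case a header does turn out to be required, a rough sketch of one way to get it without the extra library is to build the header line from the column names and put it in front of the data lines. The helper below is plain Scala and its name (WithHeader.lines) is invented for illustration. It applies no quoting, so it assumes the fields never contain the delimiter.

```scala
// Hypothetical helper, not a Spark API.
object WithHeader {
  // One header line built from the column names, then one line per row.
  // No quoting is applied, so fields must not contain the delimiter.
  def lines(columns: Seq[String],
            rows: Seq[Seq[Any]],
            delimiter: String = ","): Seq[String] =
    columns.mkString(delimiter) +: rows.map(_.mkString(delimiter))
}

// Untested assumption for the Spark side: the same idea as an RDD union,
// with the column names taken from df.columns:
//   val header = sc.parallelize(Seq(df.columns.mkString(",")))
//   (header ++ df.rdd.map(_.mkString(","))).saveAsTextFile("file.csv")
```

Note that with the RDD variant the header would only appear in the first part file of the output directory, not in every part.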