How do I save the data in a DataFrame to a text file in CSV format on HDFS?
I tried the following, but csv doesn't seem to be a supported format.
The best way to save a DataFrame to a CSV file is to use the spark-csv library provided by Databricks.
It supports almost all of the features you are likely to need when working with CSV files.
spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
Then use the library's API to save to CSV files.
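A minimal sketch of that write call, to be run inside the spark-shell session started above. It assumes df is an existing DataFrame; the output path "output.csv" is only a placeholder, and Spark creates it as a directory of part files rather than a single file.

```scala
// Assumes a spark-shell session launched with the spark-csv package,
// and an existing DataFrame named df (not defined here).
df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")   // write column names as the first line
  .save("output.csv")         // placeholder path; resolved against the default filesystem (e.g. HDFS)
```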
It also supports reading from CSV files with a similar API:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file.csv")
You could also write some custom code to build the output string with mkString, but that is not safe if the data contains special characters (such as the delimiter itself), and it does not handle quoting.
df.rdd.map(x => x.mkString("|")).saveAsTextFile("file.csv")
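To illustrate the quoting problem with the mkString approach, here is a rough sketch of a field-escaping helper in plain Scala. CsvEscape, escapeField and toCsvLine are hypothetical names, not part of Spark or spark-csv, and the quoting rules follow the common RFC 4180 convention (quote a field that contains the delimiter, a double quote, or a line break, and double any embedded quotes).

```scala
// Hypothetical helper for illustration only; not part of Spark or spark-csv.
object CsvEscape {
  // Quote a field if it contains the delimiter, a double quote, or a line break.
  // Embedded double quotes are doubled, following the RFC 4180 convention.
  // A null value is written as an empty field (an assumption of this sketch).
  def escapeField(value: Any, delimiter: Char = ','): String = {
    val s = if (value == null) "" else value.toString
    val needsQuoting =
      s.exists(c => c == delimiter || c == '"' || c == '\n' || c == '\r')
    if (needsQuoting) "\"" + s.replace("\"", "\"\"") + "\"" else s
  }

  // Escape every field, then join them with the delimiter.
  def toCsvLine(fields: Seq[Any], delimiter: Char = ','): String =
    fields.map(escapeField(_, delimiter)).mkString(delimiter.toString)
}
```

With such a helper, the one-liner above could become something like df.rdd.map(row => CsvEscape.toCsvLine(row.toSeq, '|')).saveAsTextFile("file.csv"), assuming the helper is visible to the executors. It still does not write a header line.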
@Qi Wang I don't think the Databricks CSV library is available in the exam.
Your approach with mkString() works well if no header is required in the output CSV file. Can I assume that is the case for the exam tasks?