Support Questions

__anonymous__ · ‎07-22-2016

How do I save the data in a DataFrame to a text file in CSV format in HDFS?

I tried the following, but csv doesn't seem to be a supported format:

df.write.format("csv").save("/filepath")

qiwang · ‎07-22-2016

The best way to save a DataFrame to a CSV file is to use the spark-csv library provided by Databricks.

It supports almost all the features you are likely to need when working with CSV files.

spark-shell --packages com.databricks:spark-csv_2.10:1.4.0

Then use the library API to save to CSV files:

df.write.format("com.databricks.spark.csv").option("header", "true").save("file.csv")

It also supports reading from CSV files with a similar API:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file.csv")

You could also write custom code that builds each output line with mkString, but this is not safe if the data contains special characters (such as the delimiter itself), and it cannot handle quoting, etc.

df.rdd.map(x => x.mkString("|")).saveAsTextFile("file.csv")
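
To illustrate the quoting problem, here is a minimal sketch of a helper that quotes fields in the usual CSV style before joining them. The names CsvSketch, csvField and csvLine are hypothetical (they are not part of Spark or spark-csv), and the sketch makes no attempt to cover everything the library handles.

```scala
// Hypothetical helpers for illustration only; not part of Spark or spark-csv.
object CsvSketch {
  // Quote a field if it contains the delimiter, a double quote, or a line break.
  // Embedded double quotes are escaped by doubling them. Nulls become empty strings.
  def csvField(value: Any, delimiter: Char = ','): String = {
    val s = if (value == null) "" else value.toString
    val needsQuoting =
      s.exists(c => c == delimiter || c == '"' || c == '\n' || c == '\r')
    if (needsQuoting) "\"" + s.replace("\"", "\"\"") + "\"" else s
  }

  // Join the quoted fields into one output line.
  def csvLine(values: Seq[Any], delimiter: Char = ','): String =
    values.map(v => csvField(v, delimiter)).mkString(delimiter.toString)
}
```

In a spark-shell session it could be used in place of the plain mkString call, for example df.rdd.map(row => CsvSketch.csvLine(row.toSeq)).saveAsTextFile("/filepath").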

stefan_frankenh · ‎12-06-2016

@Qi Wang I don't think the Databricks CSV library is available in the exam.

Your approach with mkString() works well if no header is required in the output CSV file. Can I assume that in the exam tasks?
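
If a header does turn out to be required without the library, one possible approach is to prepend it to the first partition only. The sketch below assumes the header should appear once, at the top of part-00000. HeaderSketch and withHeader are hypothetical names, not Spark APIs.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object HeaderSketch {
  // Emit the header line ahead of the data, but only for partition 0,
  // so it appears once in the overall output rather than once per part file.
  def withHeader(partitionIndex: Int,
                 lines: Iterator[String],
                 header: String): Iterator[String] =
    if (partitionIndex == 0) Iterator(header) ++ lines else lines
}
```

In a spark-shell session this could be combined with the mkString approach: build the header with df.columns.mkString("|"), then call df.rdd.map(_.mkString("|")).mapPartitionsWithIndex((i, it) => HeaderSketch.withHeader(i, it, header)).saveAsTextFile("/filepath").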

Cloudera Community

How to save dataframe as text file