Support Questions

HDPCD-Spark: How to convert DF to CSV file?

Solved

Because of the restricted environment in the HDPCD-Spark exam, we can't download any third-party jars (such as the spark-csv package).

Out of the box, we can save/load a DataFrame to/from the JSON, ORC, and Parquet file formats. However, there is no built-in support for CSV files.

Hence, my question is:

How do I save a DataFrame to a CSV file using only the core Spark / Spark SQL APIs, and how do I load a CSV file back into a DataFrame?

Thanks.

1 ACCEPTED SOLUTION


Re: HDPCD-Spark: How to convert DF to CSV file?

Register the DataFrame as a temporary table, select the columns you need, then map each Row to a comma-separated line and save it as text:

myDF.registerTempTable("myTempTable")
val myDFTable = sqlContext.sql("SELECT col1, col2, col3 FROM myTempTable WHERE col2 > 1000")
// Join each Row's fields with commas. Note that saveAsTextFile writes a
// directory named output.csv containing part files, not a single file.
myDFTable.map(_.mkString(",")).saveAsTextFile("output.csv")
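For the reverse direction (loading a CSV back into a DataFrame), the same plain-API approach works: read the file as text, split each line, and apply an explicit schema. A minimal sketch, assuming a headerless file people.csv with name,age columns and a Spark 1.x sqlContext:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Schema matching the two CSV columns (names and types are assumptions)
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType)))

// Split each line on commas and build Rows that match the schema
val rowRDD = sc.textFile("people.csv")
  .map(_.split(","))
  .map(a => Row(a(0), a(1).trim.toInt))

val peopleDF = sqlContext.createDataFrame(rowRDD, schema)

This naive split does not handle quoted fields containing commas; for exam purposes with simple data it is usually sufficient.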


