Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

HDPCD-Spark: How to convert DF to CSV file?

New Member

Because of the restricted environment in the HDPCD-Spark exam, third-party jars cannot be downloaded.

A DataFrame can be saved to and loaded from the JSON, ORC, and Parquet formats out of the box.

However, there is no equivalent built-in support for CSV files.

Hence, my question is:

How can a DataFrame be saved to a CSV file (and loaded back from one) using only the core Spark and Spark SQL APIs?

Thanks.

1 ACCEPTED SOLUTION

Register the DataFrame as a temporary table, query the columns you want to export, and then write the result out as plain text:

// Register the DataFrame so it can be queried with SQL
myDF.registerTempTable("myTempTable")

// Select the columns to export (adjust the query to your schema)
val myDFTable = sqlContext.sql("SELECT col1, col2, col3 FROM myTempTable WHERE col2 > 1000")

// Join each row's values with commas and write the lines out.
// Note: saveAsTextFile creates a directory of part files named
// "output.csv", not a single file.
myDFTable.map(row => row.mkString(",")).saveAsTextFile("output.csv")
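For the reverse direction (loading a CSV file back into a DataFrame), a minimal sketch using only the core APIs could look like the following. The case class name, column names, and types here are assumptions; adapt them to your actual schema. This also assumes the CSV has no header row and no embedded commas:

import sqlContext.implicits._

// Hypothetical case class describing the CSV's columns
case class Record(col1: String, col2: Int, col3: String)

// Read the file as plain text, split each line on commas,
// and convert the fields to the expected types before calling toDF()
val csvDF = sc.textFile("output.csv")
  .map(_.split(","))
  .map(fields => Record(fields(0), fields(1).toInt, fields(2)))
  .toDF()

This per-line split is only safe for simple CSV data; quoted fields containing commas would need real CSV parsing.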
