Support Questions

HDPCD-Spark: How to convert DF to CSV file?

Solved

Because of the restricted environment in the HDPCD-Spark exam, we can't download any third-party jars (such as the spark-csv package).

Out of the box, we can save/load a DataFrame to/from the JSON, ORC, and Parquet file formats. However, there is no built-in support for CSV files.

Hence, my question is:

How do I save a DataFrame to a CSV file using only the core Spark / Spark SQL APIs, and how do I load a CSV file back into a DataFrame?

Thanks.

1 ACCEPTED SOLUTION


Re: HDPCD-Spark: How to convert DF to CSV file?

Register the DataFrame as a temporary table, select the columns you need, then map each Row to a comma-separated line and save it as text:

myDF.registerTempTable("myTempTable")
val myDFTable = sqlContext.sql("SELECT col1, col2, col3 FROM myTempTable WHERE col2 > 1000")
// Join each Row's fields with commas. Note that saveAsTextFile writes a
// directory named output.csv containing part files, not a single file.
myDFTable.map(_.mkString(",")).saveAsTextFile("output.csv")
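For the reverse direction (loading a CSV back into a DataFrame), the same plain-API approach works: read the file as text, split each line, and apply an explicit schema. A minimal sketch, assuming a headerless file people.csv with name,age columns and a Spark 1.x sqlContext:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Schema matching the two CSV columns (names and types are assumptions)
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType)))

// Split each line on commas and build Rows that match the schema
val rowRDD = sc.textFile("people.csv")
  .map(_.split(","))
  .map(a => Row(a(0), a(1).trim.toInt))

val peopleDF = sqlContext.createDataFrame(rowRDD, schema)

This naive split does not handle quoted fields containing commas; for exam purposes with simple data it is usually sufficient.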


