Created 08-11-2017 11:32 AM
Hi All, how can we add a header to Spark SQL query results before saving them to a text file? Spark version is 1.6.
val topPriceResultsDF = sqlContext.sql("SELECT * FROM retail_db.yahoo_stock_orc WHERE open_price > 40 AND high_price > 40 ORDER BY date ASC")

topPriceResultsDF
  .map(x => x.mkString(","))
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40_results(comma).csv")
This saves only the data, but I also need to add a header line (date,open_price,high_price,low_price,close_price,volume,adj_price). Please help if anyone has an idea! I cannot use the Databricks CSV library.
The output should look like:

date,open_price,high_price,low_price,close_price,volume,adj_price
1997-07-09,40.75008,45.12504,40.75008,43.99992,37545600,1.83333
Thanks !!
Created 08-13-2017 02:39 PM
Use mapPartitions if you want the header in every output file (or when there is only a single partition):
val header = "date,open_price,high_price,low_price,close_price,volume,adj_price"

topPriceResultsDF
  .map(x => x.mkString(","))
  .mapPartitions(iter => Iterator(header) ++ iter)
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40resultsWithHeader.csv")
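You can sanity-check the iterator logic locally without a cluster, since mapPartitions just applies a function to each partition's Iterator. A minimal plain-Scala sketch (object name, header string, and sample row are illustrative, not from the job above):

```scala
object HeaderDemo {
  val header = "date,open_price,high_price,low_price,close_price,volume,adj_price"

  // Same transformation mapPartitions applies to each partition's iterator:
  // emit the header line first, then the partition's rows.
  def withHeader(partition: Iterator[String]): Iterator[String] =
    Iterator(header) ++ partition

  def main(args: Array[String]): Unit = {
    val rows = Iterator("1997-07-09,40.75008,45.12504,40.75008,43.99992,37545600,1.83333")
    withHeader(rows).foreach(println)
  }
}
```

Because this runs per partition, each part-XXXXX file written by saveAsTextFile starts with its own copy of the header.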
Use mapPartitionsWithIndex if you want the header only in the first file:
val header = "date,open_price,high_price,low_price,close_price,volume,adj_price"

topPriceResultsDF
  .map(x => x.mkString(","))
  .repartition(2)
  .mapPartitionsWithIndex {
    case (0, iter) => Iterator(header) ++ iter
    case (_, iter) => iter
  }
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40resultsWithHeader.csv")
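The index-based variant can also be checked locally: mapPartitionsWithIndex passes the partition number along with its iterator, so only partition 0 (which becomes part-00000) is given the header. A small sketch with two in-memory "partitions" standing in for RDD partitions (names are illustrative):

```scala
object HeaderFirstPartitionDemo {
  val header = "date,open_price,high_price,low_price,close_price,volume,adj_price"

  // Same logic as the mapPartitionsWithIndex cases above:
  // prepend the header only for partition index 0.
  def withHeaderOnFirst(index: Int, partition: Iterator[String]): Iterator[String] =
    (index, partition) match {
      case (0, iter) => Iterator(header) ++ iter
      case (_, iter) => iter
    }

  def main(args: Array[String]): Unit = {
    // Two stand-ins for the two partitions created by repartition(2).
    val partitions = Seq(Seq("row-in-part-00000"), Seq("row-in-part-00001"))
    partitions.zipWithIndex.foreach { case (rows, i) =>
      withHeaderOnFirst(i, rows.iterator).foreach(println)
    }
  }
}
```

One caveat: repartition shuffles the data, so the rows that land in partition 0 are not guaranteed to be the first rows of the sorted result; if ordering matters, sort after repartitioning or write with a single partition.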