Created 09-30-2015 12:51 PM
The release notes for HDP 2.3 suggest that Spark support for ORC is not available. I was under the impression that it has been supported since HDP 2.1.
Is it supported for reading but not for writing, or is it not supported at all?
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_HDP_RelNotes/content/ch01s02s01.html
Created 09-30-2015 04:59 PM
Check out this tutorial which walks you through ORC support in Spark: http://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/
Created 09-30-2015 01:05 PM
ORC is a supported file format in the HDP platform. That said, individual projects within the platform may not yet be able to use that file format fully.
As of HDP 2.3.0, which includes Spark 1.3.1, Spark support for ORC is a Tech Preview. However, Spark 1.4 provides native support for reading and writing ORC files to and from a DataFrame. So you should expect Spark's ORC support to become GA when Spark 1.4.1 is GA in HDP (as of September 2015, this is tentatively planned for the next HDP 2.3 maintenance release).
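A minimal sketch of the Spark 1.4 DataFrame route, assuming Spark 1.4.x built with Hive support (ORC support lives in the Hive module, so a HiveContext is used). The Person case class, the object name OrcRoundTrip and the default path /tmp/people_orc are hypothetical, chosen only for illustration. It writes a small DataFrame as ORC and reads it back through the same data source, so both directions are exercised.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OrcRoundTrip {
  // Hypothetical record type, used only for this illustration.
  case class Person(name: String, age: Int)

  // Writes a tiny DataFrame as ORC, reads it back, and returns
  // (rows written, rows read) so the round trip can be checked.
  def run(sc: SparkContext, path: String): (Long, Long) = {
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._ // enables rdd.toDF()

    val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25))).toDF()

    // "overwrite" lets the sketch be re-run against the same path.
    people.write.mode("overwrite").format("orc").save(path)

    // Read the ORC files back through the same data source.
    val readBack = sqlContext.read.format("orc").load(path)
    (people.count(), readBack.count())
  }

  def main(args: Array[String]): Unit = {
    // Master and other settings are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("orc-round-trip"))
    val path = if (args.nonEmpty) args(0) else "/tmp/people_orc"
    val (written, read) = run(sc, path)
    println(s"wrote $written rows, read back $read rows")
    sc.stop()
  }
}
```

Submitted with spark-submit, optionally passing an output path as the first argument. Reading and writing go through the same format("orc") data source, which is what distinguishes the 1.4 support from the Tech Preview in 1.3.1.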
Created 10-06-2015 04:12 AM
ORC support is GA with Spark 1.4.1 on HDP 2.3.x.
Note that predicate pushdown is not enabled by default for ORC in Spark, so you will probably want to enable it.
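A minimal sketch of turning pushdown on, assuming Spark 1.4.x with a HiveContext. The setting used is spark.sql.orc.filterPushdown (false by default in these releases); the object name OrcPushdown, the age column and the default path are hypothetical. With pushdown enabled, a simple filter like the one below can be handed to the ORC reader, which may then skip data whose ORC statistics rule out a match.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

object OrcPushdown {
  // Enables ORC predicate pushdown, then reads an ORC dataset and keeps
  // only rows with age >= 18 (the "age" column is an assumption).
  def adultsFromOrc(sqlContext: HiveContext, path: String): DataFrame = {
    // Off by default for ORC; this applies to the whole SQLContext.
    sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
    sqlContext.read.format("orc").load(path).filter("age >= 18")
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("orc-pushdown"))
    val sqlContext = new HiveContext(sc)
    val path = if (args.nonEmpty) args(0) else "/tmp/people_orc"
    println(s"adults: ${adultsFromOrc(sqlContext, path).count()}")
    sc.stop()
  }
}
```

The same setting can instead be supplied at submit time, for example spark-submit --conf spark.sql.orc.filterPushdown=true, which avoids hard-coding it in the application.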