ORC support for Spark on HDP 2.3

The release notes for HDP 2.3 suggest that Spark support for ORC is not available. I was under the impression that it had been supported since HDP 2.1.

Is ORC supported for reads but not writes, or is it not supported at all?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_HDP_RelNotes/content/ch01s02s01.html

1 ACCEPTED SOLUTION

Check out this tutorial which walks you through ORC support in Spark: http://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/

3 REPLIES

ORC is a supported file format in the HDP platform. That said, individual projects within the platform may not yet be able to use that file format fully.

As of HDP 2.3.0, which includes Spark 1.3.1, Spark support for ORC is a Tech Preview. However, Spark 1.4 provides native support for reading and writing ORC files to and from an RDD. So you should expect Spark's support for ORC to become GA when Spark 1.4.1 is GA in HDP (as of Sept 2015, that is tentatively planned for the next HDP 2.3 maintenance release).
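
For anyone who wants to try this on Spark 1.4 or later, here is a minimal PySpark sketch of an ORC write followed by a read. It assumes the DataFrame reader/writer API with the "orc" format, which as far as I know needs a HiveContext in the 1.4 line rather than a plain SQLContext. The function name orc_round_trip, its parameters, and any path you pass in are made up for illustration.

```python
def orc_round_trip(hive_context, rows, columns, path):
    """Write `rows` as ORC files under `path`, then read them back.

    `hive_context` is assumed to be a pyspark.sql.HiveContext (Spark 1.4+).
    `rows` is a list of tuples and `columns` the matching column names.
    """
    # Build a DataFrame from plain Python tuples.
    df = hive_context.createDataFrame(rows, columns)

    # Write it out in ORC format, replacing anything already at `path`.
    df.write.format("orc").mode("overwrite").save(path)

    # Read the same location back as a new DataFrame.
    return hive_context.read.format("orc").load(path)
```

In a pyspark shell you would first run from pyspark.sql import HiveContext and then call something like orc_round_trip(HiveContext(sc), [("alice", 30)], ["name", "age"], "/tmp/people_orc"), where the path is only a placeholder.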

ORC support is GA with Spark 1.4.1 on HDP 2.3.x.

Note that predicate pushdown is not enabled by default for ORC in Spark, and you probably want to enable it by setting spark.sql.orc.filterPushdown to true.
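
A small PySpark sketch of turning that on for a session, in case it helps. It assumes the spark.sql.orc.filterPushdown property and the setConf/getConf methods on the SQL context; the helper name enable_orc_pushdown is hypothetical.

```python
# Property that controls ORC predicate pushdown in Spark SQL.
ORC_PUSHDOWN_KEY = "spark.sql.orc.filterPushdown"


def enable_orc_pushdown(sql_context):
    """Enable ORC predicate pushdown on `sql_context` and return the value now in effect.

    `sql_context` is assumed to be a pyspark.sql.SQLContext or HiveContext.
    """
    sql_context.setConf(ORC_PUSHDOWN_KEY, "true")
    # Read the setting back so the caller can confirm it was applied.
    return sql_context.getConf(ORC_PUSHDOWN_KEY, "false")
```

With pushdown enabled, filters on an ORC-backed DataFrame can be handed to the ORC reader so that non-matching stripes are skipped rather than read and discarded. The same property can also be passed at launch time with --conf spark.sql.orc.filterPushdown=true.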