Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

ORC support for Spark on HDP 2.3


The release notes for HDP 2.3 suggest that Spark support for ORC is not available. I was under the impression that it has been supported since HDP 2.1.

Is reading supported but not writing, or is it not supported at all?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_HDP_RelNotes/content/ch01s02s01.html

Accepted Solution


Check out this tutorial which walks you through ORC support in Spark: http://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/


3 Replies


ORC is a supported file format on the HDP platform. That said, individual projects within the platform may not yet be able to make full use of it.

As of HDP 2.3.0, which includes Spark 1.3.1, Spark support for ORC is a Tech Preview. However, Spark 1.4 provides native support for reading and writing ORC files to and from DataFrames. You should therefore expect Spark's ORC support to become GA when Spark 1.4.1 is GA in HDP (as of September 2015, this is tentatively planned for the next HDP 2.3 maintenance release).

ORC support is GA with Spark 1.4.1 on HDP 2.3.x.

Note that predicate pushdown is not enabled by default for ORC in Spark, so you will probably want to enable it.
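Predicate pushdown lets the ORC reader use the min/max statistics that ORC stores per stripe and per row group to skip data that cannot match a filter. In Spark 1.x the switch is, as far as I know, the `spark.sql.orc.filterPushdown` SQL property, so something like `sqlContext.setConf("spark.sql.orc.filterPushdown", "true")` should turn it on (check the docs for your exact Spark version). The Python below is not Spark code. It is a toy model, assuming a hypothetical `Stripe` class holding a single integer column and a hypothetical `scan` helper, meant only to show why the setting matters.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Toy stand-in for an ORC stripe holding a single integer column.

    Real ORC files store min/max statistics per stripe and per row group;
    in this hypothetical model they are simply computed from the rows.
    """
    rows: List[int]

    @property
    def min(self) -> int:
        return min(self.rows)

    @property
    def max(self) -> int:
        return max(self.rows)


def scan(stripes: List[Stripe], lo: int, hi: int,
         pushdown: bool) -> Tuple[List[int], int]:
    """Return (rows with lo <= value <= hi, number of stripes read)."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # With pushdown, consult the stripe statistics first and skip any
        # stripe whose [min, max] range cannot overlap [lo, hi].
        if pushdown and (stripe.max < lo or stripe.min > hi):
            continue
        stripes_read += 1
        # Stripes that are read still have every row checked against the filter.
        matches.extend(v for v in stripe.rows if lo <= v <= hi)
    return matches, stripes_read


if __name__ == "__main__":
    # Three stripes covering 0-99, 100-199 and 200-299.
    data = [Stripe(list(range(start, start + 100))) for start in (0, 100, 200)]
    full_rows, full_read = scan(data, 150, 160, pushdown=False)
    pd_rows, pd_read = scan(data, 150, 160, pushdown=True)
    assert full_rows == pd_rows  # same answer either way
    print(full_read, pd_read)  # prints "3 1"
```

Both scans return the same rows. Only the amount of data read changes, which is why enabling pushdown is usually worthwhile for selective filters on large ORC tables.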