Archives of Support Questions (Read Only)

orenault · 09-30-2015

The release notes for HDP 2.3 suggest that Spark support for ORC is not available. I was under the impression that it has been supported since HDP 2.1.

Is reading ORC supported but not writing it, or is it not supported at all?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_HDP_RelNotes/content/ch01s02s01.html

ssen · 09-30-2015

Check out this tutorial which walks you through ORC support in Spark: http://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/

awatson · 09-30-2015

ORC is a supported file format in the HDP platform. That being said, individual projects within the platform may not yet support that file format fully.

As of HDP 2.3.0, which includes Spark 1.3.1, Spark support for ORC is a Tech Preview. However, Spark 1.4 provides native support for reading and writing ORC files to and from a DataFrame. So you should expect Spark's support for ORC to become GA when Spark 1.4.1 is GA in HDP (as of September 2015, that is tentatively planned for the next HDP 2.3 maintenance release).

vshukla · 10-06-2015

ORC support is GA with Spark 1.4.1 on HDP 2.3.x.

Note that predicate pushdown is not enabled by default for ORC in Spark, so you will probably want to enable it (set spark.sql.orc.filterPushdown to true).

Cloudera Community

ORC support for Spark on HDP 2.3