I have an external table in Hive (2.1.1) whose data is stored as ORC files. I want to read these ORC files from a Mapper class using the OrcInputFormat class. I have added the following ORC dependencies in Maven, along with the other required jars (Hadoop and Hive) for running the MapReduce application. The Hadoop version is 2.7.3.
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-mapreduce</artifactId>
    <version>1.2.3</version>
</dependency>
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <version>1.2.3</version>
</dependency>
While running the job, I am getting this error:
FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/io/DiskRange
    at org.apache.orc.OrcFile.createReader(OrcFile.java:227)
I searched the Hive javadocs and found that this class should be in hive-common-2.1.1.jar. After extracting that jar, I found that the class is not there, although the API docs show it as a concrete class. Any help would be appreciated. Thanks a lot.
It is solved. I was referring to the wrong jar. The correct one is hive-exec-2.1.1.jar. The data coming out of the MapReduce job is correct.
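For anyone hitting a similar NoClassDefFoundError, here is a minimal sketch of a way to check which jar in a directory actually contains a given class, instead of extracting jars by hand. It uses only the JDK's java.util.jar API. The class name FindClassInJars and its method names are hypothetical, made up for this illustration, and it only looks at jars directly inside the given directory (no recursion).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hadoop, Hive or ORC): lists the jars in a
// directory that contain a .class entry for a given fully qualified class name.
public class FindClassInJars {

    // Converts a class name such as "a.b.C" into the jar entry path "a/b/C.class".
    public static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar has an entry for the given class.
    public static boolean jarContains(File jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getEntry(toEntryName(className)) != null;
        }
    }

    // Scans the *.jar files directly inside dir (no recursion) and returns the matches.
    public static List<File> findJars(File dir, String className) throws IOException {
        List<File> hits = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return hits; // dir does not exist or is not a directory
        }
        for (File jar : jars) {
            if (jarContains(jar, className)) {
                hits.add(jar);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FindClassInJars <jar-directory> <class-name>");
            System.exit(1);
        }
        for (File jar : findJars(new File(args[0]), args[1])) {
            System.out.println(jar.getPath());
        }
    }
}
```

Running it against the directory holding the Hive jars with org.apache.hadoop.hive.common.io.DiskRange as the second argument prints every jar that bundles the class. In my case that would have pointed at hive-exec-2.1.1.jar rather than hive-common-2.1.1.jar.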