
Problem of compatibility between externally created ORC files and Cloudera's Hive

Solved

New Contributor

I cannot solve a compatibility problem between an externally created ORC file and Cloudera's Hive.

I have Cloudera Express 6.3.2 with Hive 2.1.1.
It is strange: I downloaded the latest version of Cloudera, and it still ships the old Hive 2.1.1.

Case:
- Externally I create an ORC file (I tried creating it both in local Spark and in the same Cloudera cluster through a MapReduce job; same result).
- I try to read this ORC file in my Cloudera cluster, even just through orcfiledump.
- I get:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
- I downloaded the orc-tools-1.5.5-uber.jar utility locally to my computer.
- I also downloaded the problematic ORC file there.
- I ran: java -jar orc-tools-1.5.5-uber.jar meta msout2o12.orc
- The uber jar, with its own Hadoop inside, read this ORC file fine:
Structure for msout2o12.orc
File Version: 0.12 with ORC_135
Rows: 242
Compression: ZLIB
Compression size: 262144
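The exception points at the ORC reader's writer-version lookup. A plausible reading, sketched below (a simplified illustration, NOT the actual Hive or ORC source): the file is stamped with writer version ORC_135, whose numeric id is 6, while the ORC reader bundled in Hive 2.1.1 only has a lookup table for older writer versions, so the lookup overruns the table. The enum names come from the Apache ORC project; the assumption that Hive 2.1.1's table ends at id 5 is inferred from the "6" in the error message, and the class name is hypothetical.

```java
// Simplified sketch of the lookup that fails in OrcFile$WriterVersion.from.
// Assumption: the old reader knows writer-version ids 0..5 only, so a file
// stamped ORC_135 (id 6) throws ArrayIndexOutOfBoundsException: 6.
public class WriterVersionSketch {
    // Writer versions known to an older reader, indexed by id.
    static final String[] KNOWN = {
        "ORIGINAL",    // 0
        "HIVE_8732",   // 1
        "HIVE_4243",   // 2
        "HIVE_12055",  // 3
        "HIVE_13083",  // 4
        "ORC_101"      // 5
    };

    static String from(int id) {
        return KNOWN[id]; // id 6 overruns the table
    }

    public static void main(String[] args) {
        System.out.println(from(5)); // an ORC_101 file reads fine
        try {
            from(6);                 // what hive --orcfiledump hits on the new file
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException (writer version too new)");
        }
    }
}
```

On this reading, nothing is wrong with the file itself; the reader is simply older than the writer that produced it.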

Without creating any tables, Hive inside Cloudera simply cannot read the ORC file, even with its own utility.
The problem started when I created an external table and HiveQL queries over the ORC file produced this error.
Here I have just reduced the problem to a minimum: plain hive --orcfiledump cannot read the ORC file.
How do I make Cloudera read these ORC files normally?
What do I need to upgrade in my Cloudera?

1 ACCEPTED SOLUTION

Re: Problem of compatibility between externally created ORC files and Cloudera's Hive

Cloudera Employee

2 REPLIES

Re: Problem of compatibility between externally created ORC files and Cloudera's Hive

Cloudera Employee

Re: Problem of compatibility between externally created ORC files and Cloudera's Hive

New Contributor

Thank you for the reply!

I had read in the official Hive documentation that ORC is the native format for Hive, so I preferred ORC and rebuilt my ETL from Parquet to ORC.

But you have shown me that Cloudera's Hive is something other than Hive in general, and I am very surprised by that. :)

OK, I will switch back to Parquet.

By the way, if I create an external table with STORED AS ORC and run an insert from Hive, everything is fine: Cloudera's Hive creates 000000_0 ORC files and works with them very well.

But ORC files from the external world (StreamSets, Spark), yes, Hive does not accept.
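For anyone hitting the same wall with Spark specifically, a hedged configuration sketch rather than a confirmed fix: in Spark 2.3/2.4 the `spark.sql.orc.impl` setting chooses between the native ORC 1.5 writer (which stamps files with ORC_135) and the legacy Hive 1.2-based writer. Writing with the legacy writer should produce files an old Hive reader can open. The class name, app name, and paths below are placeholders; this requires a Spark runtime and is not runnable standalone.

```java
// Configuration sketch (assumes Spark 2.3/2.4 with Hive support on the classpath).
import org.apache.spark.sql.SparkSession;

public class WriteOrcForOldHive {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("orc-compat")
            // Use the legacy Hive 1.2-based ORC writer instead of the native
            // ORC 1.5 one, so the output is not stamped with ORC_135.
            .config("spark.sql.orc.impl", "hive")
            .enableHiveSupport()
            .getOrCreate();

        spark.read().parquet("/data/in")   // placeholder input path
             .write().orc("/data/out");    // placeholder output path
    }
}
```

The same setting can be passed on the command line, e.g. as a `--conf` option to spark-shell or spark-submit.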

I have some problems with Hive + Parquet processing too (that is the reason I switched to ORC in the first place), but that is another question and another story.

Thank you again! I spent a lot of time trying to understand what was wrong with Hive and ORC.

So a classic is a classic; I will use Parquet for now.
