Orc files in hdf: NullPointerException (RunLengthIntegerReaderV2)

Hello,

I get an error when I try to read my ORC files from Hive (external table), from Pig, or with hive --orcfiledump.

These files are generated by Flink using the ORC Java API with vectorized columns.

If I create these files locally (/tmp/...) and then push them to HDFS, I can read their content from Pig or through an external table in Hive.

If I change the path so that the files are written directly to HDFS, I get this error:

 Failure while running task:java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
        at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$TimestampTreeReader.next(TreeReaderFactory.java:1105)
        at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2079)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1082)
        at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat$OrcRecordReader.nextKeyValue(OrcNewInputFormat.java:108)

The same error occurs if I then fetch these files to the local filesystem.

1 REPLY

In fact, the problem is related to the Java ORC API when parallelism is enabled (multi-threading).

I use Flink, and when I set a parallelism > 1 on the sink that generates the ORC files, I hit this issue: the data is unreadable.

I've seen some tickets about this issue, such as this one: https://jira.apache.org/jira/browse/ORC-361

For now I use a parallelism of 1, but I need to fix this issue in order to scale my ingest pipeline.
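
One workaround that might be worth trying in the meantime is to keep the sink parallel but make sure writer calls never overlap inside one JVM. The sketch below is only an illustration of that idea, assuming the corruption really comes from several threads calling into the writer at the same time. RowSink, SerializedSink, ListSink and SerializedSinkDemo are hypothetical names, not part of the ORC or Flink APIs, and ListSink is an in-memory stand-in for the real ORC writer so that the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the real writer; not an ORC interface.
interface RowSink {
    void write(String row) throws Exception;
    void close() throws Exception;
}

// Funnels every call through one lock shared by all instances in this JVM.
// Assumption: the unreadable files are caused by overlapping writer calls.
final class SerializedSink implements RowSink {
    private static final ReentrantLock JVM_WIDE_LOCK = new ReentrantLock();
    private final RowSink delegate;

    SerializedSink(RowSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(String row) throws Exception {
        JVM_WIDE_LOCK.lock();
        try {
            delegate.write(row);
        } finally {
            JVM_WIDE_LOCK.unlock();
        }
    }

    @Override
    public void close() throws Exception {
        JVM_WIDE_LOCK.lock();
        try {
            delegate.close();
        } finally {
            JVM_WIDE_LOCK.unlock();
        }
    }
}

// In-memory sink used only for the demo; a real job would wrap its ORC writer.
final class ListSink implements RowSink {
    final List<String> rows = new ArrayList<>();

    @Override
    public void write(String row) {
        rows.add(row);
    }

    @Override
    public void close() {
    }
}

final class SerializedSinkDemo {
    // Starts one thread per subtask, each with its own sink (one output file
    // per subtask in a real job), and returns the total number of rows written.
    static int run(int subtasks, int rowsPerSubtask) {
        ExecutorService pool = Executors.newFixedThreadPool(subtasks);
        try {
            List<ListSink> sinks = new ArrayList<>();
            List<Future<Void>> results = new ArrayList<>();
            for (int i = 0; i < subtasks; i++) {
                ListSink raw = new ListSink();
                sinks.add(raw);
                RowSink guarded = new SerializedSink(raw);
                final int id = i;
                Callable<Void> task = () -> {
                    for (int r = 0; r < rowsPerSubtask; r++) {
                        guarded.write("subtask-" + id + "-row-" + r);
                    }
                    guarded.close();
                    return null;
                };
                results.add(pool.submit(task));
            }
            for (Future<Void> f : results) {
                f.get(); // wait for the subtask and surface any failure
            }
            int total = 0;
            for (ListSink s : sinks) {
                total += s.rows.size();
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling SerializedSinkDemo.run(4, 1000) writes 1000 rows from each of 4 threads through the shared lock. The lock only covers subtasks that share a JVM, it costs write throughput, and it does nothing if the root cause lies elsewhere, so it is a stopgap to test the multi-threading hypothesis rather than a fix.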

Any help is welcome.

Thx