HCatLoader error on loading Hive external table serialized as Avro in Pig script

Error:

 

pig script failed to validate: java.lang.RuntimeException: could not instantiate org.apache.hive.hcatalog.pig.HCatLoader with arguments 'null'

 

Caused by ...

 

Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException

at org.apache.hive.hcatalog.pig.HCatLoader.<init>(HCatLoader.java:70)

 

which corresponds to this line in hive-hcatalog-pig-adapter-0.13.1-cdh5.3.0.jar:

 

private final PigHCatUtil phutil = new PigHCatUtil();
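That NoClassDefFoundError usually means the Hive metastore API classes are not on Pig's classpath; on a CDH install that class normally ships in the hive-metastore jar. A quick way to confirm which local jar carries the missing class (the /usr/lib/hive/lib directory is an assumption for a CDH 5.3 package install, adjust as needed):

# print any hive-metastore jar that contains the missing metastore API class
for j in /usr/lib/hive/lib/hive-metastore*.jar; do
  jar tf "$j" | grep -q 'metastore/api/NoSuchObjectException' && echo "$j"
done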

 

Environment

 

CDH 5.3

External Hive table in Avro format

Pig script skeleton:

 

REGISTER /usr/lib/avro/avro.jar
REGISTER /usr/lib/pig/piggybank.jar
REGISTER /usr/lib/hive-catalog/share/hcatalog/hive-hcatalog-core.jar
REGISTER /usr/lib/hive-catalog/share/hcatalog/hive-hcatalog-pig-adapter.jar

raw = LOAD 'external.data1' USING org.apache.hive.hcatalog.pig.HCatLoader();

conditioned = FOREACH raw GENERATE
    <condition stuff>;

STORE conditioned INTO 'conditioned.data1' USING org.apache.hive.hcatalog.pig.HCatStorer();
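Given the NoClassDefFoundError above, one hedged variant of the REGISTER block also pulls in the Hive client jars that provide the metastore API classes. The extra paths below are assumptions for a CDH 5.3 package install, not verified (note that the packaged HCatalog directory is normally /usr/lib/hive-hcatalog, not /usr/lib/hive-catalog as in the skeleton above); launching the script with pig -useHCatalog is supposed to put these dependencies on the classpath without explicit REGISTERs:

REGISTER /usr/lib/avro/avro.jar
REGISTER /usr/lib/pig/piggybank.jar
-- assumed CDH 5.3 locations of the Hive client jars containing the metastore API classes
REGISTER /usr/lib/hive/lib/hive-metastore.jar
REGISTER /usr/lib/hive/lib/hive-exec.jar
REGISTER /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
REGISTER /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-pig-adapter.jar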

 


Re: HCatLoader error on loading Hive external table serialized as Avro in Pig script


A more succinct description:

 

I have tried this with both org.apache.hcatalog.pig.HCatStorer and org.apache.hive.hcatalog.pig.HCatStorer; both fail in different ways. I am able to do the same transformation in Hive and with AvroStorage (no HCat), and I can also dump the results to the console. I also tried Parquet by changing the CREATE TABLE statement to end with STORED AS PARQUET (a sketch of that variant appears after the Avro DDL below). The tables appear to be created successfully with both Avro and Parquet. One of the failures is a NullPointerException at write time:

YarnChild:Exception running child: java.lang.NullPointerException
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:99)
at org.apache.hcatalog.mapreduce.InternalUtil.initializeOutputSerDe(InternalUtil.java:148)
at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.<init>(FileRecordWriterContainer.java:93)
at org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:101)
at org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:245)
...




I am running the Pig script from the CLI with the -useHCatalog option.
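For context, the invocation is along these lines (the script filename is a placeholder):

pig -useHCatalog conditioned_data1.pig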

Source: external Hive table in Avro format
Target: defined in Hive as Avro:

CREATE TABLE conditioned.data1 (
    field1 STRING COMMENT 'cmt1',
    field2 DOUBLE COMMENT 'cmt2'
)
STORED AS AVRO;
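For reference, the Parquet attempt mentioned above would presumably be the same DDL with only the storage clause changed:

CREATE TABLE conditioned.data1 (
    field1 STRING COMMENT 'cmt1',
    field2 DOUBLE COMMENT 'cmt2'
)
STORED AS PARQUET;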

raw = LOAD 'external.data1' USING org.apache.hcatalog.pig.HCatLoader();

conditioned = FOREACH raw GENERATE
    sourcefield1 AS field1,
    (double)sourcefield2 AS field2;

STORE conditioned INTO 'conditioned.data1' USING org.apache.hcatalog.pig.HCatStorer();
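Since the NullPointerException above comes from AvroSerDe.initialize, it may also be worth noting the older, explicit-schema form of the Avro DDL, which hands the SerDe its schema directly via TBLPROPERTIES. This is only a sketch: the Avro schema JSON is an illustrative guess at the two columns above, and it is not verified here that this avoids the NPE.

CREATE TABLE conditioned.data1
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record", "name": "data1", "namespace": "conditioned",
  "fields": [
    {"name": "field1", "type": ["null", "string"]},
    {"name": "field2", "type": ["null", "double"]}
  ]
}');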