
AvroStorage with mapreduce and java.lang.RuntimeException: could not instantiate

Expert Contributor

Hi,

I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains an Avro file, using AvroStorage in MapReduce mode. I have tried almost every path form (hdfs://, /, hdfs://ip:port/file ...), but nothing works.

Using the command below:

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage ();

I get this error:

2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)

Or, using the command with an argument:

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925

My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.

The problem is also partially described on Stack Exchange.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

FYI: https://issues.apache.org/jira/browse/PIG-4793

org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use the built-in AvroStorage('schema', '-d') instead.

This works.



Master Mentor

@John Smith yes, I followed your efforts. Can you live with the workaround? I honestly don't have the cycles to investigate further for you. I think I learned something myself thanks to you :). I'd say open an article on HCC with the proposed workaround and your desired goal; maybe someone else can weigh in. Great job, John!

Expert Contributor

Well, I can't live with that workaround, that's the problem. What is HCC?

Master Mentor

This website is called Hortonworks Community Connection, HCC for short. Again, post this as a separate issue with tags for Avro and Pig. @John Smith

Expert Contributor

Is there any update on this?

Expert Contributor

One more important observation: when I dump data into Avro using

store outputSet into 'avrostorage' using AvroStorage();

the schema inside the Avro file looks like this:

{"type":"record","name":"pig_output","fields":[{"name":"name","type":["null","string"]},{"name":"customerId","type":["null","string"]},{"name":"VIN","type":["null","string"]},{"name":"Birthdate","type":["null","string"]},{"name":"Mileage","type":["null","string"]},{"name":"Fuel_Consumption","type":["null","string"]}]}

Why does each field contain null?

Master Mentor

@John Smith it means the field can be null if it is missing; that is, it is an optional field. That way, if you don't pass a field, it won't complain.
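
To make the point about optional fields concrete, here is a minimal sketch in Python (standard library only, not part of Pig or AvroStorage) that parses the schema posted above and lists which fields are declared as a union with "null". The helper name nullable_fields is hypothetical and only for illustration; the schema string is copied from the post.

```python
import json

# Schema copied verbatim from the post above (as written by the store command).
SCHEMA_JSON = (
    '{"type":"record","name":"pig_output","fields":['
    '{"name":"name","type":["null","string"]},'
    '{"name":"customerId","type":["null","string"]},'
    '{"name":"VIN","type":["null","string"]},'
    '{"name":"Birthdate","type":["null","string"]},'
    '{"name":"Mileage","type":["null","string"]},'
    '{"name":"Fuel_Consumption","type":["null","string"]}]}'
)


def nullable_fields(schema):
    """Hypothetical helper: map each field whose type is a union containing
    "null" to the list of its remaining (non-null) branch types."""
    result = {}
    for field in schema["fields"]:
        field_type = field["type"]
        # In Avro schema JSON, a union is written as a JSON array of types.
        if isinstance(field_type, list) and "null" in field_type:
            result[field["name"]] = [t for t in field_type if t != "null"]
    return result


schema = json.loads(SCHEMA_JSON)
optional = nullable_fields(schema)
for name, types in optional.items():
    print(f"{name}: may be null, otherwise one of {types}")
```

Every field in this schema comes back as nullable, which matches the reading above: ["null","string"] says the value is either null or a string, not that the data actually contains nulls.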

Expert Contributor

Sure, but the input data contains all the fields, so my question is why it generates [null] as part of the datatype.

Also, still no luck with

https://issues.apache.org/jira/browse/PIG-4793

Master Mentor

@John Smith read the Avro docs for an explanation of optional vs. default fields.

Expert Contributor

Ah, I already did... my question was why it's there... when I use local mode it's not there. Anyway, there is no reply from anyone behind AvroStorage, which is pretty odd.

Master Mentor

@John Smith it's a better practice, so that if you do happen to get a null, at least it won't bomb. As for the JIRA, that's open source: individual contributors also need to earn a living, and if they have higher-priority responsibilities, they'll get to it when their queue is clear. I wouldn't get your hopes up; I'd identify alternative ways. Shoot an email to the Avro mailing list. They may help faster.