Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

AvroStorage with MapReduce and java.lang.RuntimeException: could not instantiate

Expert Contributor

Hi,

I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains Avro files, using AvroStorage in MapReduce mode. I have tried almost every path combination (hdfs://, /, hdfs://ip:port/file ...) but nothing works.

Using the command below:

sensitiveSet = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

I got this error:

2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)

Or, using the command with an argument:

sensitiveSet = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925

My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.

The problem is also partially described on Stack Exchange.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

FYI: https://issues.apache.org/jira/browse/PIG-4793

org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use AvroStorage('schema', '-d') instead.

This works.


52 REPLIES

Master Mentor

@John Smith Yes, I followed your efforts. Can you live with the workaround? I honestly don't have the cycles to investigate further for you. I think I learned something myself thanks to you :). I'd suggest opening an article on HCC with the proposed workaround and your desired goal; maybe someone else can weigh in. Great job, John!

Expert Contributor

Well, I can't live with that workaround; that's the problem. What is HCC?

Master Mentor

This website is called Hortonworks Community Connection, HCC for short. Again, post this as a separate issue with tags for Avro and Pig. @John Smith

Expert Contributor

Is there any update on this?

Expert Contributor

One more important observation: when I store data into Avro using

store outputSet into 'avrostorage' using AvroStorage();

the schema inside the Avro file looks like this:

{"type":"record","name":"pig_output","fields":[{"name":"name","type":["null","string"]},{"name":"customerId","type":["null","string"]},{"name":"VIN","type":["null","string"]},{"name":"Birthdate","type":["null","string"]},{"name":"Mileage","type":["null","string"]},{"name":"Fuel_Consumption","type":["null","string"]}]}

Why does each field contain null?

Master Mentor

@John Smith It means the field can be null if it is missing; that is, it is an optional field. That way, if you don't pass a value for a field, it won't complain.
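To make the answer above concrete: in Avro's JSON schema syntax a union is written as a JSON array, so "type": ["null","string"] means the value may be either null or a string. A minimal Python sketch (standard library only) that reads the schema posted above and lists which fields are nullable; the helper names nullable_fields and non_null_branch are hypothetical, for illustration only:

```python
import json

# Schema copied from the Avro file written by Pig's AvroStorage (posted above).
SCHEMA_JSON = (
    '{"type":"record","name":"pig_output","fields":['
    '{"name":"name","type":["null","string"]},'
    '{"name":"customerId","type":["null","string"]},'
    '{"name":"VIN","type":["null","string"]},'
    '{"name":"Birthdate","type":["null","string"]},'
    '{"name":"Mileage","type":["null","string"]},'
    '{"name":"Fuel_Consumption","type":["null","string"]}]}'
)


def nullable_fields(schema):
    """Return the names of record fields whose type is a union containing "null"."""
    result = []
    for field in schema["fields"]:
        field_type = field["type"]
        # An Avro union is a JSON array of branch types.
        if isinstance(field_type, list) and "null" in field_type:
            result.append(field["name"])
    return result


def non_null_branch(field_type):
    """For a ["null", T] union return T; otherwise return the type unchanged."""
    if isinstance(field_type, list):
        others = [t for t in field_type if t != "null"]
        if len(others) == 1:
            return others[0]
    return field_type


schema = json.loads(SCHEMA_JSON)
print(nullable_fields(schema))
# → ['name', 'customerId', 'VIN', 'Birthdate', 'Mileage', 'Fuel_Consumption']
print({f["name"]: non_null_branch(f["type"]) for f in schema["fields"]})
```

Every field comes back as nullable, with string as its only non-null branch, which matches what the schema says: each value is "a string, or null".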

Expert Contributor

Sure, but the input data contains all the fields, so my question is why it generates "null" as part of the data type.

Also, still no luck with

https://issues.apache.org/jira/browse/PIG-4793

Master Mentor

@John Smith Read the Avro docs for an explanation of optional vs. default fields.
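The distinction mentioned above can be sketched roughly as follows: a nullable union only says a value *may* be null, while a "default" supplies a value when the field is absent altogether. This is a simplified Python model, not Avro's full schema-resolution algorithm; the example schema and the fill_record / allows_null helpers are hypothetical, for illustration only:

```python
import json

# Hypothetical schema (not from the thread): "VIN" is a plain string,
# "Mileage" is a nullable union WITHOUT a default, and "Fuel_Consumption"
# is a nullable union WITH "default": null.
SCHEMA = json.loads("""
{"type": "record", "name": "example", "fields": [
  {"name": "VIN", "type": "string"},
  {"name": "Mileage", "type": ["null", "string"]},
  {"name": "Fuel_Consumption", "type": ["null", "string"], "default": null}
]}
""")


def allows_null(field_type):
    """True if the type is "null" or a union with a "null" branch."""
    return field_type == "null" or (
        isinstance(field_type, list) and "null" in field_type
    )


def fill_record(schema, record):
    """Simplified model: fill absent fields from defaults, reject illegal nulls.

    Only illustrates nullable-type vs. field-default; real Avro readers
    resolve a writer schema against a reader schema.
    """
    out = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name in record:
            value = record[name]
        elif "default" in field:
            # The default applies only when the field is missing entirely.
            value = field["default"]
        else:
            raise ValueError(f"missing field {name!r} has no default")
        if value is None and not allows_null(ftype):
            raise ValueError(f"field {name!r} is not nullable")
        out[name] = value
    return out


print(fill_record(SCHEMA, {"VIN": "X1", "Mileage": None}))
# → {'VIN': 'X1', 'Mileage': None, 'Fuel_Consumption': None}

for bad in ({"VIN": "X1"}, {"VIN": None, "Mileage": "10"}):
    try:
        fill_record(SCHEMA, bad)
    except ValueError as err:
        print("rejected:", err)
```

Note that the schema Pig generated has nullable unions but no "default" entries, so under this model a null value is accepted while a missing field would still be rejected.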

Expert Contributor

Ah, I already did. My question was why it's there: when I use local mode, it's not there. Anyway, there is no reply from anyone behind AvroStorage, which is pretty odd.

Master Mentor

@John Smith It's better practice, so that if you do happen to get a null, at least it won't bomb. As for the JIRA, that's open source: individual contributors also need to earn a living, and if they have higher priorities, they'll get to it when their queue is clear. I wouldn't get your hopes up; I'd look for alternative approaches. Shoot an email to the Avro mailing list; they may help faster.