AvroStorage with mapreduce and java.lang.RuntimeException: could not instantiate

Expert Contributor

Hi,

I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains Avro files, using AvroStorage in MapReduce mode. I have tried almost every path combination (hdfs://, /, hdfs://ip:port/file ...), but nothing works.

Using the command below:

sensitiveSet = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

I get this error:

2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)

or, using the command with an argument:

sensitiveSet = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)

My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.

The problem is also partially described on Stack Exchange.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

FYI: https://issues.apache.org/jira/browse/PIG-4793

org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use the built-in AvroStorage('schema', '-d') instead.

This works.
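A minimal sketch of the accepted fix, assuming Pig 0.12 or later, where an unqualified AvroStorage resolves to the built-in org.apache.pig.builtin.AvroStorage. The input paths are the ones used later in this thread; the output path '/user/root/merged-output' and the record name 'outputRecord' are hypothetical placeholders. Pig join output carries field names such as sensitiveSet::Row_ID, and '::' is not legal in an Avro field name (Avro names allow only letters, digits and underscores); the '-d' (doublecolons) option of the built-in AvroStorage translates '::' into '__' when the output schema is generated.

```pig
-- Built-in AvroStorage: no piggybank.jar REGISTER is needed for this loader.
sensitiveSet = LOAD '/user/root/Test-20160129-1401822-ttp.avro' USING AvroStorage();
nonSensSet   = LOAD '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();

-- After a join, every field is prefixed with its relation name, e.g. sensitiveSet::Row_ID.
outputSet = JOIN sensitiveSet BY Row_ID, nonSensSet BY Row_ID;

-- 'outputRecord' (hypothetical) is the Avro record name; '-d' rewrites '::' to '__'
-- so the generated Avro schema contains only legal field names.
STORE outputSet INTO '/user/root/merged-output' USING AvroStorage('outputRecord', '-d');
```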


52 replies

Expert Contributor

OK, waiting on your results! Thank you.

Expert Contributor

@Artem Ervits Here is one more output: the sources were read successfully, but writing the output failed.

http://paste.debian.net/377433/

Expert Contributor

It is failing to write the output Avro file, but the web log says:

Application Overview

User: hdfs

Name: PigLatin:pigMerger.pig

Application Type: MAPREDUCE

Application Tags:

YarnApplicationState: FINISHED

Queue: default

FinalStatus Reported by AM: SUCCEEDED

Started: Fri Jan 29 12:59:25 +0000 2016

Elapsed: 4mins, 29sec

Tracking URL: History

Log Aggregation Status: SUCCEEDED

Diagnostics:

Master Mentor

@John Smith Can you paste a sample dataset and your Pig script? I'll try to reproduce the problem on my machine sometime today. It's hard to see the issue from the logs.

Expert Contributor

@Artem Ervits The source files and the Pig script are included here: Data. Thanks.

Expert Contributor

One more log:

log

Master Mentor

@John Smith I highly recommend developing your scripts in the Pig Grunt shell. This is what happened when I executed your script one statement at a time:

grunt> sensitiveSet = load '/user/root/Test-20160129-1401822-ttp.avro' using AvroStorage();
grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;
grunt> outputSet = distinct outputSet;
grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,sensitiveSet::VIN,sensitiveSet::Birthdate,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;
2016-01-29 14:41:59,228 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 5, column 79> Invalid field projection. Projected field [sensitiveSet::VIN] does not exist in schema: sensitiveSet::Row_ID:long,sensitiveSet::name:chararray,sensitiveSet::customerId:chararray,sensitiveSet::Mileage:chararray,sensitiveSet::Fuel_Consumption:chararray,nonSensSet::Row_ID:long,nonSensSet::name:chararray,nonSensSet::customerId:chararray,nonSensSet::Mileage:chararray,nonSensSet::Fuel_Consumption:chararray.
Details at logfile: /root/pig-upload/pig_1454078371113.log


grunt> describe sensitiveSet;
sensitiveSet: {Row_ID: long,name: chararray,customerId: chararray,Mileage: chararray,Fuel_Consumption: chararray}
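When a projection fails with ERROR 1025, running DESCRIBE on the joined relation lists the exact field names that can be projected. A short Grunt sketch, reusing the two load statements from this thread (it assumes the built-in AvroStorage):

```pig
sensitiveSet = LOAD '/user/root/Test-20160129-1401822-ttp.avro' USING AvroStorage();
nonSensSet   = LOAD '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();
outputSet    = JOIN sensitiveSet BY Row_ID, nonSensSet BY Row_ID;

-- Prints the joined schema; only the fields listed here (e.g. sensitiveSet::Row_ID)
-- can be referenced in a later FOREACH ... GENERATE.
DESCRIBE outputSet;
```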

Expert Contributor

I do develop everything in Grunt... You are missing one line from that script:

nonSensSet = load '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();

Master Mentor

@John Smith Birthdate also doesn't exist:

<line 5, column 79> Invalid field projection. Projected field [sensitiveSet::Birthdate] does not exist in schema: sensitiveSet::Row_ID:long,sensitiveSet::name:chararray,sensitiveSet::customerId:chararray,sensitiveSet::Mileage:chararray,sensitiveSet::Fuel_Consumption:chararray,nonSensSet::Row_ID:long,nonSensSet::name:chararray,nonSensSet::customerId:chararray,nonSensSet::Mileage:chararray,nonSensSet::Fuel_Consumption:chararray.
Details at logfile: /root/pig-upload/pig_1454078371113.log

-- this works

grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;
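As an alternative to '-d', the projected fields can be renamed with AS so that the stored schema contains no '::' at all. This is a sketch under stated assumptions: outputSet still holds the joined (and DISTINCT-ed) relation, so this statement replaces the FOREACH above rather than following it; the alias renamedSet, the output path and the record name 'outputRecord' are hypothetical; and the built-in AvroStorage is used.

```pig
-- Rename each projected field explicitly; the new names are valid Avro identifiers.
renamedSet = FOREACH outputSet GENERATE
    nonSensSet::name             AS name,
    nonSensSet::customerId       AS customerId,
    nonSensSet::Mileage          AS Mileage,
    nonSensSet::Fuel_Consumption AS Fuel_Consumption;

-- Confirms that the field names no longer carry a relation prefix.
DESCRIBE renamedSet;

-- Single-argument form: 'outputRecord' (hypothetical) is the Avro record name.
STORE renamedSet INTO '/user/root/merged-output' USING AvroStorage('outputRecord');
```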

Master Mentor
@John Smith

OK, that was my mistake: I copied the same file twice. Never mind that issue; I'm still looking.