Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

AvroStorage with mapreduce and java.lang.RuntimeException: could not instantiate

Expert Contributor

Hi,

I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains Avro files, using AvroStorage in MapReduce mode. I tried almost every path form (hdfs://, /, hdfs://ip:port/file ...), but nothing works.

Using the command below:

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage ();

I got this error:

2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)

Or, using the command with an argument:

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925

My samples are almost identical to the ones in the AvroStorage documentation, so I really can't see where the problem is.

The problem is also partially described on Stack Exchange.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

FYI: https://issues.apache.org/jira/browse/PIG-4793

org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use the built-in AvroStorage('schema', '-d') instead.

This works.
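
For readers following along, a minimal sketch of what the switch to the built-in storage function might look like. The input path is the one from the question; the output path '/tmp/avro-out' is hypothetical. The AvroStorage('schema', '-d') arguments are copied verbatim from the answer above. As far as I understand the built-in function, the first argument is a schema or record name and '-d' is the short form of the double-colons option, which renames fields such as alias::field so they are valid Avro names after a join; treat both points as assumptions to verify against your Pig version.

```pig
-- Sketch only. AvroStorage without a package prefix resolves to the built-in
-- function, so no piggybank REGISTER statement should be needed (assumption:
-- a Pig release that ships the built-in AvroStorage).
sensitiveSet = LOAD '/spool-dir/CustomerData-20160128-1501807/' USING AvroStorage();
DESCRIBE sensitiveSet;

-- '/tmp/avro-out' is a hypothetical output path.
-- The storer arguments are copied verbatim from the accepted answer.
STORE sensitiveSet INTO '/tmp/avro-out' USING AvroStorage('schema', '-d');
```

If the load succeeds, DESCRIBE should print the schema read from the Avro files, which is a quick way to confirm the loader was instantiated before running the whole script.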


52 REPLIES

Expert Contributor

Ok, waiting on your results! Thank you

Expert Contributor

@Artem Ervits here is one more output: the sources were read successfully, but writing the output failed.

http://paste.debian.net/377433/

Expert Contributor

It fails to write the output Avro file, but the web log says:

Application Overview

User: hdfs
Name: PigLatin:pigMerger.pig
Application Type: MAPREDUCE
Application Tags:
YarnApplicationState: FINISHED
Queue: default
FinalStatus Reported by AM: SUCCEEDED
Started: Fri Jan 29 12:59:25 +0000 2016
Elapsed: 4mins, 29sec
Tracking URL: History
Log Aggregation Status: SUCCEEDED
Diagnostics:

Master Mentor

@John Smith can you paste a sample dataset and the Pig script? I'll try to reproduce it on my machine sometime today. It's hard to see the issue from the logs.

Expert Contributor

@Artem Ervits the source files and Pig script are included here: Data. Thanks.

Expert Contributor

one more log:

log

Master Mentor

@John Smith I highly recommend that you develop your scripts in the Pig Grunt shell. This is what happened when I executed your script one statement at a time.

grunt> sensitiveSet = load '/user/root/Test-20160129-1401822-ttp.avro' using AvroStorage();
grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;
grunt> outputSet = distinct outputSet;
grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,sensitiveSet::VIN,sensitiveSet::Birthdate,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;
2016-01-29 14:41:59,228 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 5, column 79> Invalid field projection. Projected field [sensitiveSet::VIN] does not exist in schema: sensitiveSet::Row_ID:long,sensitiveSet::name:chararray,sensitiveSet::customerId:chararray,sensitiveSet::Mileage:chararray,sensitiveSet::Fuel_Consumption:chararray,nonSensSet::Row_ID:long,nonSensSet::name:chararray,nonSensSet::customerId:chararray,nonSensSet::Mileage:chararray,nonSensSet::Fuel_Consumption:chararray.
Details at logfile: /root/pig-upload/pig_1454078371113.log


grunt> describe sensitiveSet;
sensitiveSet: {Row_ID: long,name: chararray,customerId: chararray,Mileage: chararray,Fuel_Consumption: chararray}

Expert Contributor

I do develop everything in Grunt. You are missing one line from that script:

nonSensSet = load '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();

Master Mentor

@John Smith Birthdate doesn't exist either:

<line 5, column 79> Invalid field projection. Projected field [sensitiveSet::Birthdate] does not exist in schema: sensitiveSet::Row_ID:long,sensitiveSet::name:chararray,sensitiveSet::customerId:chararray,sensitiveSet::Mileage:chararray,sensitiveSet::Fuel_Consumption:chararray,nonSensSet::Row_ID:long,nonSensSet::name:chararray,nonSensSet::customerId:chararray,nonSensSet::Mileage:chararray,nonSensSet::Fuel_Consumption:chararray.
Details at logfile: /root/pig-upload/pig_1454078371113.log

-- this works

grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;
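
The "Invalid field projection" errors above come from projecting a field that is not in the joined schema. A minimal sketch of how the inputs could be checked before projecting, using the paths and aliases from the posts above; the intermediate aliases joined and deduped are hypothetical names introduced here for readability.

```pig
-- Sketch only: paths and input aliases are copied from the posts above.
sensitiveSet = LOAD '/user/root/Test-20160129-1401822-ttp.avro' USING AvroStorage();
nonSensSet   = LOAD '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();

-- Inspect each input before joining; a missing VIN or Birthdate shows up here.
DESCRIBE sensitiveSet;
DESCRIBE nonSensSet;

-- 'joined' and 'deduped' are hypothetical alias names.
joined = JOIN sensitiveSet BY Row_ID, nonSensSet BY Row_ID;
-- After a join, fields are addressed as alias::field.
DESCRIBE joined;

deduped = DISTINCT joined;
-- Project only fields that DESCRIBE actually listed.
outputSet = FOREACH deduped GENERATE nonSensSet::name, nonSensSet::customerId, nonSensSet::Mileage, nonSensSet::Fuel_Consumption;
```

If both DESCRIBE outputs show the same columns, the two paths probably point at copies of the same file rather than at two different datasets.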

Master Mentor
@John Smith

OK, that was a mistake on my side: I copied the same file twice. Never mind that issue; I'm still looking.