AvroStorage with mapreduce and java.lang.RuntimeException: could not instantiate

Expert Contributor

Hi,

I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains Avro files, using AvroStorage in MapReduce mode. I tried almost every path form (hdfs://, /, hdfs://ip:port/file ...) but nothing works.

Using the command below:

sensitiveSet = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

I get this error:

2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)

Or, using the command with an argument:

sensitiveSet = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925

My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.

The problem is also partially described on Stack Exchange.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

FYI: https://issues.apache.org/jira/browse/PIG-4793

org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use AvroStorage('schema', '-d') instead.

This works.

Expert Contributor

That's OK 😉

Expert Contributor

That's strange... it works for me.

grunt> sensitiveSet = load '/t-spool-dir/Test-20160129-1401822-ttp.avro' USING AvroStorage();

grunt> nonSensSet = load '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();

grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;

grunt> outputSet = distinct outputSet;

grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,sensitiveSet::VIN,sensitiveSet::Birthdate,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;

grunt> dump outputSet;

("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)

("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)

("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)

("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

.....

Master Mentor
@John Smith Your issue is with some reserved word in the Avro schema. Here's what I'm getting:

grunt> nonSensSet = load '/user/root/Test-20160129-1401822-lake.avro' USING AvroStorage();
grunt> sensitiveSet = load '/user/root/Test-20160129-1401822-ttp.avro' using AvroStorage();
grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;
grunt> outputSet = distinct outputSet;
grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,nonSensSet::Mileage,nonSensSet::Fuel_Consumption,sensitiveSet::VIN,sensitiveSet::Birthdate;
grunt> store outputSet into 'avrostorage' using AvroStorage();
2016-01-29 15:27:00,682 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2116:
<line 6, column 0> Output Location Validation Failed for: 'hdfs://sandbox.hortonworks.com:8020/user/root/avrostorage More info to follow:
Pig Schema contains a name that is not allowed in Avro
Details at logfile: /root/pig-upload/pig_1454081182813.log

I saved outputSet successfully with PigStorage(','), so I can't say what the issue is. It is something specific to Avro.

Master Mentor

@John Smith I just read the AvroStorage wiki. It says AvroStorage has limited support for union schemas and record types, so all I can really say is that AvroStorage is limited in its functionality. Perhaps you'd want to look at other storage formats.

Master Mentor

@John Smith Like I said, I tried with PigStorage and it worked fine. Take a look at OrcStorage: ORC is a pretty good columnar format for Pig, Hive and Spark (meaning you can query the same table from any of these tools natively). There are many formats, and I can't recommend one unless we know your use case. I do like Avro, but sometimes it drives me insane :). Try looking at the schemas; you can probably still get it working, I just don't have time to look at it. If you do find a solution, post it here so we can all learn!

Expert Contributor

Is there anything important in the log file mentioned in the error?

Details at logfile: /root/pig-upload/pig_1454081182813.log

Master Mentor

@John Smith Same error as I pasted.

Expert Contributor
/**
 * Translates a name in a pig schema to an acceptable Avro name, or
 * throws an error if the name can't be translated.
 * @param name The variable name to translate.
 * @param doubleColonsToDoubleUnderscores Indicates whether to translate
 * double colons to underscores or throw an error if they are encountered.
 * @return A name usable by Avro.
 * @throws IOException If the name is not compatible with Avro.
 */
private static String toAvroName(String name,
    final Boolean doubleColonsToDoubleUnderscores) throws IOException {
  if (name == null) {
    return null;
  }
  if (doubleColonsToDoubleUnderscores) {
    name = name.replace("::", "__");
  }
  if (name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
    return name;
  } else {
    throw new IOException(
        "Pig Schema contains a name that is not allowed in Avro");
  }
}

This is the check, and I don't have any characters other than A-Za-z_][A-Za-z0-9_ defined as part of the schema in Pig.

By the way, I don't know why, but whenever I paste code here and click to format it as code, it gets completely messed up: all newlines are removed.

Master Mentor

@John Smith Excellent, you went to the source code. The pattern is actually [A-Za-z_][A-Za-z0-9_]*, so with a trailing asterisk. If you did check and found nothing that violates it, perhaps you discovered a bug? Once you're 100% sure, I suggest you file a Jira with the Pig project.
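
One thing worth checking against that pattern: after a join, Pig qualifies field names with the relation alias, so a field projected as nonSensSet::name still carries "::" in its name even though the original Avro schemas contain only plain names. The sketch below is a standalone re-creation of the check quoted earlier in this thread, not Pig's own code; the class name AvroNameCheck and the helper isAllowed are made up for illustration. It suggests why the store fails by default and why the '-d' option from the accepted solution would help, assuming '-d' maps to the doubleColonsToDoubleUnderscores flag.

```java
import java.io.IOException;

// Hypothetical standalone class; it only mirrors the logic of the quoted toAvroName method.
class AvroNameCheck {

    // Same regular expression as in the quoted method.
    static final String AVRO_NAME_PATTERN = "[A-Za-z_][A-Za-z0-9_]*";

    // Returns an Avro-compatible name or throws, like the quoted method.
    static String toAvroName(String name, boolean doubleColonsToDoubleUnderscores)
            throws IOException {
        if (name == null) {
            return null;
        }
        if (doubleColonsToDoubleUnderscores) {
            name = name.replace("::", "__");
        }
        if (name.matches(AVRO_NAME_PATTERN)) {
            return name;
        }
        throw new IOException("Pig Schema contains a name that is not allowed in Avro");
    }

    // Illustrative helper (an assumption, not in Pig): true if the name passes the check.
    static boolean isAllowed(String name, boolean doubleColonsToDoubleUnderscores) {
        try {
            toAvroName(name, doubleColonsToDoubleUnderscores);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Field names taken from the foreach statement in this thread.
        String[] names = {"name", "nonSensSet::name", "sensitiveSet::VIN"};
        for (String n : names) {
            System.out.println(n
                + " | allowed without flag: " + isAllowed(n, false)
                + " | allowed with flag: " + isAllowed(n, true));
        }
    }
}
```

If this matches what Pig does internally, the plain field names pass either way, while the alias-qualified names only pass once "::" is rewritten to "__".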