Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

AvroStorage with mapreduce and java.lang.RuntimeException: could not instantiate

Expert Contributor

Hi,

I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains Avro files, using AvroStorage in MapReduce mode. I tried almost all the path combinations (hdfs://, /, hdfs://ip:port/file ...) but nothing works.

Using command below

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage ();

I got error:

2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)

or using command with argument

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925

My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.

The problem is also partially described on Stack Exchange.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

FYI: https://issues.apache.org/jira/browse/PIG-4793

org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use AvroStorage('schema', '-d') instead.

This works.

Expert Contributor

That's OK 😉

Expert Contributor

That's strange... it works for me.

grunt> sensitiveSet = load '/t-spool-dir/Test-20160129-1401822-ttp.avro' USING AvroStorage();

grunt> nonSensSet = load '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();

grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;

grunt> outputSet = distinct outputSet;

grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,sensitiveSet::VIN,sensitiveSet::Birthdate,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;

grunt> dump outputSet;

("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)

("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)

("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)

("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

.....

Master Mentor
@John Smith

Your issue is with some reserved word in the Avro schema. Here's what I'm getting:

grunt> nonSensSet = load '/user/root/Test-20160129-1401822-lake.avro' USING AvroStorage();
grunt> sensitiveSet = load '/user/root/Test-20160129-1401822-ttp.avro' using AvroStorage();
grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;
grunt> outputSet = distinct outputSet;
grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,nonSensSet::Mileage,nonSensSet::Fuel_Consumption,sensitiveSet::VIN,sensitiveSet::Birthdate;
grunt> store outputSet into 'avrostorage' using AvroStorage();
2016-01-29 15:27:00,682 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2116:
<line 6, column 0> Output Location Validation Failed for: 'hdfs://sandbox.hortonworks.com:8020/user/root/avrostorage More info to follow:
Pig Schema contains a name that is not allowed in Avro
Details at logfile: /root/pig-upload/pig_1454081182813.log

I saved outputSet successfully using PigStorage(','), so I can't say what the issue is. It's something intricate about Avro.

Master Mentor

@John Smith I just read the AvroStorage wiki. They do say they have limited support for union schemas and record types, so the only thing I can say is that AvroStorage is limited in its functionality. Perhaps you'd want to look at other storage formats.

Master Mentor

@John Smith Like I said, I tried with PigStorage and it worked fine. Take a look at OrcStorage, which is a pretty good columnar format for Pig, Hive and Spark (meaning you can query the same table from any of these tools natively). There are many formats, and I can't recommend anything until we know your use case. I do like Avro, but sometimes it drives me insane :). Try looking at the schemas; you can probably still get it working, I just don't have time to look at it. If you do find a solution, post it here so we can all learn!

Expert Contributor

Is there anything important in the log file?

Details at logfile: /root/pig-upload/pig_1454081182813.log

Master Mentor

@John Smith Same error as I pasted.

Expert Contributor
/**
 * Translates a name in a pig schema to an acceptable Avro name, or
 * throws an error if the name can't be translated.
 * @param name The variable name to translate.
 * @param doubleColonsToDoubleUnderscores Indicates whether to translate
 * double colons to underscores or throw an error if they are encountered.
 * @return A name usable by Avro.
 * @throws IOException If the name is not compatible with Avro.
 */
private static String toAvroName(String name,
    final Boolean doubleColonsToDoubleUnderscores) throws IOException {
  if (name == null) {
    return null;
  }
  if (doubleColonsToDoubleUnderscores) {
    name = name.replace("::", "__");
  }
  if (name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
    return name;
  } else {
    throw new IOException(
        "Pig Schema contains a name that is not allowed in Avro");
  }
}

This is the check, and I don't have any characters other than [A-Za-z_][A-Za-z0-9_] defined as part of the schema in Pig.

By the way, I don't know why, but whenever I paste code here and click the button to format it as code, it gets completely messed up: all the newlines are removed.

Master Mentor

@John Smith Excellent, you went to the source code. It's actually [A-Za-z_][A-Za-z0-9_]*, so plus an asterisk. If you checked and found nothing that violates it, perhaps you've discovered a bug? Once you're 100% sure, I suggest you file a Jira with the Pig project.
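
A possible explanation, not confirmed anywhere in this thread: the fields generated after the join are referenced as nonSensSet::name, sensitiveSet::VIN and so on, and a name containing "::" cannot pass the quoted check unless double colons are translated first. That would also fit the accepted solution, which passes '-d' to AvroStorage. The sketch below is a standalone copy of the quoted check for experimenting outside Pig; the class name AvroNameCheck and the helper isAccepted are made up for illustration and are not part of Pig.

```java
import java.io.IOException;

class AvroNameCheck {

    // Standalone copy of the check quoted in this thread: optionally rewrite
    // "::" to "__", then require the whole name to match the Avro name pattern.
    static String toAvroName(String name, boolean doubleColonsToDoubleUnderscores)
            throws IOException {
        if (name == null) {
            return null;
        }
        if (doubleColonsToDoubleUnderscores) {
            name = name.replace("::", "__");
        }
        if (name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            return name;
        }
        throw new IOException("Pig Schema contains a name that is not allowed in Avro");
    }

    // Hypothetical helper: true if the name passes the check, false otherwise.
    static boolean isAccepted(String name, boolean translateDoubleColons) {
        try {
            toAvroName(name, translateDoubleColons);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Field names taken from the foreach statement earlier in the thread.
        String[] names = {"customerId", "nonSensSet::name", "sensitiveSet::VIN"};
        for (String n : names) {
            System.out.println(n
                    + " without translation: " + isAccepted(n, false)
                    + ", with translation: " + isAccepted(n, true));
        }
    }
}
```

If the joined field names really do keep their "::" prefix in the Pig schema, then either renaming them in the foreach (for example with AS) or storing with the double-colon translation enabled should avoid the "not allowed in Avro" error.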