AvroStorage with mapreduce and java.lang.RuntimeException: could not instantiate

Expert Contributor

Hi,

I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains an Avro file, using AvroStorage in MapReduce mode. I have tried almost every path form (hdfs://, /, hdfs://ip:port/file ...) but nothing works.

Using the command below:

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage ();

I get this error:

2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
  at org.apache.pig.PigServer.openIterator(PigServer.java:925)
  at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)

Or, using the command with an argument:

set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
  at org.apache.pig.PigServer.openIterator(PigServer.java:925

My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.

The problem is also partially described on Stack Exchange.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

FYI: https://issues.apache.org/jira/browse/PIG-4793

org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use AvroStorage('schema', '-d') instead.

This works.


Expert Contributor

This is the schema:

outputSet: {nonSensSet::name: chararray,nonSensSet::customerId: chararray,sensitiveSet::VIN: chararray,sensitiveSet::Birthdate: chararray,nonSensSet::Mileage: chararray,nonSensSet::Fuel_Consumption: chararray}

Master Mentor

Yes, I got the same. One more thing to try is to store the relation with PigStorage, then in another script load that dataset with PigStorage and store it as Avro. I'm wondering whether that will work. @John Smith

Expert Contributor

I did this:

outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);

and the command below worked:

store outputSet into 'avrostorage' using AvroStorage();

Output(s):

Successfully stored 100 records in: "file:///root/deploy-3/avrostorage"

That's strange. Apparently there is an issue when the relation is described as:

grunt> describe outputSet;
outputSet: {nonSensSet::name: chararray,nonSensSet::customerId: chararray,sensitiveSet::VIN: chararray,sensitiveSet::Birthdate: chararray,nonSensSet::Mileage: chararray,nonSensSet::Fuel_Consumption: chararray}

but /AvroStorageSchemaConversionUtilities.java contains this code:

  if (doubleColonsToDoubleUnderscores) {
      name = name.replace("::", "__");
  }
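
For context, the Avro specification requires record and field names to start with a letter or underscore and to contain only letters, digits and underscores, so an alias such as nonSensSet::name cannot be used as an Avro field name unchanged. A minimal Python sketch of that rule and of the replacement quoted above (the function names are made up for illustration and are not part of Pig):

```python
import re

# Name rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name):
    """Return True if `name` is a legal Avro record or field name."""
    return AVRO_NAME.fullmatch(name) is not None


def double_colons_to_double_underscores(name):
    """Mirror the Java replacement quoted above: '::' becomes '__'."""
    return name.replace("::", "__")


alias = "nonSensSet::name"  # alias as reported by `describe outputSet`
print(is_valid_avro_name(alias))                                       # False
print(is_valid_avro_name(double_colons_to_double_underscores(alias)))  # True
```

This only illustrates the naming rule; whether AvroStorage actually applies the replacement depends on how the doubleColonsToDoubleUnderscores flag is set, which this sketch does not cover.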

There is still the same problem when I try to store using AvroStorage from the script provided:

Output(s): Failed to produce result in "/avro-dest/Test-20160129-1401822"

Master Mentor

@John Smith, time to file a JIRA. Great job investigating this.

Expert Contributor

Could you please add this line before STORE

outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);

and execute my Pig script in your environment?

Expert Contributor

I don't know what happened, but I can't load any Avro file in MapReduce mode ...

grunt> sensitiveSet = load '/t-spool-dir/Test-20160129-1401822-ttp.avro' USING AvroStorage();
2016-01-29 17:06:00,668 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: null
Details at logfile: /tmp/hsperfdata_hdfs/pig_1454087102249.log

Pig Stack Trace
---------------
ERROR 1200: null

Failed to parse: null
  at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:201)
  at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
  at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
  at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
  at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
  at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
  at org.apache.pig.Main.run(Main.java:565)
  at org.apache.pig.Main.main(Main.java:177)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
  at org.apache.pig.builtin.AvroStorage.getAvroSchema(AvroStorage.java:298)
  at org.apache.pig.builtin.AvroStorage.getAvroSchema(AvroStorage.java:282)
  at org.apache.pig.builtin.AvroStorage.getSchema(AvroStorage.java:256)
  at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
  at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
  at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:901)
  at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
  at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
  at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
  at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
  at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
  at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
  ... 16 more
================================================================================
/tmp/hsperfdata_hdfs/pig_1454087102249.log (END)

Expert Contributor

Oops, sorry, my fault ... I don't have that source stored in HDFS ... time to stop debugging for today :-)

Expert Contributor

OK, I added the line

outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);

and successfully created the output Avro file using:

store outputSet into 'avrostorage' using AvroStorage();

When I try to store the output file using the code below, it fails:

/10.0.1.47:8050
2016-01-29 17:24:39,600 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!

At this point I have no idea what else I can do.

STORE outputSet INTO '/avro-dest/Test-20160129-1401822'  USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 'schema', '{"type":"record","name":"test","fields":[{"name":"name","type":"string","title":"Customer name","description":"non Surrogate Key for joining files on the BDP","DataOwner":"Bank","ValidityDate":"2015.12.22","ValidityOption":"Delete","DataSensitivityLevel":"0","FieldPosition":"1"},{"name":"customerId","type":"string","title":"customer Id","description":"non sensitive field of customer Id","DataOwner":"Bank","ValidityDate":"2015.12.22","ValidityOption":"Retain","DataSensitivityLevel":"0","FieldPosition":"2"},{"name":"VIN","type":"string","title":"Customer VIN","description":"Customer VIN","DataOwner":"Bank","ValidityDate":"2015.12.22","ValidityOption":"Delete","DataSensitivityLevel":"1","FieldPosition":"3"},{"name":"Birthdate","type":"string","title":"Customer birthdate","description":"Customer birthdate","DataOwner":"Bank","ValidityDate":"2015.12.22","ValidityOption":"Delete","DataSensitivityLevel":"1","FieldPosition":"4"},{"name":"Mileage","type":"string","title":"Customer mileage","description":"Customer mileage","DataOwner":"Bank","ValidityDate":"2015.12.22","ValidityOption":"Delete","DataSensitivityLevel":"0","FieldPosition":"5"},{"name":"Fuel_Consumption","type":"string","title":"Customer fule consumption","description":"Customer fuel consumption","DataOwner":"Bank","ValidityDate":"2015.12.22","ValidityOption":"Delete","DataSensitivityLevel":"0","FieldPosition":"6"}]}'); 
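
One thing that can be checked outside Pig is whether the field names in the schema string line up, in order, with the aliases that describe reports. A rough Python sketch, assuming a flat relation (no nested tuples, bags or maps) and using a shortened copy of the schema above with the extra metadata attributes dropped; the helper names are hypothetical:

```python
import json


def pig_aliases(describe_line):
    """Extract field aliases from a flat `describe` line.

    Assumes the relation has no nested tuples, bags or maps.
    """
    body = describe_line[describe_line.index("{") + 1:describe_line.rindex("}")]
    # Each entry looks like 'prefix::name: type'; cut the type off at the last ':'.
    return [entry.rsplit(":", 1)[0].strip() for entry in body.split(",")]


def avro_field_names(schema_json):
    """Return the field names of an Avro record schema given as a JSON string."""
    return [field["name"] for field in json.loads(schema_json)["fields"]]


describe_line = (
    "outputSet: {nonSensSet::name: chararray,nonSensSet::customerId: chararray,"
    "sensitiveSet::VIN: chararray,sensitiveSet::Birthdate: chararray,"
    "nonSensSet::Mileage: chararray,nonSensSet::Fuel_Consumption: chararray}"
)

# Shortened schema: same names and types as in the STORE statement, metadata omitted.
schema_json = json.dumps({
    "type": "record",
    "name": "test",
    "fields": [{"name": n, "type": "string"} for n in
               ["name", "customerId", "VIN", "Birthdate", "Mileage", "Fuel_Consumption"]],
})

aliases = pig_aliases(describe_line)
# Drop the 'relation::' prefix from each alias.
short_aliases = [a.split("::")[-1] for a in aliases]

print(aliases == avro_field_names(schema_json))        # False: prefixed aliases differ
print(short_aliases == avro_field_names(schema_json))  # True: names match without prefixes
```

This only compares names. It does not reproduce what AvroStorage does internally, and matching names here do not guarantee that the STORE will succeed.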

Master Mentor

@John Smith the workaround works; I was running in Tez mode, by the way.

outputSet = foreach outputSet generate $0 as (name:chararray) , $1 as (customerId:chararray), $2 as (VIN:chararray) , $3 as (Birthdate:chararray), $4 as (Mileage:chararray) ,$5 as (Fuel_Consumption:chararray);

store outputSet into 'avroout2' using AvroStorage();

Input(s):
Successfully read 100 records (15099 bytes) from: "/user/root/Test-20160129-1401822-lake.avro"
Successfully read 100 records (12703 bytes) from: "/user/root/Test-20160129-1401822-ttp.avro"
Output(s):
Successfully stored 100 records (7703 bytes) in: "hdfs://sandbox.hortonworks.com:8020/user/root/avroout2"
grunt> 2016-01-29 18:04:19,978 [main] INFO org.apache.pig.Main - Pig script completed in 1 minute, 52 seconds and 249 milliseconds (112249 ms)
2016-01-29 18:04:19,978 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Shutting down thread pool
2016-01-29 18:04:20,008 [Thread-1] ERROR org.apache.pig.impl.io.FileLocalizer - java.io.IOException: Filesystem closed
2016-01-29 18:04:20,025 [Thread-23] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Shutting down Tez session org.apache.tez.client.TezClient@2c8b16b6
2016-01-29 18:04:20,025 [Thread-23] INFO org.apache.tez.client.TezClient - Shutting down Tez Session, sessionName=PigLatin:DefaultJobName, applicationId=application_1454090472993_0001
[root@sandbox pig-upload]# hdfs dfs -ls avroout2
Found 2 items
-rw-r--r-- 3 root hdfs 0 2016-01-29 18:03 avroout2/_SUCCESS
-rw-r--r-- 3 root hdfs 7703 2016-01-29 18:03 avroout2/part-v003-o000-r-00000.avro
[root@sandbox pig-upload]# hdfs dfs -cat avroout2/part-v003-o000-r-00000.avro | less

Expert Contributor

Yes, it works for me too, but when I use

STORE outputSet INTO '/avro-dest/Test-20160129-1401822' USING org.apache.pig.piggybank.storage.avro.AvroStorage

and define the schema as part of AvroStorage( schema ) ... it doesn't work ;-(((