AvroStorage with mapreduce and java.lang.RuntimeException: could not instantiate
Labels: Apache Pig
Created ‎01-29-2016 03:32 AM
Hi,
I have a simple Pig script. I'm trying to load an Avro file (or a directory containing Avro files) using AvroStorage in MapReduce mode. I have tried almost all path combinations (hdfs://, /, hdfs://ip:port/file, ...) but nothing works.
Using the command below:
set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage ();
I get this error:
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
Or, using the command with an argument:
set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');
2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error :
java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
'[no_schema_check]' at org.apache.pig.PigServer.openIterator(PigServer.java:925
My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.
The problem is also partially described on Stack Exchange.
Thank you
Created ‎02-05-2016 12:56 AM
FYI: https://issues.apache.org/jira/browse/PIG-4793
org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use AvroStorage('schema', '-d') instead.
This works.
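A minimal sketch of the working call described above, reusing the poster's input path; the arguments are copied verbatim from the suggestion, and the alias name is illustrative:

```pig
-- Sketch: the built-in AvroStorage loader instead of the deprecated
-- piggybank class, with the arguments suggested above.
customerData = LOAD '/spool-dir/CustomerData-20160128-1501807/'
    USING AvroStorage('schema', '-d');

DUMP customerData;
```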
Created ‎01-29-2016 04:01 AM
Firstly, set is a reserved word, so change set to another alias. You can also refer to the loader simply as AvroStorage; there is no need to write out the full package name. If all else fails, add a REGISTER piggybank.jar command.
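Putting the three suggestions together, a hypothetical corrected script might look like this (the alias name is illustrative, and the piggybank.jar path is an assumption about where the jar lives):

```pig
-- Only needed as a fallback, if the loader class cannot be resolved:
REGISTER piggybank.jar;

-- 'set' is a reserved word in Pig Latin; use a different alias,
-- and refer to the loader by its short name.
customerData = LOAD '/spool-dir/CustomerData-20160128-1501807/'
    USING AvroStorage();

DUMP customerData;
```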
Created ‎01-29-2016 09:50 AM
Hi, sorry, the set was my typo here:
outSet = load 'hdfs:///CustomerData-20160128-1501807.avro' USING AvroStorage();
This command works, which is odd, because what is the difference between calling it as AvroStorage() and using the full package path org.apache.pig.piggybank.storage.avro.AvroStorage()?
Created ‎01-29-2016 12:26 PM
@John Smith AvroStorage may live in a different package now. I checked the Javadoc and mine was the same as yours, but it may be packaged differently in HDP, or the classpath may differ; I don't know for sure. Please accept this answer.
Created ‎01-29-2016 09:52 AM
I have another issue, this time with STORE:
STORE outputSet INTO 'hdfs:///avro-dest/-CustomerData-20160128-1501807' USING AvroStorage('no_schema_check', 'schema', '{"type":"record","name":"xxx","fields":[{"name":"name","type":"string","title":"Customer name","description":"non Surrogate Key for joining files on the BDP"}, ....]}');
The error is below:
2016-01-29 09:48:42,211 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 20, column 0> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'AvroStorage' with arguments '[no_schema_check, schema, {"type":"record",
Created ‎01-29-2016 10:16 AM
OK, so STORE works only with
org.apache.pig.piggybank.storage.avro.AvroStorage(....)
But there are still issues while trying to write the output file:
2016-01-29 10:09:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1454023575813_0018
2016-01-29 10:09:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases outputSet
2016-01-29 10:09:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: C: R: outputSet[19,12]
2016-01-29 10:10:03,931 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-01-29 10:10:03,931 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1454023575813_0018 has failed! Stop running all dependent jobs
2016-01-29 10:10:03,931 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-01-29 10:10:06,256 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-01-29 10:10:06,257 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.1.47:8050
2016-01-29 10:10:07,417 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-01-29 10:10:07,417 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.1.47:8050
2016-01-29 10:10:07,577 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-01-29 10:10:07,585 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
Failed Jobs:
JobId Alias Feature Message Outputs
job_1454023575813_0018 outputSet DISTINCT Message: Job failed! hdfs:///avro-dest/CustomerData-20160128-1501807,
Output(s): Failed to produce result in "hdfs:///avro-dest/CustomerData-20160128-1501807"
Well, I really don't understand what's going on here... There is no proper documentation, and the behavior seems random to me; it is really hard to use a tool like that.
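For reference, the STORE form that parsed successfully can be sketched as follows. The output path is the poster's, but the schema here is a hypothetical placeholder; the real record had additional fields that were elided above:

```pig
-- Sketch: STORE with the full piggybank package name and an explicit
-- Avro output schema. Record name and fields are illustrative only.
STORE outputSet INTO 'hdfs:///avro-dest/CustomerData-20160128-1501807'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage(
        'no_schema_check',
        'schema',
        '{"type":"record","name":"CustomerData","fields":[
            {"name":"name","type":"string"}
        ]}');
```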
Created ‎01-29-2016 10:30 AM
And now it says that I can't read the data. Both files are there, and the previous run even read the source data successfully. I'm so desperate; this is like working with a random Turing machine. ;-(
How can it fail to read data when I can easily DUMP both relations that read from those input files?
Input(s):
Failed to read data from "hdfs:///CustomerData-20160128-1501807.avro"
Failed to read data from "hdfs:///CustomerData-20160128-1501807.avro"
Output(s): Failed to produce result in "hdfs:///CustomerData-20160128-1501807"
Created ‎01-29-2016 10:54 AM
Still failing ;-(
Failed Jobs:
JobId Alias Feature Message Outputs
job_1454023575813_0027 outputSet DISTINCT Message: Job failed! /CustomerData-20160128-1501807,
Input(s):
Successfully read 100 records from: "/CustomerData-20160128-1501807-l.avro"
Successfully read 100 records from: "/CustomerData-20160128-1501807-t.avro"
Output(s): Failed to produce result in "/avro-dest/CustomerData-20160128-1501807"
Created ‎01-29-2016 11:04 AM
Here is the full log: log
Created ‎01-29-2016 12:28 PM
@John Smith I'll review and let you know.
