Created 01-29-2016 03:32 AM
Hi,
I have a simple Pig script. I'm trying to load an Avro file (or a directory containing Avro files) using AvroStorage in MapReduce mode. I tried almost all the path combinations (hdfs://, / , hdfs://ip:port/file ...) but nothing works.
Using command below
set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage ();
I got error:
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
or, using the command with an argument:
set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');
2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.
The problem is also partially described on Stack Exchange.
Thank you
Created 02-05-2016 12:56 AM
fyi https://issues.apache.org/jira/browse/PIG-4793
org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use AvroStorage('schema', '-d').
This works.
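For anyone landing here later, a minimal sketch of what that fix looks like in a script. This assumes Pig 0.14+, where the builtin AvroStorage (org.apache.pig.builtin.AvroStorage) is available without registering piggybank; the path and alias are illustrative:

```pig
-- builtin AvroStorage, no piggybank REGISTER needed (assumes Pig 0.14+)
inSet = LOAD '/spool-dir/CustomerData-20160128-1501807/' USING AvroStorage();
DUMP inSet;
```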
Created 01-29-2016 04:01 AM
Firstly, set is a reserved word, so change set to another alias. You can also refer to AvroStorage simply as AvroStorage; no need to write out the full package name. If all else fails, add a register piggybank.jar command.
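A sketch of that suggestion put together; the piggybank.jar path is an assumption and varies by installation:

```pig
-- register piggybank explicitly (jar location is illustrative)
REGISTER /usr/hdp/current/pig-client/piggybank.jar;

-- 'set' is a reserved word in Pig, so use a different alias
inSet = LOAD '/spool-dir/CustomerData-20160128-1501807/'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```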
Created 01-29-2016 09:50 AM
Hi, sorry, the set was my typo. Here:
outSet = load 'hdfs:///CustomerData-20160128-1501807.avro' USING AvroStorage();
This command works, which is odd, because what's the difference between calling it as AvroStorage() and using the full package path
org.apache.pig.piggybank.storage.avro.AvroStorage()
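One way to take the ambiguity out of name resolution is a DEFINE, which pins exactly which class the short name refers to. A sketch, not from the original thread:

```pig
-- pin the short name to the piggybank implementation explicitly
DEFINE PiggybankAvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
outSet = LOAD 'hdfs:///CustomerData-20160128-1501807.avro' USING PiggybankAvroStorage;
```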
Created 01-29-2016 12:26 PM
@John Smith AvroStorage may be in a different package now. I confirmed with the javadoc and mine was the same as yours, but it may be packaged differently in HDP, or the classpath may differ; I don't know for sure. Please accept this answer.
Created 01-29-2016 09:52 AM
I have another issue with STORE now ....
STORE outputSet INTO 'hdfs:///avro-dest/CustomerData-20160128-1501807' USING AvroStorage('no_schema_check', 'schema', '{"type":"record","name":"xxx","fields":[{"name":"name","type":"string","title":"Customer name","description":"non Surrogate Key for joining files on the BDP"}, ....]}');
error below:
2016-01-29 09:48:42,211 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 20, column 0> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'AvroStorage' with arguments '[no_schema_check, schema, {"type":"record",
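For comparison, a STORE sketch that uses the full piggybank class name (which the next post reports is what actually parses). Only the one field from the original schema is kept here; the real record has more fields:

```pig
-- illustrative STORE with the piggybank class and a minimal schema
STORE outputSet INTO 'hdfs:///avro-dest/CustomerData-20160128-1501807'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage(
        'no_schema_check',
        'schema', '{"type":"record","name":"xxx","fields":[{"name":"name","type":"string"}]}');
```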
Created 01-29-2016 10:16 AM
OK, so STORE works only with
org.apache.pig.piggybank.storage.avro.AvroStorage(.... )
But there are still issues while trying to write the output file:
2016-01-29 10:09:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1454023575813_0018
2016-01-29 10:09:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases outputSet
2016-01-29 10:09:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: C: R: outputSet[19,12]
2016-01-29 10:10:03,931 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-01-29 10:10:03,931 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1454023575813_0018 has failed! Stop running all dependent jobs
2016-01-29 10:10:03,931 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-01-29 10:10:06,256 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-01-29 10:10:06,257 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.1.47:8050
2016-01-29 10:10:07,417 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-01-29 10:10:07,417 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.1.47:8050
2016-01-29 10:10:07,577 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-01-29 10:10:07,585 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
Failed Jobs:
JobId Alias Feature Message Outputs
job_1454023575813_0018 outputSet DISTINCT Message: Job failed! hdfs:///avro-dest/CustomerData-20160128-1501807,
Output(s): Failed to produce result in "hdfs:///avro-dest/CustomerData-20160128-1501807"
Well, I really don't understand what's going on here ... no proper documentation, and with behavior this random, it's really hard to use a tool like this.
Created 01-29-2016 10:30 AM
And now it says that I can't read data ... both files are there ... and even the previous run read the source data successfully. Well, I'm so desperate; this is like working with a random Turing machine. ;-(
How can it fail to read data? I can easily DUMP both relations that read data from those input files.
Input(s):
Failed to read data from "hdfs:///CustomerData-20160128-1501807.avro"
Failed to read data from "hdfs:///CustomerData-20160128-1501807.avro"
Output(s): Failed to produce result in "hdfs:///CustomerData-20160128-1501807"
Created 01-29-2016 10:54 AM
Still failing ;-(
Failed Jobs:
JobId Alias Feature Message Outputs
job_1454023575813_0027 outputSet DISTINCT Message: Job failed! /CustomerData-20160128-1501807,
Input(s):
Successfully read 100 records from: "/CustomerData-20160128-1501807-l.avro"
Successfully read 100 records from: "/CustomerData-20160128-1501807-t.avro"
Output(s): Failed to produce result in "/avro-dest/CustomerData-20160128-1501807"
Created 01-29-2016 11:04 AM
here is the full log: log
Created 01-29-2016 12:28 PM
@John Smith I'll review and let you know.