Created 01-29-2016 03:32 AM
Hi,
I have a simple Pig script. I'm trying to load an Avro file, or a directory that contains Avro files, using AvroStorage in MapReduce mode. I tried almost every path combination (hdfs://, / , hdfs://ip:port/file ... ) but nothing works.
Using the command below:
set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage ();
I got this error:
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
2016-01-29 00:10:08,439 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-29 00:10:08,439 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
Or, using the command with an argument:
set = load '/spool-dir/CustomerData-20160128-1501807/' USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');
2016-01-29 00:25:02,767 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sensitiveSet. Backend error : java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[no_schema_check]'
    at org.apache.pig.PigServer.openIterator(PigServer.java:925
My samples are almost identical to the ones in the AvroStorage documentation, but I really can't see where the problem is.
The problem is also partially described on Stack Exchange.
Thank you
Created 02-05-2016 12:56 AM
FYI: https://issues.apache.org/jira/browse/PIG-4793
org.apache.pig.piggybank.storage.avro.AvroStorage is deprecated; use the built-in AvroStorage('schema', '-d') instead.
This works.
Created 01-29-2016 12:42 PM
OK, waiting for your results! Thank you.
Created 01-29-2016 12:45 PM
@Artem Ervits here is one more output: the sources were read successfully, but writing the output failed.
Created 01-29-2016 01:52 PM
It is failing to write the output Avro file, but the web log says:
Application Overview
User: hdfs
Name: PigLatin:pigMerger.pig
Application Type: MAPREDUCE
Application Tags:
YarnApplicationState: FINISHED
Queue: default
FinalStatus Reported by AM: SUCCEEDED
Started: Fri Jan 29 12:59:25 +0000 2016
Elapsed: 4mins, 29sec
Tracking URL: History
Log Aggregation Status: SUCCEEDED
Diagnostics:
Created 01-29-2016 01:56 PM
Can you paste a sample dataset and the Pig script? I'll try to reproduce it on my machine sometime today. It's hard to see the issue from the logs. @John Smith
Created 01-29-2016 02:21 PM
@Artem Ervits the source files and Pig script are included here: Data. Thanks.
Created 01-29-2016 01:55 PM
One more log:
Created 01-29-2016 02:44 PM
@John Smith I highly recommend developing your scripts in the Pig Grunt shell. This is what happened when I executed your script one statement at a time:
grunt> sensitiveSet = load '/user/root/Test-20160129-1401822-ttp.avro' using AvroStorage();
grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;
grunt> outputSet = distinct outputSet;
grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,sensitiveSet::VIN,sensitiveSet::Birthdate,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;
2016-01-29 14:41:59,228 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: <line 5, column 79> Invalid field projection. Projected field [sensitiveSet::VIN] does not exist in schema: sensitiveSet::Row_ID:long,sensitiveSet::name:chararray,sensitiveSet::customerId:chararray,sensitiveSet::Mileage:chararray,sensitiveSet::Fuel_Consumption:chararray,nonSensSet::Row_ID:long,nonSensSet::name:chararray,nonSensSet::customerId:chararray,nonSensSet::Mileage:chararray,nonSensSet::Fuel_Consumption:chararray. Details at logfile: /root/pig-upload/pig_1454078371113.log
grunt> describe sensitiveSet;
sensitiveSet: {Row_ID: long,name: chararray,customerId: chararray,Mileage: chararray,Fuel_Consumption: chararray}
Created 01-29-2016 03:10 PM
I do develop everything in Grunt... You are missing one line from that script:
nonSensSet = load '/d-spool-dir/Test-20160129-1401822-lake.avro' USING AvroStorage();
Created 01-29-2016 02:45 PM
@John Smith Birthdate doesn't exist either:
<line 5, column 79> Invalid field projection. Projected field [sensitiveSet::Birthdate] does not exist in schema: sensitiveSet::Row_ID:long,sensitiveSet::name:chararray,sensitiveSet::customerId:chararray,sensitiveSet::Mileage:chararray,sensitiveSet::Fuel_Consumption:chararray,nonSensSet::Row_ID:long,nonSensSet::name:chararray,nonSensSet::customerId:chararray,nonSensSet::Mileage:chararray,nonSensSet::Fuel_Consumption:chararray. Details at logfile: /root/pig-upload/pig_1454078371113.log
-- this works
grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,nonSensSet::Mileage,nonSensSet::Fuel_Consumption;
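Why the projection fails can be shown with a toy Python model. It is an illustration only, not Pig's implementation: it assumes, as the schema in the error message shows, that after `join sensitiveSet by Row_ID, nonSensSet by Row_ID` every field `f` of a relation appears as `alias::f`. So `sensitiveSet::VIN` can only be projected if `VIN` is in sensitiveSet's own schema. The helper names `joined_schema` and `missing_projections` are hypothetical; the field lists are copied from the `describe` output and error message in this thread.

```python
# Toy model (assumption, not Pig code): after a join, field f of relation A
# is addressed as "A::f", mirroring the schema printed in the Pig error.

def joined_schema(relations):
    """relations: list of (alias, [field names]) in join order."""
    return [f"{alias}::{field}" for alias, fields in relations for field in fields]


def missing_projections(relations, projections):
    """Return the projected names that do not exist in the joined schema."""
    available = set(joined_schema(relations))
    return [p for p in projections if p not in available]


# Field lists as reported by Pig in this thread (both files had the same schema).
sensitive = ["Row_ID", "name", "customerId", "Mileage", "Fuel_Consumption"]
non_sensitive = ["Row_ID", "name", "customerId", "Mileage", "Fuel_Consumption"]
relations = [("sensitiveSet", sensitive), ("nonSensSet", non_sensitive)]

wanted = ["nonSensSet::name", "nonSensSet::customerId", "sensitiveSet::VIN",
          "sensitiveSet::Birthdate", "nonSensSet::Mileage",
          "nonSensSet::Fuel_Consumption"]
print(missing_projections(relations, wanted))
# ['sensitiveSet::VIN', 'sensitiveSet::Birthdate']
```

Running `describe` on each input before projecting, as done above, gives the same answer directly in Grunt.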
Created 01-29-2016 03:06 PM
OK, that one was my mistake: I copied the same file twice. Never mind that issue; I'm still looking.