
Pig ERROR 1066


Attachments: employeeinfo.txt, salaryinfo.txt

I don't know why, but Pig fails with ERROR 1066 every time I run this script. The data is attached, and this is the script I'm running.

Can anyone help?

a = load '/pigsample/Salaryinfo.csv' USING PigStorage(',');

b = load '/pigsample/Employeeinfo.csv' USING PigStorage(',');

c = filter b by $4 =='Male';

d = foreach c generate $0 as id:int, $1 as firstname:chararray, $2 as lastname:chararray, $4 as gender:chararray, $6 as city:chararray , $7 as country:chararray, $8 as countrycode:chararray;

e = foreach a generate $0 as iD:int, $1 as firstname:chararray, $2 as lastname:chararray, $3 as salary:double, ToDate($4, 'MM/dd/yyyy') as dateofhire, $5 as company:chararray;

f = join d by id, e by iD;

g = foreach f generate f.d::firstname as firstname;

dump g;

--------------------------------------------------------------------------------

This is the input and output I get from the shell, including the describe output for each relation:

grunt> a = load '/pigsample/Salaryinfo.csv' USING PigStorage(',');

grunt> describe a
Schema for a unknown.

grunt> b = load '/pigsample/Employeeinfo.csv' USING PigStorage(',');

grunt> describe b
Schema for b unknown.

grunt> c = filter b by $4 =='Male';

2016-07-01 19:02:16,356 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

grunt> describe c

2016-07-01 19:02:21,611 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
Schema for c unknown.

grunt> d = foreach c generate $0 as id:int, $1 as firstname:chararray, $2 as lastname:chararray, $4 as gender:chararray, $6 as city:chararray , $7 as country:chararray, $8 as countrycode:chararray;

2016-07-01 19:02:35,684 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

grunt> describe d

2016-07-01 19:02:40,638 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

d: {id: int,firstname: chararray,lastname: chararray,gender: chararray,city: chararray,country: chararray,countrycode: chararray}

grunt> e = foreach a generate $0 as iD:int, $1 as firstname:chararray, $2 as lastname:chararray, $3 as salary:double, ToDate($4, 'MM/dd/yyyy') as dateofhire, $5 as company:chararray;

2016-07-01 19:44:03,703 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2016-07-01 19:44:03,703 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> describe e

2016-07-01 19:44:09,159 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2016-07-01 19:44:09,159 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

e: {iD: int,firstname: chararray,lastname: chararray,salary: double,dateofhire: datetime,company: chararray}

grunt> f = join d by id, e by iD;

2016-07-01 19:44:34,194 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).

2016-07-01 19:44:34,194 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> describe f

2016-07-01 19:44:38,955 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2016-07-01 19:44:38,955 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

f: {d::id: int,d::firstname: chararray,d::lastname: chararray,d::gender: chararray,d::city: chararray,d::country: chararray,d::countrycode: chararray,e::iD: int,e::firstname: chararray,e::lastname: chararray,e::salary: double,e::dateofhire: datetime,e::company: chararray}

grunt> g = foreach f generate f.d::firstname as firstname;

2016-07-01 19:45:03,037 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2016-07-01 19:45:03,037 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> describe g

2016-07-01 19:45:08,432 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2016-07-01 19:45:08,432 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

g: {firstname: chararray}

grunt> dump g

2016-07-01 19:45:13,698 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).

2016-07-01 19:45:13,698 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).
2016-07-01 19:45:13,725 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,FILTER
2016-07-01 19:45:13,773 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-07-01 19:45:13,812 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-07-01 19:45:13,950 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-07-01 19:45:13,965 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - number of input files: -1
2016-07-01 19:45:13,988 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POPackage(JoinPackager)
2016-07-01 19:45:13,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2016-07-01 19:45:13,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
2016-07-01 19:45:13,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2016-07-01 19:45:13,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2016-07-01 19:45:14,155 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hwhdpm
2016-07-01 19:45:14,163 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hwhdpmaster02.centralus.cloudapp.azure.com/10.0.1.5:8050
2016-07-01 19:45:14,376 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-07-01 19:45:14,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-07-01 19:45:14,385 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-07-01 19:45:14,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-07-01 19:45:14,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=2500
2016-07-01 19:45:14,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-07-01 19:45:14,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2016-07-01 19:45:14,922 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.3.4.0-3485/pig/pig-0.15.0.2.3.4.0-3485-core-h2.jar to DistributedCache through /tmp/temp1278836613/tmp2008058395/pig-0.15.0.2.3.4.0-3485-core-h2.jar
2016-07-01 19:45:15,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.3.4.0-3485/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1278836613/tmp-244128717/automaton-1.11-8.jar
2016-07-01 19:45:15,193 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.3.4.0-3485/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1278836613/tmp-1145480432/antlr-runtime-3.4.jar
2016-07-01 19:45:15,339 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.3.4.0-3485/hadoop-mapreduce/joda-time-2.9.1.jar to DistributedCache through /tmp/temp1278836613/tmp530457831/joda-time-2.9.1.jar
2016-07-01 19:45:15,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2016-07-01 19:45:15,394 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-07-01 19:45:15,395 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-07-01 19:45:15,395 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-07-01 19:45:15,498 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-07-01 19:45:15,624 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hwhdpm
2016-07-01 19:45:15,625 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hwhdpmaster02.centralus.cloudapp.azure.com/10.0.1.5:8050
2016-07-01 19:45:15,934 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-07-01 19:45:16,009 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-07-01 19:45:16,009 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-07-01 19:45:16,037 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-07-01 19:45:16,042 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-07-01 19:45:16,042 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-07-01 19:45:16,045 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-07-01 19:45:16,419 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:2
2016-07-01 19:45:16,667 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1467387563416_0004
2016-07-01 19:45:16,839 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2016-07-01 19:45:17,136 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1467387563416_0004
2016-07-01 19:45:17,181 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://hwhdpmaster02.c
2016-07-01 19:45:17,182 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1467387563416_0004
2016-07-01 19:45:17,182 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a,b,c,d,e,f
2016-07-01 19:45:17,182 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[1,4],e[5,4],f[6,4],b[2,4],c[3,4],d[4,4],f[6,4] C: R:
2016-07-01 19:45:17,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-07-01 19:45:17,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1467387563416_0004]
2016-07-01 19:45:46,336 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-07-01 19:45:46,336 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1467387563416_0004]
2016-07-01 19:45:47,346 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-07-01 19:45:47,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1467387563416_0004 has failed! Stop running all dependent jobs
2016-07-01 19:45:47,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-07-01 19:45:47,517 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hwhdpm
2016-07-01 19:45:47,518 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hwhdpmaster02.centralus.cloudapp.azure.com/10.0.1.5:8050
2016-07-01 19:45:47,528 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2016-07-01 19:45:47,824 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Integer
2016-07-01 19:45:47,824 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-07-01 19:45:47,831 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.1.2.3.4.0-3485 0.15.0.2.3.4.0-3485 2016-07-01 19:45:14 2016-07-01 19:45:47 HASH_JOIN,FILTER
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1467387563416_0004 a,b,c,d,e,f HASH_JOIN,MULTI_QUERY Message: Job failed!
Input(s):
Failed to read data from "/pigsample/Employeeinfo.csv"
Failed to read data from "/pigsample/Salaryinfo.csv"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1467387563416_0004 -> null, null
2016-07-01 19:45:47,831 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-07-01 19:45:47,833 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias g
Details at logfile: /home//pig_1467399695184.log
grunt>

1 ACCEPTED SOLUTION


Here is the solution to your problem, @Dagmawi Mengistu.

There are two issues here.

ISSUE 1:

If you check your logs, you'll see that computing relation "f" fails with "java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Integer". Because the files are loaded without a schema, every field is a bytearray, and the join keys were never actually converted to int.

Please find the updated steps below with an explanation of how to resolve this error (comments are marked with a // prefix):

a = load '/pigsample/Salaryinfo.csv' USING PigStorage(',');

b = load '/pigsample/Employeeinfo.csv' USING PigStorage(',');

c = filter b by $4 =='Male';

// In relation "d", carefully observe that I have cast the field at index 0 to int. You need to do an explicit cast like this to avoid the "java.lang.ClassCastException" — the "as id:int" alias alone is not enough here.

d = foreach c generate (int)$0 as id:int, $1 as firstname:chararray, $2 as lastname:chararray, $4 as gender:chararray, $6 as city:chararray , $7 as country:chararray, $8 as countrycode:chararray;

// Similarly, in relation "e" we again have to explicitly cast the field "iD" to int.

e = foreach a generate (int)$0 as iD:int, $1 as firstname:chararray, $2 as lastname:chararray, $3 as salary:double, ToDate($4, 'MM/dd/yyyy') as dateofhire, $5 as company:chararray;

// Relation "f" now works and no longer throws the exception.

f = join d by id, e by iD;
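As an alternative to the per-field casts, you could declare the schema in the LOAD statements themselves, so PigStorage converts every field up front and "describe a" no longer reports the schema as unknown. This is only a sketch: the column names below are inferred from the aliases in the original script, and f3/f5 are placeholder names for the Employeeinfo columns the script never references.

```pig
-- Sketch: declare schemas at load time instead of casting inside each FOREACH.
-- Column names are assumed from the original script; f3 and f5 are placeholders.
a = LOAD '/pigsample/Salaryinfo.csv' USING PigStorage(',')
    AS (iD:int, firstname:chararray, lastname:chararray, salary:double,
        dateofhire:chararray, company:chararray);
b = LOAD '/pigsample/Employeeinfo.csv' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, f3:chararray,
        gender:chararray, f5:chararray, city:chararray, country:chararray,
        countrycode:chararray);
-- With a declared schema, fields can be referenced by name and no (int) cast is needed.
c = FILTER b BY gender == 'Male';
f = JOIN c BY id, a BY iD;
```

dateofhire is kept as chararray here so it can still be converted with ToDate(dateofhire, 'MM/dd/yyyy') in a later FOREACH.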

ISSUE 2:

// In relation "g", you don't need to write f.d::firstname; that will throw an "org.apache.pig.backend.executionengine.ExecException".

You can directly reference the fields that relation "f" carries over from relation "d", like this:

g = foreach f generate d::firstname as firstname;
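Note that the d:: prefix is needed at all only because firstname exists on both sides of the join (it comes from both d and e); a field name that is unique across the two relations can be referenced without any prefix. A hypothetical extension (not part of the original answer) that projects one field from each side:

```pig
-- Hypothetical: project the first name from d and the salary from e.
-- "firstname" must be disambiguated with d::; "salary" exists only in e,
-- so the e:: prefix is optional there.
h = FOREACH f GENERATE d::firstname AS firstname, e::salary AS salary;
```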

// Print output

DUMP g;

OUTPUT:

(Jonathan)

(Gary)

(Roger)

(Jeffrey)

(Steve)

(Lawrence)

(Billy)

(Joseph)

(Aaron)

(Steve)

(Brian)

(Robert)

Hope this helps 🙂

