Cannot cast the column generated by lead analytical function in pig


Rising Star
given_data = load '/clickstream/total_hitdata/05/hit_data.tsv' using PigStorage('\t');
filtered = FILTER given_data by ($133!=0);
req_cols = foreach filtered generate
    GetYear(ToDate((chararray)$25,'yyyy-MM-dd HH:mm:ss','GMT')) as year:int,
    GetMonth(ToDate((chararray)$25,'yyyy-MM-dd HH:mm:ss','GMT')) as month:int,
    GetDay(ToDate((chararray)$25,'yyyy-MM-dd HH:mm:ss','GMT')) as day:int,
    ($161-1400000000) as time,
    $343 as cust_id1:chararray,
    $344 as cust_id2:chararray,
    $256 as post_page_url,
    $466 as visit_num:int;
gprd = group req_cols by (year,month,day,cust_id1,cust_id2,visit_num);
lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0)));
};



In the lead_result relation I used the 'lead' function, as my requirement calls for. $8 is the column generated by the lead function, appended to the existing schema, but I am unable to cast it to any type. I get the following error when I try to cast it to chararray under the name my:


<line 57, column 4> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable field schema: declared is "my:chararray", infered is ":NULL"


The overall schema is as follows:


lead_result: {stitched::year: int,stitched::month: int,stitched::day: int,stitched::time: int,stitched::cust_id1: chararray,stitched::cust_id2: chararray,stitched::post_page_url: bytearray,stitched::visit_num: int,NULL}



Re: Cannot cast the column generated by lead analytical function in pig

Mentor
@Suresh Bonam

What's the output of the following before you apply Over?

org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0)

Re: Cannot cast the column generated by lead analytical function in pig

Rising Star

@Artem Ervits Thank you for the reply.

Before applying Over, the schema looks like this:

req_cols: {year: int,month: int,day: int,time: int,cust_id1: chararray,cust_id2: chararray,post_page_url: bytearray,visit_num: int}

After I apply lead with Over, we get one more column; let's call it "next_url_hit_time" ($8).

That is where I am actually facing the issue. See the following code.

lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time:chararray);
};

The statement above produces the following error:

grunt> lead_result = foreach gprd {
>> C1 = order req_cols by time ASC;
>> generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time:chararray);
>> };
2016-01-16 08:47:53,566 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "next_url_hit_time:chararray", infered is ":NULL"
2016-01-16 08:47:53,566 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-16 08:47:53,567 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
<line 12, column 14> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable field schema: declared is "next_url_hit_time:chararray", infered is ":NULL"
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:558)
    at org.apache.pig.Main.main(Main.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

But the version below executes without error:

lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time);
};

It does not accept any type other than bytearray. I actually need to cast the column generated by the lead function.

Re: Cannot cast the column generated by lead analytical function in pig

Mentor

If you say it accepts bytearray, try casting the field? @Suresh Bonam

Re: Cannot cast the column generated by lead analytical function in pig

Rising Star

@Artem Ervits

Yes Artem, I know about casting, but this column does not accept anything. See the following.

lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time:bytearray);
};
change_col_type = foreach lead_result generate next_url_hit_time as next_url:chararray;

This is the first time I have faced this issue. The part marked in bold is completely new to me.

2016-01-16 09:04:54,994 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "next_url:chararray", infered is "next_url_hit_time:NULL"
2016-01-16 09:04:54,994 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-16 09:04:54,994 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
<line 19, column 18> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable field schema: declared is "next_url:chararray", infered is "next_url_hit_time:NULL"
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:558)
    at org.apache.pig.Main.main(Main.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Re: Cannot cast the column generated by lead analytical function in pig

Mentor

@Suresh Bonam that's why I said to try small steps. Dump the output of

org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0)

and see what happens, then continue with the next clause, and so on.
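
The small-steps approach suggested above could look roughly like the following in the grunt shell. This is a minimal sketch, not a confirmed fix: it reuses the gprd and req_cols relations from the question and assumes piggybank.jar is already registered, while the aliases over_only and over_head are hypothetical names introduced here for illustration.

```pig
-- Assumption: piggybank.jar has already been registered in this session,
-- and gprd / req_cols are defined exactly as in the original script.

-- Step 1: run Over by itself, without Stitch or flatten,
-- inside the same nested foreach.
over_only = foreach gprd {
    C1 = order req_cols by time ASC;
    generate org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0);
};

-- Step 2: check which schema Pig infers for the Over result.
describe over_only;

-- Step 3: inspect a few rows of actual values (limit keeps the dump small).
over_head = limit over_only 5;
dump over_head;
```

If describe already reports a NULL type at this stage, the missing type originates in the Over call itself rather than in Stitch or flatten, which narrows down where a type would have to be declared.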

Re: Cannot cast the column generated by lead analytical function in pig

New Contributor

@Suresh Bonam, did you find a solution for this?