Created 02-05-2016 01:02 PM
given_data = load '/clickstream/total_hitdata/05/hit_data.tsv' using PigStorage('\t');
filtered = FILTER given_data by ($133!=0);
req_cols = foreach filtered generate
    GetYear(ToDate((chararray)$25,'yyyy-MM-dd HH:mm:ss','GMT')) as year:int,
    GetMonth(ToDate((chararray)$25,'yyyy-MM-dd HH:mm:ss','GMT')) as month:int,
    GetDay(ToDate((chararray)$25,'yyyy-MM-dd HH:mm:ss','GMT')) as day:int,
    ($161-1400000000) as time,
    $343 as cust_id1:chararray,
    $344 as cust_id2:chararray,
    $256 as post_page_url,
    $466 as visit_num:int;
gprd = group req_cols by (year,month,day,cust_id1,cust_id2,visit_num);
lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0)));
};

In the lead_result relation I used the 'lead' function to meet my requirement. $8 is the column generated by the lead function, appended to the original schema. But I am unable to cast it to any type. I get the following error when I try to cast it to chararray with the name my:

<line 57, column 4> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable field schema: declared is "my:chararray", infered is ":NULL"

The overall schema is as follows:

lead_result: {stitched::year: int,stitched::month: int,stitched::day: int,stitched::time: int,stitched::cust_id1: chararray,stitched::cust_id2: chararray,stitched::post_page_url: bytearray,stitched::visit_num: int,NULL}
Created 02-05-2016 01:45 PM
what's the output of the following before you apply Over?
org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0)
Created 02-05-2016 01:56 PM
@Artem Ervits Thanks for the reply.
Before applying Over, the schema looks like this:
req_cols: {year: int,month: int,day: int,time: int,cust_id1: chararray,cust_id2: chararray,post_page_url: bytearray,visit_num: int}
After I apply lead with Over, I get one more column; let's call it "next_url_hit_time" ($8).
That is where I am facing the issue. See the following code.
lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time:chararray);
};
The statement above generates an error like this:
grunt> lead_result = foreach gprd {
>> C1 = order req_cols by time ASC;
>> generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time:chararray);
>> };
2016-01-16 08:47:53,566 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "next_url_hit_time:chararray", infered is ":NULL"
2016-01-16 08:47:53,566 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-16 08:47:53,567 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
<line 12, column 14> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable field schema: declared is "next_url_hit_time:chararray", infered is ":NULL"
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:558)
    at org.apache.pig.Main.main(Main.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
But the statement below executes fine:
lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time);
};
It does not accept any type other than bytearray. What I actually need is to cast the column generated by the lead function.
Created 02-05-2016 02:03 PM
if you say it accepts bytearray, try casting the field? @Suresh Bonam
Created 02-05-2016 02:13 PM
Yes Artem, I know about casting, but this column does not accept anything. See the following.
lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0))) as (year,month,day,time,cust_id1,cust_id2,page_url,visit_num,next_url_hit_time:bytearray);
};
change_col_type = foreach lead_result generate next_url_hit_time as next_url:chararray;
This is the first time I have faced this issue. The part in bold is completely new to me.
2016-01-16 09:04:54,994 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: declared is "next_url:chararray", infered is "next_url_hit_time:NULL"
2016-01-16 09:04:54,994 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2016-01-16 09:04:54,994 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
<line 19, column 18> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable field schema: declared is "next_url:chararray", infered is "next_url_hit_time:NULL"
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:558)
    at org.apache.pig.Main.main(Main.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
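
A note on the syntax in this post: in Pig Latin, "next_url_hit_time as next_url:chararray" is a schema declaration, not a cast, and ERROR 1031 is raised when the declared type is incompatible with the type Pig inferred for the field. A cast is written with the target type in parentheses before the expression. A minimal sketch of the cast form, reusing the relation and field names from this thread; next_url is only an illustrative alias, and I have not verified that the conversion succeeds at runtime on the untyped output of Over:

```pig
-- Assumes lead_result was built with next_url_hit_time left untyped
-- in the AS clause (the variant reported to run without errors).
-- Explicit cast: (type)expression; AS then supplies only the new name.
change_col_type = foreach lead_result generate (chararray)next_url_hit_time as next_url;

-- Check what schema Pig now reports for the field.
describe change_col_type;
```

If describe shows next_url: chararray, the front end has accepted the cast; whether the values convert cleanly is only visible after a dump or store.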
Created 02-05-2016 03:20 PM
@Suresh Bonam that's why I said try small steps, dump the output of the
org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0)
see what happens, then continue with the next clause, etc.
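
The suggestion above could be sketched roughly as follows: run Over on its own, without Stitch, and look at both the schema and the data it produces. The relation name over_only is hypothetical; everything else is taken from the script in the question.

```pig
-- Isolate the Over call so its output can be inspected without Stitch.
over_only = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(org.apache.pig.piggybank.evaluation.Over(C1.time, 'lead', 0, 1, 1, 0));
};

-- Schema as seen by the front end (this is where the NULL type would show up).
describe over_only;

-- Actual values produced by 'lead' for each group.
dump over_only;
```

Once this step looks right, Stitch can be added back, and then the AS clause, checking describe after each change.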
Created 07-19-2017 10:46 PM
@Suresh Bonam, did you get a solution for this?
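
One thing that may be worth trying, as far as I recall from the Piggybank source: Over has a constructor that takes the return type as a string, and when it is omitted the output field is left untyped, which would match the NULL in the schema above. The sketch below assumes that constructor is present in the Piggybank version in use and that piggybank.jar is already registered; the alias names Stitch and LeadOver are arbitrary, and 'int' is chosen because time is an int in req_cols.

```pig
-- Assumption: this Piggybank build supports Over('<type>') to declare
-- the type of the value Over returns. Verify against your version.
DEFINE Stitch org.apache.pig.piggybank.evaluation.Stitch;
DEFINE LeadOver org.apache.pig.piggybank.evaluation.Over('int');

lead_result = foreach gprd {
    C1 = order req_cols by time ASC;
    generate flatten(Stitch(C1, LeadOver(C1.time, 'lead', 0, 1, 1, 0)));
};

-- If the constructor behaves as assumed, the last field should now be
-- reported with a type instead of NULL.
describe lead_result;
```

If describe then shows a typed last column, it can be renamed or cast in a following foreach in the usual way.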