Error Passing Relation to Python UDF in Pig

I am trying to pass a relation to a Python UDF in Pig, but it throws an error. Below are my Pig Latin script, Python script, and error log:

//Pig Latin Script
REGISTER '/home/cloudera/jython-installer-2.7.0.jar';
REGISTER '/home/cloudera/Code.py' USING jython as myfunc;
A = LOAD '/home/cloudera/Link.txt' as (line:chararray);
B = FOREACH A GENERATE myfunc.codefunc(line);
//Python Script
import pandas as pd

def count(A, crime):
    # Read the file at path A and count case-insensitive occurrences of one term.
    with open(A, 'r', encoding='UTF8') as fileA:
        data = fileA.read().lower()
    count = data.count(crime.lower())
    return count

def codefunc(A):
    crime = ['Rape', 'Murder', 'Extortion', 'Felony', 'Burglary', 'Property Damage',
             'Arrest', 'Political Unrest', 'Civil Unrest', 'Solitication', 'Larceny',
             'Abettor', 'Trafficking', 'Tresspasser', 'Robbery']
    crimecount = {}
    for i in range(len(crime)):
        crimecount[crime[i]] = count(A, crime[i])
    final_count = pd.DataFrame(list(crimecount.items()), columns=['Crime', 'Value'])
    # The column is named 'Value' (not 'Values'); compute percentages column-wise.
    total_count = final_count['Value'].sum()
    final_count['Percentage'] = final_count['Value'] / total_count * 100.0
    # sort_values returns a new DataFrame, so the result must be assigned.
    final_count = final_count.sort_values(by=['Percentage'], ascending=False)
    final_count.to_csv('/home/cloudera/solution.csv', header=False)
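If the script is executed by Jython, as `USING jython` implies, pandas may not be importable there (it relies on CPython C extensions), and Jython 2.7's built-in `open()` does not take an `encoding` argument. A minimal sketch of the same count-and-percentage computation using only the standard library, under that assumption: the helper names `count_crimes`, `crime_percentages`, and `crime_percentages_from_file` are hypothetical, and it keeps the substring matching of `data.count(...)` from the script above.

```python
import io

# Search terms copied unchanged from the script above.
CRIMES = ['Rape', 'Murder', 'Extortion', 'Felony', 'Burglary',
          'Property Damage', 'Arrest', 'Political Unrest', 'Civil Unrest',
          'Solitication', 'Larceny', 'Abettor', 'Trafficking',
          'Tresspasser', 'Robbery']


def count_crimes(text, crimes):
    """Count case-insensitive, non-overlapping substring matches per term."""
    lowered = text.lower()
    return dict((crime, lowered.count(crime.lower())) for crime in crimes)


def crime_percentages(counts):
    """Return (crime, count, percentage) tuples, highest percentage first."""
    total = sum(counts.values())
    rows = []
    for crime, n in counts.items():
        # Avoid dividing by zero when no term occurs in the text.
        pct = n * 100.0 / total if total else 0.0
        rows.append((crime, n, pct))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def crime_percentages_from_file(path, crimes=CRIMES):
    """Read a UTF-8 file once and summarise it (io.open works on Jython 2.7 and CPython)."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return crime_percentages(count_crimes(f.read(), crimes))
```

This version reads the file once instead of once per term, and returns the rows rather than writing solution.csv, so the result could be handed back to Pig if desired.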

//Error Log

Pig Stack Trace
---------------
ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Failed to parse: Pig script failed to parse:
<line 2, column 23> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
	at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1660)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1633)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:587)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:547)
	at org.apache.pig.Main.main(Main.java:158)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: <line 2, column 23> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
	at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1509)
	at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9372)
	at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11051)
	at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10810)
	at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10159)
	at org.apache.pig.parser.LogicalPlanGenerator.flatten_generated_item(LogicalPlanGenerator.java:7488)
	at org.apache.pig.parser.LogicalPlanGenerator.generate_clause(LogicalPlanGenerator.java:17590)
	at org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:15982)
	at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15849)
	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
	... 16 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
	at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653)
	at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:769)
	at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1506)
	... 29 more
================================================================================

I have put the link (path) to where the dataset resides in Link.txt, and I pass that path from Pig to Python. Python should open that path, read the dataset, and execute the code. The Python code is absolutely fine; I am confident of that. But Pig throws an error at relation B. Can anyone please help? Thanks in advance.
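With this script, Pig calls the UDF once per tuple of A, passing that tuple's `line` field (the dataset path read from Link.txt). A minimal sketch of a Jython UDF shaped that way, assuming Pig's Jython engine predefines the `@outputSchema` decorator as the Pig UDF documentation shows (the `try`/`except` stub exists only so the file also loads under plain Python); `TERMS` is a hypothetical shortened term list, and the file at `path` must be readable wherever the task runs (for example, in local mode).

```python
import io

try:
    outputSchema  # assumed to be predefined by Pig's Jython engine
except NameError:
    def outputSchema(schema):
        """Stand-in decorator for running outside Pig: records the schema string."""
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator

# Hypothetical, shortened term list for illustration only.
TERMS = ['Murder', 'Robbery', 'Arrest']


@outputSchema('counts:bag{t:tuple(crime:chararray,hits:int)}')
def codefunc(path):
    """Called once per tuple of A; `path` is that tuple's `line` field."""
    if path is None:  # a null field arrives as None
        return None
    # io.open accepts an encoding on both Jython 2.7 and CPython 3.
    with io.open(path.strip(), 'r', encoding='utf-8') as f:
        text = f.read().lower()
    # A list of tuples is handed back to Pig as a bag of tuples.
    return [(term, text.count(term.lower())) for term in TERMS]
```

On the Pig side, such a function could be called as `B = FOREACH A GENERATE FLATTEN(myfunc.codefunc(line));` to turn the returned bag into one (crime, hits) row per term.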