
Error Passing Relation to Python UDF in Pig



I am trying to pass a relation to a Python UDF in Pig, but it throws an error. My Pig Latin script, Python script, and error log follow.

//Pig Latin Script
REGISTER '/home/cloudera/jython-installer-2.7.0.jar';
REGISTER '/home/cloudera/Code.py' USING jython as myfunc;
A = LOAD '/home/cloudera/Link.txt' as (line:chararray);
B = FOREACH A GENERATE myfunc.codefunc(line);

//Python Script
import pandas as pd

def count(A, crime):
    with open(A, 'r', encoding='UTF8') as fileA:
        data = fileA.read().lower()
    count = data.count(crime.lower())
    return count

def codefunc(A):
    crime = ['Rape', 'Murder', 'Extortion', 'Felony', 'Burglary',
             'Property Damage', 'Arrest', 'Political Unrest', 'Civil Unrest',
             'Solitication', 'Larceny', 'Abettor', 'Trafficking',
             'Tresspasser', 'Robbery']
    crimecount = {}
    for i in range(len(crime)):
        crimecount[crime[i]] = count(A, crime[i])
    final_count = pd.DataFrame(list(crimecount.items()), columns = ['Crime', 'Value'])
    final_count['Percentage'] = 0
    total_count = final_count['Values'].sum()
    for i in range(0, final_count.last_valid_index()+1):
        final_count['Percentage'][i] = float((final_count['Values'][i]/total_count)*100.0)
    final_count.sort_values(by=['Percentage'], ascending=False)
    final_count.to_csv('/home/cloudera/solution.csv', header=0)

 //Error Log

Pig Stack Trace
---------------
ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Failed to parse: Pig script failed to parse:
<line 2, column 23> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1660)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1633)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:587)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:547)
    at org.apache.pig.Main.main(Main.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by:
<line 2, column 23> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1509)
    at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9372)
    at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11051)
    at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10810)
    at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10159)
    at org.apache.pig.parser.LogicalPlanGenerator.flatten_generated_item(LogicalPlanGenerator.java:7488)
    at org.apache.pig.parser.LogicalPlanGenerator.generate_clause(LogicalPlanGenerator.java:17590)
    at org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:15982)
    at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15849)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
    ... 16 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve myfunc.codefunc using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653)
    at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:769)
    at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1506)
    ... 29 more
================================================================================

Link.txt holds the link to where the dataset resides, and I pass that link from Pig to Python. The Python function should open the file at that link, read the dataset, and run the code above. I am confident that the Python code itself is fine, but Pig throws the error at relation B. Can anyone please help? Thanks in advance.
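
To make the intended behavior concrete, here is a minimal sketch of the counting logic in plain Python, without pandas and without file or CSV handling. It is only an illustration under stated assumptions: the function name count_terms and the shortened TERMS list are hypothetical and not part of the script above, and the sketch takes the text directly instead of a path. It is not presented as a fix for ERROR 1070.

```python
# Hypothetical, shortened term list used only for illustration;
# the full list is the `crime` list in the script above.
TERMS = ['Murder', 'Robbery', 'Arrest']


def count_terms(text, terms=TERMS):
    """Return (term, count, percentage) tuples, highest percentage first."""
    lowered = text.lower()
    # Case-insensitive substring count for each term.
    counts = [(term, lowered.count(term.lower())) for term in terms]
    total = sum(c for _, c in counts)
    rows = []
    for term, c in counts:
        # Guard against division by zero when no term occurs at all.
        pct = (100.0 * c / total) if total else 0.0
        rows.append((term, c, pct))
    # Stable sort: ties keep the order of the term list.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

Calling count_terms on the text read from the dataset yields one row per term, which corresponds to the Crime, Value, and Percentage columns the script tries to build.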