Trying to get the most basic Python UDFs working

Explorer

pig-0.17.0/bin/pig -x local

A very basic UDF file:

#!/usr/bin/python3

from pig_util import outputSchema

@outputSchema("as:int")
def square(num):
    # Pass nulls through unchanged.
    if num is None:
        return None
    return num * num

@outputSchema("word:chararray")
def concat(word):
    return word + word
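One way to rule out the UDF logic itself is to exercise the two functions in plain Python, outside Pig. This is only a minimal sketch: it assumes pig_util may not be importable outside a Pig job, so it falls back to a no-op stand-in decorator (hypothetical, for local testing only, not Pig's own implementation).

```python
# Local sanity check for the UDF logic, run with plain python3 (no Pig involved).
try:
    # Available when the script runs inside a Pig job.
    from pig_util import outputSchema
except ImportError:
    # Assumption: outside Pig, pig_util is not on the path, so use a stand-in
    # that only records the schema string and returns the function unchanged.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("as:int")
def square(num):
    # Pass nulls through unchanged.
    if num is None:
        return None
    return num * num


@outputSchema("word:chararray")
def concat(word):
    return word + word


if __name__ == "__main__":
    print(square(4), square(None), concat("ab"))
```

If this behaves as expected, the failure is more likely in how Pig registers or launches the script than in the functions themselves.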

An exceedingly simple Pig script:

REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' USING org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;

A = LOAD '/home/scs/woodcock/SD411/DATA/accident.csv' USING PigStorage(',') AS (state:int,name:chararray);

B = FOREACH A GENERATE myFuncs.square(state) AS state, name;

 

If I do a "DUMP A" I get exactly what I would expect.

But, on a "DUMP B", I get a failed job:

java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:506)

grunt> Exception in thread "Thread-82" java.lang.NullPointerException: Cannot invoke "java.util.concurrent.BlockingQueue.put(Object)" because the return value of "org.apache.pig.impl.builtin.StreamingUDF.access$500(org.apache.pig.impl.builtin.StreamingUDF)" is null
at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:471)
2024-10-29 13:02:15,296 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map

Any idea what is going wrong here?

1 ACCEPTED SOLUTION

Explorer

Here is the actual fix (and it's quite loony): don't wrap the file name in the REGISTER statement in single quotes. That's it.

Catastrophic problems here:

1) Obviously not backwards compatible.

2) If quoting the path is a problem, why not just report that a) the format is wrong, b) a path starting with a single quote did not yield a valid Python file, or c) anything understandable, instead of failing in the middle of the M/R computation and throwing wacky (mkey? NullPointerException) errors?
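For anyone with several scripts to update, the change can be sketched as a small Python helper. This is an illustration only: unquote_register_path is a hypothetical name, not part of Pig, and it assumes the REGISTER statement sits on one line and the path contains no spaces or single quotes. Whether unquoted paths are required in other Pig versions is not something this thread establishes.

```python
import re

# Hypothetical helper (not part of Pig): removes the single quotes around the
# path in a one-line REGISTER statement and leaves every other line untouched.
_REGISTER_RE = re.compile(r"^(\s*REGISTER\s+)'([^']+)'(.*)$", re.IGNORECASE)


def unquote_register_path(line):
    match = _REGISTER_RE.match(line)
    if match is None:
        # Not a REGISTER line with a single-quoted path; return it unchanged.
        return line
    prefix, path, rest = match.groups()
    return prefix + path + rest


original = (
    "REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' "
    "USING org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;"
)
fixed = unquote_register_path(original)
print(fixed)
```

Other statements, such as the LOAD line with its quoted data path, do not match the pattern and pass through unchanged.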


REPLIES

Community Manager

@mew Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our MapReduce experts @Stella Tang @vchalla @jeniferA  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Explorer

Honestly, at this point I would probably accept any Python UDF, plus the script that calls it, that works in Pig 0.17, as long as it is less trivial than a "HelloWorld" (that is, something that actually computes rather than just returning a fixed string). I feel like I'm just cutting and pasting the standard documented examples, and they aren't close to working, which isn't giving me a great feeling.

Explorer

I also note that I can get Java UDFs to work, so it's not a general UDF problem; it's something specific to Python.

Explorer

Hmmm... so I tried rolling back to Pig 0.13 and, somewhat troublingly, that totally worked. On multiple different machines. Perhaps something didn't get tested very well before release?
