Member since: 10-29-2024
Posts: 5
Kudos received: 4
Solutions: 1
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 217 | 11-05-2024 12:09 PM |
11-05-2024 12:09 PM (1 Kudo)
Here is the actual fix (and it's quite loony): don't wrap the file name in the REGISTER statement in single quotes. That's it. Catastrophic problems here:
1) It's obviously not backwards compatible.
2) If this is a problem, why not just report a) that the format is wrong, b) that a path starting with a single quote did not resolve to a valid Python file, or c) anything understandable, instead of getting into the middle of the M/R computation and throwing wacky (mkey? NullPointerException) errors?
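Point 2 is essentially a request for a pre-flight check. As a minimal sketch (the helper `check_register_path` is hypothetical and not part of Pig), a few lines of Python can strip one pair of surrounding single quotes from a REGISTER path, confirm the file exists, and confirm it compiles as Python, so a bad path fails with a readable message before any M/R job starts:

```python
from pathlib import Path


def check_register_path(raw_path: str) -> Path:
    """Hypothetical pre-flight check for a Pig REGISTER path (not a Pig API).

    Strips one pair of surrounding single quotes, then verifies that the
    file exists and compiles as Python, raising a readable error otherwise.
    """
    path_str = raw_path.strip()
    # Drop one pair of surrounding single quotes, e.g. '/path/to/test.py'
    if len(path_str) >= 2 and path_str[0] == path_str[-1] == "'":
        path_str = path_str[1:-1]
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"UDF file not found: {path}")
    # compile() raises SyntaxError if the file is not valid Python;
    # it does not execute the file or write any .pyc output.
    compile(path.read_text(), str(path), "exec")
    return path


if __name__ == "__main__":
    import sys

    # Check every path given on the command line
    for arg in sys.argv[1:]:
        print(f"OK: {check_register_path(arg)}")
```

This only validates the file on the client machine; it does not reproduce how Pig itself parses or ships the REGISTER path.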
11-04-2024 06:06 AM (1 Kudo)
Hmm... so I tried rolling back to Pig 0.13 and, somewhat troublingly, that totally worked, on multiple different machines. Perhaps something didn't get tested very well before the release?
11-01-2024 10:05 AM (1 Kudo)
I also note that I can get Java UDFs to work, so it's not a general UDF problem; it's something specific to Python.
10-31-2024 08:46 AM (1 Kudo)
Honestly, at this point I would accept any Python UDF, plus the Pig script that calls it, that works in Pig 0.17. It just needs to be less trivial than a "HelloWorld", that is, something that actually computes rather than returning a fixed string. I feel like I'm just cutting and pasting the standard documented examples and not getting close to working, which isn't giving me a great feeling.
10-29-2024 10:10 AM
Running `pig-0.17.0bin/pig -x local` with a very basic UDF file:

```python
#!/usr/bin/python3
from pig_util import outputSchema

@outputSchema("as:int")
def square(num):
    # Pass Pig nulls through unchanged
    if num is None:
        return None
    return num * num

@outputSchema("word:chararray")
def concat(word):
    # Return the input string repeated twice
    return word + word
```

And an exceedingly simple Pig script:

```
REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' USING org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;
A = LOAD '/home/scs/woodcock/SD411/DATA/accident.csv' USING PigStorage(',') AS (state:int, name:chararray);
B = FOREACH A GENERATE myFuncs.square(state) AS state, name;
```

If I do a `DUMP A`, I get exactly what I would expect. But on a `DUMP B`, the job fails:

```
java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:506)
grunt> Exception in thread "Thread-82" java.lang.NullPointerException: Cannot invoke "java.util.concurrent.BlockingQueue.put(Object)" because the return value of "org.apache.pig.impl.builtin.StreamingUDF.access$500(org.apache.pig.impl.builtin.StreamingUDF)" is null
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:471)
2024-10-29 13:02:15,296 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map ?
```
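Since `DUMP A` works and only the UDF call fails, one way to rule out the UDF logic itself is to exercise the functions in plain Python, without Pig. This is a minimal sketch under a stated assumption: the real `pig_util` module is replaced by a hypothetical stub (`output_schema`) that only records the schema string on the function, so it checks the arithmetic and string logic, not Pig's streaming protocol.

```python
import sys
import types


def output_schema(schema: str):
    """Hypothetical stand-in for pig_util.outputSchema (not Pig's code):
    records the schema string on the function and returns it unchanged."""
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate


# Register a fake pig_util module so the UDF's import succeeds outside Pig.
fake_pig_util = types.ModuleType("pig_util")
fake_pig_util.outputSchema = output_schema
sys.modules["pig_util"] = fake_pig_util

# Same UDF logic as in the post, inlined so this sketch is self-contained.
from pig_util import outputSchema


@outputSchema("as:int")
def square(num):
    if num is None:
        return None
    return num * num


@outputSchema("word:chararray")
def concat(word):
    return word + word


if __name__ == "__main__":
    print(square(4), square(None), concat("ab"))  # 16 None abab
```

If these functions behave as expected locally, the failure points at how Pig launches or registers the script rather than at the Python code.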
Labels: MapReduce