Created on 08-16-2015 08:02 PM - edited 09-16-2022 02:37 AM
Hi
I completed my DE 575 exam today (8/16) at 6 pm CST. One question required me to write a Python UDF for Hive.
The program failed because reading from sys.stdin did not work.
I used:
#! usr/bin/python ---> As mentioned in the exam
import os
for line in sys.stdin:
<Code>
When I ran the Hive query, the streaming didn't happen.
The MapReduce job launched by the Hive query failed with 'error:2000'.
I also ran the same program standalone in the exam environment, piping input to it with the 'cat' command, and it didn't work there either.
I lost a lot of time debugging and was unable to solve the problem.
I would like the Cloudera team to let me know whether the "os" module is actually present in the exam environment.
Without that module, a Python UDF for Hive doesn't work.
References:
1) I followed the steps in this blog post:
http://spryinc.com/blog/guide-user-defined-functions-apache-hive
2) I tried something similar to this and ran into a similar error:
http://stackoverflow.com/questions/32032154/apache-hive-getting-error-while-using-python-udf
3) If I recall correctly, I got an I/O error like the one below:
SCRIPT_IO_ERROR(20001, "An error occurred while reading or writing to your custom script. " + "It may have crashed with an error."),
regards
Suman
Created 08-17-2015 01:00 PM
Update to the subject line:
The error was with "import sys", not "import os". My code did import sys as well.
Created 08-17-2015 01:30 PM
Yes, both os and sys are available. Both are part of the standard library, so they come installed with Python, and they have been verified to exist in the environment.
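For anyone hitting the same thing, here is a minimal sketch of a streaming script that reads from sys.stdin. It is not the exam solution: the transform() helper and its upper-casing logic are placeholders made up for illustration, and it assumes Hive passes rows as tab-separated lines. Note that the snippet in the question imports os but then uses sys.stdin; without "import sys" Python raises a NameError on the first use of sys, and a script that crashes that way surfaces in Hive as a script error.

```python
#!/usr/bin/python
# Hypothetical streaming script for a Hive TRANSFORM query.
# The column logic below is a placeholder, not taken from the exam.
import sys  # required: sys.stdin raises NameError without this import


def transform(lines):
    """Yield one tab-separated output row per tab-separated input row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Placeholder logic: upper-case the first column, pass the rest through.
        fields[0] = fields[0].upper()
        yield "\t".join(fields)


if __name__ == "__main__":
    # Hive streams input rows on stdin and reads result rows from stdout.
    for row in transform(sys.stdin):
        sys.stdout.write(row + "\n")
```

If the script is saved as, say, my_udf.py (a hypothetical name), it can be checked outside Hive first by piping a small tab-separated file into it with cat, which should reproduce any import or syntax error directly in the terminal. In Hive it would then typically be registered with ADD FILE and called through SELECT TRANSFORM(...) USING 'python my_udf.py' AS (...), with the table and column names depending on your own query.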
Created 10-17-2015 05:20 PM
I think you could have tried to use Java if that was an option.
I would actually prefer that to attempting it in Python.
I have generally had better luck doing SerDes and UDFs in Java.