Hive twitter table running Python UDF gives Hive Runtime Error while closing operators

Contributor

I'm trying to run a Python UDF in Hive to do some sentiment analysis on Twitter data captured with Flume.

My Twitter table definition:

CREATE EXTERNAL TABLE tweets (
id bigint,
created_at string,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
lang string,
retweet_count int,
text string,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>
)
PARTITIONED BY (datehour int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'hdfs://192.168.0.73:8020/user/flume/tweets'

My Python code:

import hashlib
import sys

for line in sys.stdin:

    line = line.strip()
    (lang, text) = line.split('\t')

    positive = set(["love", "good", "great", "happy", "cool", "best", "awesome", "nice", "helpful", "enjoyed"])
    negative = set(["hate", "bad", "stupid", "terrible", "unhappy"])

    words = text.split()
    word_count = len(words)

    positive_matches = [1 for word in words if word in positive]
    negative_matches = [-1 for word in words if word in negative]

    st = sum(positive_matches) + sum(negative_matches)

    if st > 0:
        print('\t'.join([lang, text, 'positive', str(word_count)]))
    elif st < 0:
        print('\t'.join([lang, text, 'negative', str(word_count)]))
    else:
        print('\t'.join([lang, text, 'neutral', str(word_count)]))

And finally my Hive query:

ADD JAR /tmp/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar;
ADD FILE /tmp/my_py_udf.py;

SELECT
TRANSFORM (lang, text)
USING 'python my_py_udf.py'
AS (lang, text, sentiment, word_count)
FROM tweets;

With this query I get the error "Hive Runtime Error while closing operators".

If I use only one variable in the Python UDF, the query runs successfully, for example with:

text = line.replace('\n',' ')

Could the problem come from the SerDe, in the split('\t')?

Can anyone please help? I've been stuck on this for the past 10 days...

2 REPLIES

Re: Hive twitter table running Python UDF gives Hive Runtime Error while closing operators

Contributor

Still stuck at this point... can anyone please help?

Re: Hive twitter table running Python UDF gives Hive Runtime Error while closing operators

Guru
Hi,

>> With this query I get error while closing operators.
What errors are you getting?

You are using the JSON SerDe, so why are you splitting on "\t" ((lang, text) = line.split('\t'))? I am not sure that is the delimiter the script actually receives.
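
On the delimiter question: as far as I know, TRANSFORM converts the selected columns to strings and joins them with tabs before handing them to the script, unless a ROW FORMAT is given, so "\t" itself may be fine regardless of the table's SerDe. What can still fail is the two-name unpacking, which raises ValueError whenever a row does not contain exactly one tab. A minimal sketch of that failure mode, where the helper name unpack_ok and the sample rows are made up for illustration and are not taken from your data:

```python
def unpack_ok(line):
    """Return True if the row unpacks into exactly (lang, text)."""
    try:
        # Same parsing as in the posted script
        (lang, text) = line.strip().split("\t")
    except ValueError:
        # Raised for both too few and too many fields
        return False
    return True


# Hypothetical rows, one per case that could show up in tweet data
rows = [
    "en\tI love this",         # exactly two columns
    "en\ttabs\tinside text",   # tab inside the tweet gives three fields
    "en\t",                    # empty text: strip() removes the trailing tab
    "second half of a tweet",  # a row cut in two by an embedded newline
]

for row in rows:
    print(repr(row), unpack_ok(row))
```

If the script dies on such a row, Hive would only report that the child process failed, which might be what surfaces as the error while closing operators.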

Have you tried printing the variable "line" to see which delimiters are used?
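
One way to do that without corrupting the query result is to write the raw line to stderr, since stdout is what Hive reads back as output rows. Below is a rough sketch of a more tolerant version of the script, under the assumption that the input is tab-separated with one row per line. The function names score and transform are my own, and the word lists are copied from your post. Malformed rows are logged and skipped instead of crashing the script:

```python
import sys

POSITIVE = {"love", "good", "great", "happy", "cool", "best",
            "awesome", "nice", "helpful", "enjoyed"}
NEGATIVE = {"hate", "bad", "stupid", "terrible", "unhappy"}


def score(text):
    """Return (sentiment label, word count) for one tweet text."""
    words = text.split()
    st = (sum(1 for w in words if w in POSITIVE)
          - sum(1 for w in words if w in NEGATIVE))
    if st > 0:
        label = "positive"
    elif st < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, len(words)


def transform(lines, log=sys.stderr):
    """Yield one tab-separated output row per well-formed input row."""
    for line in lines:
        # Remove only the newline; strip() would also drop a trailing tab
        fields = line.rstrip("\n").split("\t", 1)
        if len(fields) != 2:
            # repr() makes tabs and other control characters visible
            log.write("skipping malformed row: %r\n" % line)
            continue
        lang, text = fields
        # Keep the output at exactly four columns
        text = text.replace("\t", " ")
        label, word_count = score(text)
        yield "\t".join([lang, text, label, str(word_count)])


# In the file shipped with ADD FILE, the driver would be:
#     for row in transform(sys.stdin):
#         print(row)
```

The skipped rows should then show up in the stderr log of the failed task, which would tell you what the script is really being fed.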