New Contributor
Posts: 2
Registered: ‎08-01-2017

Hive twitter table running Python UDF gives Hive Runtime Error while closing operators

I'm trying to run a Python UDF in Hive to do some sentiment analysis on Twitter data captured with Flume.

My twitter table code:

CREATE EXTERNAL TABLE tweets (
  id bigint,
  created_at string,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang string,
  retweet_count int,
  text string,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>
)
PARTITIONED BY (datehour int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'hdfs://192.168.0.73:8020/user/flume/tweets'

My python code:

import hashlib
import sys

for line in sys.stdin:
    line = line.strip()
    (lang, text) = line.split('\t')

    positive = set(["love", "good", "great", "happy", "cool", "best", "awesome", "nice", "helpful", "enjoyed"])
    negative = set(["hate", "bad", "stupid", "terrible", "unhappy"])

    words = text.split()
    word_count = len(words)

    positive_matches = [1 for word in words if word in positive]
    negative_matches = [-1 for word in words if word in negative]

    st = sum(positive_matches) + sum(negative_matches)

    if st > 0:
        print ('\t'.join([lang, text, 'positive', str(word_count)]))
    elif st < 0:
        print ('\t'.join([lang, text, 'negative', str(word_count)]))
    else:
        print ('\t'.join([lang, text, 'neutral', str(word_count)]))

And finally my Hive query:

ADD JAR /tmp/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar;
ADD FILE /tmp/my_py_udf.py;

SELECT
  TRANSFORM (lang, text)
  USING 'python my_py_udf.py'
  AS (lang, text, sentiment, word_count)
FROM tweets;

With this query I get the error "Hive Runtime Error while closing operators".

If I use only one variable in the Python UDF, the query runs successfully, for example with:

text = line.replace('\n',' ')

Could the problem come from the SerDe, in the split('\t')?

Can anyone please help? I've been stuck on this for the past 10 days...

New Contributor
Posts: 2
Registered: ‎08-01-2017

Re: Hive twitter table running Python UDF gives Hive Runtime Error while closing operators

Still stuck at this point... can anyone please help?

Cloudera Employee
Posts: 177
Registered: ‎03-23-2015

Re: Hive twitter table running Python UDF gives Hive Runtime Error while closing operators

Hi,

>> With this query I get error while closing operators.
What exact errors are you getting?

You are using the JSON SerDe, so why are you splitting each line on "\t" ((lang, text) = line.split('\t'))? I am not sure the default delimiter is what is used when the data is passed to the script.
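
One thing that may be worth ruling out: as far as the Hive documentation on TRANSFORM describes it, columns are handed to the script as TAB-separated strings by default, independent of the table's SerDe, so split('\t') is not necessarily wrong in itself. What can break the two-value unpacking is a line that does not contain exactly two fields, for example when the tweet text contains a tab or a newline, or when strip() removes a trailing tab because text is empty. A minimal sketch of a more defensive parser, assuming that is the cause (parse_row and the sample lines are made up for illustration):

```python
def parse_row(line):
    """Split one input line into (lang, text); return None if it is malformed."""
    # Only remove the trailing newline. strip() would also remove a leading or
    # trailing tab and silently change the number of fields.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return None
    return fields[0], fields[1]


# Hypothetical sample lines standing in for what Hive might send.
samples = [
    "en\tI love this\n",         # well-formed: two fields
    "second half of a tweet\n",  # no tab: e.g. text was split by a newline
    "en\ttext with\ta tab\n",    # three fields: tab inside the tweet text
    "en\t\n",                    # empty text column
]
for sample in samples:
    print("%r -> %r" % (sample, parse_row(sample)))
```

In the UDF, rows for which parse_row returns None could be skipped instead of letting the unpacking raise an exception and end the script early.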

Have you tried to print out the variable "line" to see what delimiters are used?
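
To see what the script actually receives, one option is to write each raw line to stderr rather than stdout, so the debug output does not get mixed into the rows returned to Hive. A small sketch, where debug_line is a hypothetical helper and the sample input stands in for real stdin:

```python
import io
import sys


def debug_line(line, out=None):
    """Write the raw line and its tab-separated field count to a debug stream."""
    out = sys.stderr if out is None else out
    field_count = len(line.rstrip("\n").split("\t"))
    # repr() makes tabs, newlines and other invisible characters visible.
    out.write("fields=%d raw=%r\n" % (field_count, line))
    return field_count


# Hypothetical input used here in place of sys.stdin.
fake_stdin = io.StringIO(u"en\tI love this\nsecond half of a tweet\n")
for line in fake_stdin:
    debug_line(line)
```

In the real script you would call debug_line(line) at the top of the "for line in sys.stdin" loop; the output should then show up in the stderr log of the failed task.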