Hive Query error

Explorer

create table IF NOT EXISTS tweets_sentiment stored as orc as
select tweet_id,
  case when sum( polarity ) > 0 then 'positive'
       when sum( polarity ) < 0 then 'negative'
       else 'neutral' end as sentiment
from l3
group by tweet_id;

This query gives the following error:

9796-screenshot-3.png

1 ACCEPTED SOLUTION

Super Collaborator

@rishabh jain, the error indicates an invalid JSON object. Can you look at the /tmp/tweets_staging folder in HDFS and check whether the files you are getting are valid JSON? I am getting records like these:

{"tweet_id":802654182859632640,"created_unixtime":1480202644444,"created_time":"Sat Nov 26 23:24:04 +0000 2016","lang":"it","displayname":"ivrivs","time_zone":"Pacific Time (US & Canada)","msg":"RT sunshinehems 130 Con lansia non si scherza ?? inutile che vi fate tutte ansiose solo per stare al centro dellattenzione Lansia ?????"}
{"tweet_id":802654205374828544,"created_unixtime":1480202649812,"created_time":"Sat Nov 26 23:24:09 +0000 2016","lang":"it","displayname":"concettaconsoli","time_zone":"Greenland","msg":"RT DiegoFusaro Populista ?? nella neolingua chiunque difenda interessi   che non siamo quelli dell??lite dominante finanziaria"}

and so on. Also, as you follow the tutorial and create the different tables and views, run select * on each table right after you create it, so that you can see how the data is being transformed at every step.
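One way to check the files is a small script like the one below. It is only a minimal sketch, and it assumes two things that are not stated in this thread: that a file from /tmp/tweets_staging has been copied to the local disk (for example with hdfs dfs -get), and that each line holds exactly one JSON object, as in the records shown above. The function name find_invalid_json_lines is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_json_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that is not a JSON object.

    Assumption: one JSON object per line. Blank lines are ignored.
    """
    problems: List[Tuple[int, str]] = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore empty lines between records
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            # Truncated or otherwise malformed record
            problems.append((number, exc.msg))
            continue
        if not isinstance(record, dict):
            # Valid JSON, but not an object such as {"tweet_id": ...}
            problems.append((number, "not a JSON object"))
    return problems
```

To use it, open the local copy of the file and pass the file handle to find_invalid_json_lines; an empty result means every non-blank line parsed as a JSON object. Any line numbers it reports are the records worth inspecting by hand.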


13 REPLIES

Explorer

Please help, @Mushtaq Rizvi.

Explorer

OK, I am trying to delete all docs from Solr so that I can restart with fresh downloads, but I cannot find a working way to do that. @Mushtaq Rizvi

Super Collaborator

You don't have to delete anything in Solr. You are reading the data from HDFS, not from Solr. Delete your HDFS folder /tmp/tweets_staging and then run the NiFi workflow.

Explorer

I deleted all files from tweets_staging, but Solr still shows the same number of docs as before. @Mushtaq Rizvi

Super Collaborator

Solr does not depend on the HDFS directory /tmp/tweets_staging. It gets its data from NiFi and stores it in /etc/solr/data_dir, not in an HDFS directory.

You are confusing the roles of the tools here.

NiFi fetches data from the Twitter API and sends it to Solr, where the streamed data can be viewed in real time. NiFi also stores the data in HDFS in JSON format. We create tables in Hive that reference this HDFS data to analyze the social sentiment. Lastly, we use Zeppelin to visualize the dataset.

Explorer

Using Solr is not mandatory here, right? It is only for searching and running queries on the tweets in JSON format? @Mushtaq Rizvi

Super Collaborator

Yes, you are right.

Explorer

Whenever I reload the Ambari web page, I have to run this statement before running any other query:

ADD JAR /usr/hdp/2.5.0.0-1245/hive2/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

Otherwise I get the following error for this query:

select * from tweets_clean;

9798-screenshot-5.png

What is the cause of that? @Mushtaq Rizvi