Created on 11-26-2016 09:46 PM - edited 08-19-2019 03:18 AM
create table IF NOT EXISTS tweets_sentiment stored as orc as select tweet_id,
case when sum( polarity ) > 0 then 'positive' when sum( polarity ) < 0 then 'negative' else 'neutral' end as sentiment from l3 group by tweet_id;
This query gives the following error:
Created 11-26-2016 09:48 PM
Please help, @Mushtaq Rizvi
Created 11-27-2016 12:13 AM
@rishabh jain, the error shows an invalid JSON object. Can you have a look at the /tmp/tweets_staging folder in HDFS and check whether the files you are getting are valid JSON? I am getting records like these:
{"tweet_id":802654182859632640,"created_unixtime":1480202644444,"created_time":"Sat Nov 26 23:24:04 +0000 2016","lang":"it","displayname":"ivrivs","time_zone":"Pacific Time (US & Canada)","msg":"RT sunshinehems 130 Con lansia non si scherza ?? inutile che vi fate tutte ansiose solo per stare al centro dellattenzione Lansia ?????"}
{"tweet_id":802654205374828544,"created_unixtime":1480202649812,"created_time":"Sat Nov 26 23:24:09 +0000 2016","lang":"it","displayname":"concettaconsoli","time_zone":"Greenland","msg":"RT DiegoFusaro Populista ?? nella neolingua chiunque difenda interessi che non siamo quelli dell??lite dominante finanziaria"}
and so on. Further, as you follow the tutorial and create the different tables and views, run a select * after every table you create, so that you can see how the data is being transformed.
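One way to run that validity check locally is sketched below in Python. It assumes you have first copied a file out of /tmp/tweets_staging (for example with hdfs dfs -get) and that each line is meant to hold exactly one JSON object; the function name find_invalid_json_lines and the sample records are hypothetical, made up for illustration.

```python
import json


def find_invalid_json_lines(lines):
    """Return (line_number, reason) for every non-empty line that is not one JSON object.

    Hypothetical helper: assumes one JSON record per line.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            # exc.msg is the parser's short description of what went wrong
            problems.append((number, exc.msg))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Made-up sample: the second record is cut off mid-string.
sample = [
    '{"tweet_id": 1, "lang": "it", "msg": "ok"}',
    '{"tweet_id": 2, "lang": "it", "msg": "truncated',
    '',
]
print(find_invalid_json_lines(sample))
```

To check a real file, pass an open file handle instead of the sample list, e.g. find_invalid_json_lines(open("local_copy.json", encoding="utf-8")), where local_copy.json is a placeholder for whatever you downloaded. An empty result means every non-blank line parsed as a single JSON object.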
Created 11-27-2016 12:17 AM
OK, I am trying to delete all docs from Solr so I can restart with fresh downloads, but I cannot find a working solution for that. @Mushtaq Rizvi
Created 11-27-2016 12:19 AM
You don't have to delete anything in Solr. You are accessing data from HDFS, not Solr. Delete your HDFS folder /tmp/tweets_staging and then run the NiFi workflow again.
Created 11-27-2016 12:25 AM
I deleted all files from tweets_staging, but Solr still shows the same number of docs as before. @Mushtaq Rizvi
Created 11-27-2016 12:33 AM
Solr does not depend on the HDFS directory /tmp/tweets_staging. It gets its data from NiFi and stores it in /etc/solr/data_dir, not in the HDFS directory.
You are confusing the roles of the tools here.
NiFi fetches data from the Twitter API and sends it to Solr, where the streamed data can be viewed in real time. NiFi also stores the data in HDFS in JSON format; we create Hive tables referencing this HDFS data to analyze the social sentiment. Lastly, we use Zeppelin to visualize the dataset.
Created 11-27-2016 12:54 AM
Use of Solr is not mandatory here, right? It is only for searching and running queries on the tweets in JSON format? @Mushtaq Rizvi
Created 11-27-2016 12:55 AM
Yes, you are right.
Created on 11-27-2016 01:05 AM - edited 08-19-2019 03:18 AM
When I reload the Ambari web page, I always have to run this statement before running any other query:
ADD JAR /usr/hdp/2.5.0.0-1245/hive2/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
Otherwise I get the following error for the query
select * from tweets_clean;
What is the cause of that?