Repo Description

This demo is inspired by Ali's Hortonworks Twitter Demo

Purpose: Monitor the Twitter stream for the provided hashtags and act on unexpected increases in tweet volume

  • Ingest: Listen for Twitter streams related to the hashtags input into the NiFi garden hose (GetHTTP) processor
  • Processing:
    • Monitor tweets for unexpected volume
    • Volume thresholds managed in HBase
  • Persistence:
    • HDFS (for future batch processing)
    • Hive (for interactive query)
    • HBase (for realtime alerts)
    • Solr/Banana (for search and reports/dashboards)
  • Refine:
    • Update threshold values based on historical analysis of tweet volumes
  • Demo setup:
    • Either download and start the prebuilt VM, or
    • Start the HDP 2.3 sandbox and run the provided scripts to set up the demo
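The core alerting rule described above (flag a tag whose tweet volume exceeds the threshold stored in HBase) can be sketched locally with awk. The file names and column layout below are hypothetical stand-ins for the HBase tables, not part of the demo:

```shell
# Hypothetical inputs: counts.csv (tag,volume) and thresholds.csv (tag,threshold)
printf 'HDP,12\nhadoopsummit,3\n' > counts.csv
printf 'HDP,5\nhadoopsummit,5\n' > thresholds.csv

# First pass (NR==FNR) loads thresholds; second pass prints an alert
# whenever a tag's volume exceeds its stored threshold
awk -F, 'NR==FNR {thr[$1]=$2; next} $2+0 > thr[$1]+0 {print "ALERT:", $1, $2 ">" thr[$1]}' \
    thresholds.csv counts.csv
```

In the demo itself this comparison happens inside the Storm topology, with thresholds read from HBase rather than a flat file.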

Short steps / Breadcrumbs:

  1. SSH into the sandbox: ssh root@sandbox….
  2. There is an XML file in the nifi-template folder; scp it to your local disk.
  3. Start nifi: nifi.sh start
  4. Go to sandbox.hortonworks.com:9090/nifi and upload the template
  5. Add the Access Keys from your Twitter Developer account.
  6. Meanwhile, start Solr and Banana: sh ~/setup-scripts/restart_solr_banana.sh
  7. Start Storm topology: sh ~/twittertopology/runtopology.sh
  8. Once topology has started, hit play on the NiFi dashboard
  9. Go to sandbox.hortonworks.com:8983/banana

Option 1: Setup demo using prebuilt VM based on HDP 2.3 sandbox

  • Download VM from here. Import it into VMWare Fusion and start it up.
  • Start the demo:
cd /root/hdp_nifi_twitter_demo
./start-demo.sh
#once storm topology is submitted, press control-C
#start NiFi processor
  1. Using a browser, go to http://sandbox.hortonworks.com:<port#>/nifi
  2. Upload the XML file into the NiFi templates section of the UI. The XML file is under /root/hdp_nifi_twitter_demo/nifi-template
  • Observe results in HDFS, Hive, Solr/Banana, HBase
  • Troubleshooting: check the Storm web UI for any errors and try resetting using the script below:
./reset-demo.sh

Option 2: Setup demo via scripts on vanilla HDP 2.3 sandbox

These setup steps are only needed the first time and may take up to 30 minutes to execute (depending on your internet connection)

  • Pull latest code/scripts
git clone git@github.com:vedantja/hdp_nifi_twitter_demo.git

  • The NiFi garden hose processor requires a Twitter account and developer keys, obtained by registering an "app". Create a Twitter account and app, then generate your consumer key/secret and access token/secret: https://apps.twitter.com > sign in > create new app > fill anything > create access tokens
  • Then enter the four values into the appropriate fields (see screenshot):
consumerKey
consumerSecret
oauth.accessToken
oauth.accessTokenSecret
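Before wiring the four values into the processor, it can help to stage them as environment variables and fail fast if any is missing. The variable names and placeholder values below are hypothetical, not part of the demo scripts:

```shell
# Hypothetical placeholders -- replace with the values from apps.twitter.com
export TWITTER_CONSUMER_KEY="YOUR_CONSUMER_KEY"
export TWITTER_CONSUMER_SECRET="YOUR_CONSUMER_SECRET"
export TWITTER_ACCESS_TOKEN="YOUR_ACCESS_TOKEN"
export TWITTER_ACCESS_TOKEN_SECRET="YOUR_ACCESS_TOKEN_SECRET"

# Abort early if any credential is still unset
for v in TWITTER_CONSUMER_KEY TWITTER_CONSUMER_SECRET TWITTER_ACCESS_TOKEN TWITTER_ACCESS_TOKEN_SECRET; do
  [ -n "$(eval echo \$$v)" ] || { echo "missing $v"; exit 1; }
done
echo "all four Twitter credentials are set"
```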
  • Run the below to set up the demo (one time): starts Ambari/HBase/Kafka/Storm and installs Maven, Solr, and Banana; may take 10 min
cd /root/hdp22-twitter-demo
./setup-demo.sh
Run Twitter demo

Most of the steps below are optional, as they were already executed by the setup script above, but they are useful for understanding the components of the demo:

  • (Optional) Review the list of stock symbols whose Twitter mentions we will be tracking: http://en.wikipedia.org/wiki/List_of_S%26P_500_companies
  • (Optional) Generate the securities CSV from the above page and review the generated securities.csv. The last field is the generated tweet-volume threshold
/root/hdp_nifi_twitter_demo/fetchSecuritiesList/rungeneratecsv.sh
cat /root/hdp_nifi_twitter_demo/fetchSecuritiesList/securities.csv
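Since the threshold is the last comma-separated field, a quick awk one-liner can pull it out for eyeballing. The sample row below is a hypothetical line in the securities.csv layout, not actual repo output:

```shell
# Hypothetical sample row: symbol,name,sector,industry,HQ,id,threshold
line='$HDP,Hortonworks,Technology,Technology,Santa Clara CA,0000000001,5'

# Print the symbol (first field) and its tweet-volume threshold (last field)
echo "$line" | awk -F, '{print $1, $NF}'
```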
  • (Optional) For future runs: you can add other stocks/hashtags to monitor to the CSV (make sure there are no trailing spaces or newlines at the end of the file). Find these at http://mobile.twitter.com/trends
sed -i '1i$HDP,Hortonworks,Technology,Technology,Santa Clara CA,0000000001,5' /root/hdp22-twitter-demo/fetchSecuritiesList/securities.csv
sed -i '1i#hadoopsummit,Hadoop Summit,Hadoop,Hadoop,Santa Clara CA,0000000001,5' /root/hdp22-twitter-demo/fetchSecuritiesList/securities.csv
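The sed `1i` form used above inserts the given row before line 1, i.e. prepends it to the file. You can verify this on a throwaway copy (the /tmp file name here is a local stand-in, assuming GNU sed for the `-i` in-place flag):

```shell
# Work on a throwaway copy rather than the real securities.csv
printf 'AAA,Example Co,Tech,Tech,Somewhere,0000000002,5\n' > /tmp/securities_demo.csv

# '1i' inserts the given row before line 1, i.e. prepends it
sed -i '1i$HDP,Hortonworks,Technology,Technology,Santa Clara CA,0000000001,5' /tmp/securities_demo.csv

# The prepended row is now the first line
head -n1 /tmp/securities_demo.csv
```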
  • (Optional) Open a connection to HBase via Phoenix and check that you can list tables. Notice the securities data was imported and the alerts table is empty
/usr/hdp/current/phoenix-client/bin/sqlline.py  localhost:2181:/hbase-unsecure
!tables
select * from securities;
select * from alerts;
select * from dictionary;
!q
  • (Optional) Check the Hive table schema where we will store the tweets for later analysis
hive -e 'desc tweets_text_partition'
  • Start the Storm Twitter topology to generate alerts into an HBase table for stocks whose tweet volume is higher than the threshold. This will also write tweets to Hive/HDFS/local disk/Solr/Banana. The first time you run the below, Maven will take about 15 min to download dependent JARs
cd /root/hdp_nifi_twitter_demo
./start-demo.sh
#once storm topology is submitted, press control-C
  • (Optional) Other modes the topology can be started in for future runs, if you want to clean the setup or run locally (not on the Storm instance running on the sandbox)
cd /root/hdp_nifi_twitter_demo/twitterstorm
./runtopology.sh runOnCluster clean
./runtopology.sh runLocally skipclean

To stop collecting tweets:

  • To stop producing tweets, hit the stop button on the template processor in the NiFi console.
  • Kill the Storm topology to stop processing tweets:
storm kill Twittertopology
Repo Info
Github Repo URL https://github.com/vedantja/hdp_nifi_twitter_demo.git
Github account name vedantja
Repo name hdp_nifi_twitter_demo.git
Version history
Revision #: 1 of 1
Last update: 06-21-2016 08:59 PM