Code Repositories

Find and share code repositories
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.
Guru
Repo Description

This demo is inspired by Ali's Hortonworks Twitter Demo

Purpose: Monitor Twitter stream for the procided Hastags & act on unexpected increases in tweet volume

  • Ingest: Listen for Twitter streams related to Hashtags input in NiFi Garden Hose (GetHTTP) processor
  • Processing:
    • Monitor tweets for unexpected volume
    • Volume thresholds managed in HBASE
  • Persistence:
    • HDFS (for future batch processing)
    • Hive (for interactive query)
    • HBase (for realtime alerts)
    • Solr/Banana (for search and reports/dashboards)
  • Refine:
    • Update threshold values based on historical analysis of tweet volumes
  • Demo setup:
    • Either download and start prebuilt VM
    • Start HDP 2.3 sandbox and run provided scripts to setup demo

Short steps / Breadcrumbs:

  1. SSH into the sandbox: ssh root@sandbox….
  2. There is an xml file in the nifi-template folder. Scp the file to your local disk.
  3. Start nifi: nifi.sh start
  4. Go to sandbox.hortonworks.com:9090/nifi & upload the template
  5. Add the Access Keys from your Twitter Developer account.
  6. Meanwhile, start solr & banana: sh ~/setup-scripts/restart_solr_banana.sh
  7. Start Storm topology: sh ~/twittertopology/runtopology.sh
  8. Once topology has started, hit play on the NiFi dashboard
  9. Go to sandbox.hortonworks.com:8983/banana

Option 1: Setup demo using prebuilt VM based on HDP 2.3 sandbox

  • Download VM from here. Import it into VMWare Fusion and start it up.
  • Start the demo by
cd /root/hdp_nifi_twitter_demo
./start-demo.sh#once storm topology is submitted, press control-C
#start Nifi processor
1. Using Browser, go to http://sandbox.hortonworks.com:<port#>/nifi 
2. Upload the XML file into NiFi templates section in the UI. The XML file is under /root/hdp_nifi_twitter_demo/nifi-template
  • Observe results in HDFS, Hive, Solr/Banana, HBase
  • Troubleshooting: check the Storm webUI for any errors and try resetting using below script:
./reset-demo.sh

Option 2: Setup demo via scripts on vanilla HDP 2.3 sandbox

These setup steps are only needed first time and may take upto 30min to execute (depending on your internet connection)

  • Pull latest code/scripts
git clone git@github.com:vedantja/hdp_nifi_twitter_demo.git

  • NiFi Garden Hose Processor requires you to have a Twitter account and obtain developer keys by registering an "app". Create a Twitter account and app and get your consumer key/token and access keys/tokens: https://apps.twitter.com > sign in > create new app > fill anything > create access tokens
  • Then enter the 4 values into the appropriate fields (see screenshot)
consumerKey
consumerSecret
oauth.accessToken
oauth.accessTokenSecret
  • Run below to setup demo (one time): start Ambari/HBase/Kafka/Storm and install maven, solr, banana -may take 10 min
cd /root/hdp22-twitter-demo
./setup-demo.sh
Run Twitter demo

Most of the below steps are optional as they were already executed by the setup script above but are useful to understand the components of the demo:

  • (Optional) Review the list of stock symbols whose Twitter mentiones we will be trackinghttp://en.wikipedia.org/wiki/List_of_S%26P_500_companies
  • (Optional) Generate securities csv from above page and review the securities.csv generated. The last field is the generated tweet volume threshold
/root/hdp_nifi_twitter_demo/fetchSecuritiesList/rungeneratecsv.sh
cat /root/hdp_nifi_twitter_demo/fetchSecuritiesList/securities.csv
  • (Optional) for future runs: you can add other stocks/hashtags to monitor to the csv (make sure no trailing spaces/new lines at the end of the file). Find these at http://mobile.twitter.com/trends
sed -i '1i$HDP,Hortonworks,Technology,Technology,Santa Clara CA,0000000001,5' /root/hdp22-twitter-demo/fetchSecuritiesList/securities.csv
sed -i '1i#hadoopsummit,Hadoop Summit,Hadoop,Hadoop,Santa Clara CA,0000000001,5' /root/hdp22-twitter-demo/fetchSecuritiesList/securities.csv
  • (Optional) Open connection to HBase via Phoenix and check you can list tables. Notice securities data was imported and alerts table is empty
/usr/hdp/current/phoenix-client/bin/sqlline.py  localhost:2181:/hbase-unsecure
!tables
select * from securities;
select * from alerts;
select * from dictionary;
!q
  • (Optional) check Hive table schema where we will store the tweets for later analysis
hive -e 'desc tweets_text_partition'
  • Start Storm Twitter topology to generate alerts into an HBase table for stocks whose tweet volume is higher than threshold this will also read tweets into Hive/HDFS/local disk/Solr/Banana. The first time you run below, maven will take 15min to download dependent jars
cd /root/hdp_nifi_twitter_demo
./start-demo.sh
#once storm topology is submitted, press control-C
  • (Optional) Other modes the topology could be started in future runs if you want to clean the setup or run locally (not on the storm running on the sandbox)
cd /root/hdp_nifi_twitter_demo/twitterstorm
./runtopology.sh runOnCluster clean
./runtopology.sh runLocally skipclean

To stop collecting tweets:

  • To stop producing tweets, hit the stop button on the template processor in the NiFi console.
  • kill the storm topology to stop processing tweets
storm kill Twittertopology
Repo Info
Github Repo URL https://github.com/vedantja/hdp_nifi_twitter_demo.git
Github account name vedantja
Repo name hdp_nifi_twitter_demo.git
1,215 Views
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.
Version history
Last update:
‎06-21-2016 08:59 PM
Updated by:
Contributors