Prerequisites:

  1. Launch Sandbox on Azure
    1. VM Size: Minimum of A4 or A5
  2. A Twitter App
    1. You'll use the API credentials
    2. The "Application Details" don't matter

Prepare the Sandbox

Connect to SSH & Ambari

  1. Connect to the Sandbox using SSH
    1. or web console: http://<<ip>>:4200/
  2. Become root:
    sudo su -
  3. Reset the Ambari password:
    ambari-admin-password-reset
  4. Login to Ambari:
    1. http://<<ip>>:8080
    2. User: admin
  5. Before moving to the next steps, ensure all services on the left are started (green) or in maintenance mode (black).
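The service check in step 5 can also be done from the shell through Ambari's REST API. The following is a minimal sketch, not part of the original walkthrough. It assumes the cluster is named "Sandbox" (check the name shown in your Ambari UI), that Ambari listens on port 8080 as above, and that the response is pretty-printed JSON with one field per line. The function names are hypothetical.

```shell
# Hypothetical helper: build the Ambari REST URL that lists each service
# with its run state and maintenance state.
# Assumption: the cluster is called "Sandbox" unless a second argument is given.
ambari_services_url() {
  printf 'http://%s:8080/api/v1/clusters/%s/services?fields=ServiceInfo/state,ServiceInfo/maintenance_state\n' \
    "$1" "${2:-Sandbox}"
}

# Hypothetical helper: print only the lines carrying service names and states.
# Usage: ambari_service_states <ip> <admin-password>
ambari_service_states() {
  curl -sS -u "admin:$2" -H 'X-Requested-By: ambari' "$(ambari_services_url "$1")" \
    | grep -E '"(service_name|state|maintenance_state)"'
}
```

Calling ambari_service_states with the Sandbox IP and the password you just set should list each service followed by its states. A service counts as ready when its state is STARTED or its maintenance state is ON.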

Install NiFi

  1. In Ambari, click "Actions" (bottom left) -> "Add Service"
  2. Choose NiFi and continue through the dialogs. You shouldn't need to change any of the defaults.
  3. NiFi should now be accessible at http://<<ip>>:9090/nifi/
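NiFi can take a while to come up after Ambari reports the install as finished. As a rough sketch (the function names and the 10-second polling interval are assumptions, not part of the original steps), you could poll the NiFi URL from the shell until it answers:

```shell
# Hypothetical helper: build the NiFi UI URL from the Sandbox IP (port 9090, as above).
nifi_url() {
  printf 'http://%s:9090/nifi/\n' "$1"
}

# Hypothetical helper: poll until NiFi answers with HTTP 200, or give up.
# Usage: wait_for_nifi <ip> [tries]   (default: 30 tries, 10 seconds apart)
wait_for_nifi() {
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # curl prints only the HTTP status code; it prints 000 if the connection fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$(nifi_url "$1")")"
    [ "$code" = "200" ] && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}
```

For example, running wait_for_nifi with the Sandbox IP returns success once the UI responds, and a non-zero status if it never does within the allotted tries.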

Tune Sandbox

The Sandbox is tuned to run on minimal hardware. We need to update the Hive, Tez & YARN configuration for our use case.

  1. Run the tuning script. This could take up to 15 minutes to complete:
    bash <(curl -sSL https://git.io/vVRPs)
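The command above runs the downloaded script directly, as root, without showing you what it does. If you would rather inspect it first, one option is to save it to a file, read it, and then run it. This is only a sketch; the helper name and the destination path in the usage note are hypothetical.

```shell
# Hypothetical helper: download a script to a local file instead of
# feeding it straight into bash. Succeeds only if the file is non-empty.
# Usage: fetch_script <url> <destination>
fetch_script() {
  curl -sSL "$1" -o "$2" && [ -s "$2" ]
}
```

For example: fetch the script with fetch_script https://git.io/vVRPs /tmp/tune-sandbox.sh, review it with less /tmp/tune-sandbox.sh, then run it with bash /tmp/tune-sandbox.sh.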

Solr & Banana

Solr is a search platform that uses specialized indexing to make large collections of documents searchable.

Banana is a dashboard visualization tool for Solr.

  1. Download the Banana Dashboard
    curl -L https://git.io/vVRP3 -o /opt/hostname-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/default.json
  2. Update Solr to support Twitter's timestamp format
    curl -L https://git.io/vVRPz -o /opt/hostname-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
  3. Start Solr
    JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64 /opt/hostname-hdpsearch/solr/bin/solr start -c -z localhost:2181
  4. Create Solr collection for tweets
    /opt/hostname-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 1 -rf 1
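To confirm that the collection exists, you can query it for a document count. This is a hedged sketch: it assumes Solr listens on its default port 8983 and returns its usual JSON response containing a numFound field. The function names are hypothetical.

```shell
# Hypothetical helper: build a Solr query URL that matches every document
# but returns none of them (rows=0), so only the count comes back.
# Assumption: Solr is on its default port, 8983.
solr_count_url() {
  printf 'http://%s:8983/solr/%s/select?q=*:*&rows=0&wt=json\n' "$1" "$2"
}

# Hypothetical helper: print the numFound field from the query response.
# Usage: solr_doc_count <host> <collection>
solr_doc_count() {
  curl -sS "$(solr_count_url "$1" "$2")" | grep -o '"numFound":[0-9]*'
}
```

Running solr_doc_count localhost tweets right after creating the collection should report zero documents. An error response instead suggests the collection was not created.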
    