Created on 10-12-2015 08:41 AM - edited 08-17-2019 02:05 PM
In this tutorial, we will learn how to use HDF to create a simple event processing flow by:
For a primer on HDF, refer to the materials here for basic background.
Thanks to @bbende@hortonworks.com for his earlier blog post, which helped make this tutorial possible.
192.168.191.241 sandbox.hortonworks.com sandbox
Connect to the VM via SSH (user root, password hadoop) and correct the /etc/hosts entry.
ssh root@sandbox.hortonworks.com
ssh root@127.0.0.1 -p 2222
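If you need to make the /etc/hosts correction more than once (for example after the VM's IP address changes), a small helper keeps it repeatable. This is a minimal sketch, not part of the original tutorial: the function name set_hosts_entry is hypothetical, and it assumes the IP address and hostnames from the entry above.

```shell
# Hypothetical helper (not from the tutorial): make FILE map FQDN to IP exactly once.
# Any existing line that lists FQDN as a hostname is dropped, then the new line is
# appended, so running it twice still leaves a single entry.
set_hosts_entry() {
  local file="$1" ip="$2" fqdn="$3" short="$4" tmp
  tmp="$(mktemp)"
  # Keep only lines whose hostname fields (2..NF) do not contain $fqdn.
  awk -v h="$fqdn" '{ keep = 1; for (i = 2; i <= NF; i++) if ($i == h) keep = 0 } keep' "$file" > "$tmp"
  printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$tmp"
  # Copy back with cat so the original file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

On the VM you would run it as root, for example: set_hosts_entry /etc/hosts 192.168.191.241 sandbox.hortonworks.com sandbox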
VERSION=$(hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/')
rm -rf /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
# sandbox
service ambari restart
# non-sandbox
service ambari-server restart
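The VERSION line relies on sed keeping only the major.minor part of the installed HDP version, since that is the name of the Ambari stack directory (stacks/HDP/2.3, for example). The snippet below wraps the same expression in a function that reads stdin, so you can try it without a cluster. The sample input line is an assumed example of hdp-select output, not captured from a real machine, and hdp_major_minor is a hypothetical name.

```shell
# Same sed expression as the VERSION line, reading hdp-select style output on stdin.
hdp_major_minor() {
  sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'
}

# Illustrative input; on the sandbox you would pipe in `hdp-select status hadoop-client`.
echo "hadoop-client - 2.3.2.0-2950" | hdp_major_minor   # prints 2.3
```

Note that the pattern matches exactly one digit on each side of the dot, so a hypothetical version such as 3.10.1.0 would come out as 3.1.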
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
chown -R solr:solr /opt/lucidworks-hdpsearch/solr
su solr
cd /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/
mv default.json default.json.orig
wget https://raw.githubusercontent.com/abajwa-hw/ambari-nifi-service/master/demofiles/default.json
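Running the mv step a second time would overwrite default.json.orig with the downloaded dashboard and lose the original. A guarded variant is sketched below; backup_once is a hypothetical helper name, not something the tutorial or Banana provides.

```shell
# Hypothetical helper: move FILE to FILE.orig only if no backup exists yet,
# so re-running the step cannot clobber the pristine copy.
backup_once() {
  if [ -e "$1.orig" ]; then
    echo "$1.orig already exists, leaving it untouched" >&2
  else
    mv "$1" "$1.orig"
  fi
}
```

Used as backup_once default.json followed by wget -O default.json with the same URL. The -O flag matters here: when backup_once skips the move, default.json is still present, and plain wget would save the new file as default.json.1 instead of replacing it.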
Add <str>EEE MMM d HH:mm:ss Z yyyy</str> under ParseDateFieldUpdateProcessorFactory in solrconfig.xml so it looks like below. This is done to allow Solr to recognize the timestamp format of tweets.
vi /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <arr name="format">
    <str>EEE MMM d HH:mm:ss Z yyyy</str>
    ...
  </arr>
</processor>
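The new pattern corresponds to the created_at format the Twitter API uses, for example Fri Oct 16 12:53:20 +0000 2015 (day name, month name, day, time, numeric offset, year). As a rough illustration only, the check below mirrors the pattern token by token with an extended regex. It tests the layout of a string, not whether Solr will accept it, and looks_like_tweet_date is a hypothetical name.

```shell
# Rough layout check mirroring EEE MMM d HH:mm:ss Z yyyy:
#   EEE = day name, MMM = month name, d = 1-2 digit day, HH:mm:ss,
#   Z = +hhmm or -hhmm offset, yyyy = year.
# It does not validate that the names or numbers form a real date.
looks_like_tweet_date() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Z][a-z]{2} [A-Z][a-z]{2} [0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4} [0-9]{4}$'
}
```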
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
/opt/lucidworks-hdpsearch/solr/bin/solr start -c -z localhost:2181
/opt/lucidworks-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 1 -rf 1
exit
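To confirm that the tweets collection was created, you can query the Solr Collections API with action=LIST, which returns the collection names as JSON. The helper below is a quick text check, not a JSON parser, and has_collection is a hypothetical name. The port 8983 in the usage line is the bin/solr default and is an assumption about your setup.

```shell
# Hypothetical helper: read a Collections API LIST response on stdin and
# succeed if the quoted collection name appears in it (plain grep, no JSON parsing).
has_collection() {
  grep -q "\"$1\""
}
```

For example: curl -s "http://localhost:8983/solr/admin/collections?action=LIST&wt=json" | has_collection tweets && echo "tweets collection found"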
yum install -y ntp
service ntpd stop
ntpdate pool.ntp.org
service ntpd start
sudo -u hdfs hadoop fs -chmod -R 777 /tmp/tweets_staging
hive> create table if not exists tweets_text_partition(
  tweet_id bigint,
  created_unixtime bigint,
  created_time string,
  displayname string,
  msg string,
  fulltext string
)
row format delimited fields terminated by "|"
location "/tmp/tweets_staging";
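Because the table splits each line on "|", every record in /tmp/tweets_staging needs exactly six pipe-separated values in the column order of the table, and a stray "|" inside tweet text would shift the columns. The sketch below builds one such line and replaces "|" and newlines inside values with spaces. to_tweet_row is a hypothetical helper for illustration; it is not taken from the NiFi flow in this tutorial.

```shell
# Hypothetical helper: print one pipe-delimited row in the column order of
# tweets_text_partition (tweet_id|created_unixtime|created_time|displayname|msg|fulltext).
# '|' and newlines inside a value are turned into spaces so the row keeps six columns.
to_tweet_row() {
  local sep="" field clean
  for field in "$@"; do
    clean="$(printf '%s' "$field" | tr '|\n' '  ')"
    printf '%s%s' "$sep" "$clean"
    sep="|"
  done
  printf '\n'
}

to_tweet_row 1 1445000000 "Fri Oct 16 12:53:20 +0000 2015" alice "hello|world" "hello|world #nifi"
```

The sample values are made up; 1445000000 is the Unix time for Fri Oct 16 12:53:20 UTC 2015, so the two time columns agree.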
Other things to try:
Learn more about the NiFi Expression Language and how to get started building a custom NiFi processor: http://community.hortonworks.com/articles/4356/getting-started-with-nifi-expression-language-and.htm...
Created on 10-29-2015 08:21 AM
Awesome tutorial, Thanks for sharing 🙂
Created on 10-30-2015 08:17 AM
Great tutorial/article @Ali Bajwa. Great way to start getting familiar with NiFi.
One question: did you get the NiFi Twitter template working with PutSolrContentStream connected to a SolrCloud instance, as opposed to a standard/standalone Solr instance?
Created on 10-30-2015 01:26 PM
He did use the SolrCloud mode for the PutSolrContentStream. I used SolrStandalone, so it should work either way 🙂
Created on 10-30-2015 01:35 PM
From the HDF/NiFi standpoint, the only difference would be in a configuration switch for PutSolrContentStream:
Created on 10-30-2015 02:58 PM
Thanks @George Vetticaden! As @Jonas Straub and @Andrew Grande mentioned, in this example I used cloud mode (notice that Solr was started with the -c -z arguments), but you can easily change the Solr processor to point to a standalone Solr instance too.
Created on 10-30-2015 03:02 PM
Yup, got it working. My ZooKeeper connect string needed to include the chroot location, which was solr in my case.
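When Solr's ZooKeeper data lives under a chroot, as in the comment above, the chroot goes once at the end of the connect string (host1:2181,host2:2181/solr), not after each host. Going by the commenter's experience, that full string is presumably what PutSolrContentStream needs in cloud mode. A tiny hypothetical helper shows the layout:

```shell
# Hypothetical helper: append a ZooKeeper chroot to a comma-separated host list.
# The chroot is added once, after the last host; a leading "/" on it is optional.
zk_connect() {
  local hosts="$1" chroot="${2#/}"
  if [ -n "$chroot" ]; then
    printf '%s/%s\n' "$hosts" "$chroot"
  else
    printf '%s\n' "$hosts"
  fi
}

zk_connect localhost:2181 solr   # prints localhost:2181/solr
```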
Created on 10-30-2015 11:18 PM - edited 08-17-2019 02:05 PM
Cheers, this works! Good stuff, Ali. Keep it coming!
Created on 11-23-2015 02:12 AM
This demo will also work on VirtualBox, but you will need to add port forwarding for port 9090: https://nsrc.org/workshops/2014/btnog/raw-attachment/wiki/Track2Agenda/ex-virtualbox-portforward-ssh...
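With VBoxManage, a NAT port-forwarding rule is written as name,protocol,host-ip,host-port,guest-ip,guest-port. The sketch below only builds that string; natpf_rule is a hypothetical helper, and the VM name in the usage line is a placeholder for whatever your sandbox VM is called.

```shell
# Hypothetical helper: build a NAT port-forwarding rule string for VBoxManage.
# Layout: name,protocol,host-ip,host-port,guest-ip,guest-port (IP fields left empty).
# The guest port defaults to the host port when no third argument is given.
natpf_rule() {
  local name="$1" host_port="$2" guest_port="${3:-$2}"
  printf '%s,tcp,,%s,,%s\n' "$name" "$host_port" "$guest_port"
}
```

With the VM powered off: VBoxManage modifyvm "<sandbox VM name>" --natpf1 "$(natpf_rule nifi 9090)". For a running VM, VBoxManage controlvm "<sandbox VM name>" natpf1 "$(natpf_rule nifi 9090)" adds the rule without a restart.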
Created on 01-19-2016 02:00 AM
@Ali Bajwa, big props for updating NiFi to 0.4.1. Can you update the step where you say to navigate to http://sandbox.hortonworks.com:9090/ so that it points to http://sandbox.hortonworks.com:9090/nifi?
Created on 01-19-2016 02:10 AM
Thanks @Artem Ervits! Updated it