Created on 10-12-2015 08:41 AM - edited 08-17-2019 02:05 PM
In this tutorial, we will learn how to use HDF to create a simple event processing flow by installing NiFi via an Ambari service, setting up Solr and Banana, and streaming tweets into Solr and Hive for analysis.
For a primer on HDF, you can refer to the materials here to get a basic background.

Thanks to @bbende@hortonworks.com for his earlier blog post that helped make this tutorial possible.
Connect to the VM via SSH (root/hadoop) and correct the /etc/hosts entry so the sandbox hostname resolves:

192.168.191.241 sandbox.hortonworks.com sandbox

ssh root@sandbox.hortonworks.com

# or, if using NAT port forwarding into the sandbox:
ssh root@127.0.0.1 -p 2222
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
rm -rf /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI

# sandbox
service ambari restart

# non sandbox
service ambari-server restart
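Once Ambari is back up, you can optionally confirm that the NIFI service definition was registered. A minimal check, assuming the default admin/admin Ambari credentials and port 8080 (adjust if yours differ):

# query Ambari's REST API for the NIFI service definition in the HDP stack
curl -u admin:admin "http://sandbox.hortonworks.com:8080/api/v1/stacks/HDP/versions/$VERSION/services/NIFI"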
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
chown -R solr:solr /opt/lucidworks-hdpsearch/solr
su solr
cd /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/
mv default.json default.json.orig
wget https://raw.githubusercontent.com/abajwa-hw/ambari-nifi-service/master/demofiles/default.json
Edit solrconfig.xml and add <str>EEE MMM d HH:mm:ss Z yyyy</str> under ParseDateFieldUpdateProcessorFactory so it looks like below. This is done to allow Solr to recognize the timestamp format of tweets.

vi /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <arr name="format">
    <str>EEE MMM d HH:mm:ss Z yyyy</str>
    <!-- ...the existing date formats in the file follow... -->
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
/opt/lucidworks-hdpsearch/solr/bin/solr start -c -z localhost:2181
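Before creating the collection, you can confirm Solr came up in cloud mode; the bundled status command prints the running instance and its ZooKeeper connection:

/opt/lucidworks-hdpsearch/solr/bin/solr status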
/opt/lucidworks-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 1 -rf 1
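To sanity-check that the collection was created, one option (assuming Solr is listening on the default port 8983) is to list collections via the Collections API:

curl "http://localhost:8983/solr/admin/collections?action=LIST"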
exit
yum install -y ntp
service ntpd stop
ntpdate pool.ntp.org
service ntpd start
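GetTwitter authentication is sensitive to clock skew, so it is worth confirming the sync actually worked; one way is to list ntpd's peers (a * in front of a peer marks the selected sync source):

ntpq -p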
sudo -u hdfs hadoop fs -chmod -R 777 /tmp/tweets_staging

hive> create table if not exists tweets_text_partition(
  tweet_id bigint,
  created_unixtime bigint,
  created_time string,
  displayname string,
  msg string,
  fulltext string
) row format delimited fields terminated by "|"
location "/tmp/tweets_staging";
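Once the flow has been running for a bit, a quick way to confirm tweets are landing in the table (assuming the flow is writing pipe-delimited rows into /tmp/tweets_staging) is a sample query:

hive> select displayname, msg from tweets_text_partition limit 5;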
Other things to try:

Learn more about NiFi expression language and how to get started building a custom NiFi processor: http://community.hortonworks.com/articles/4356/getting-started-with-nifi-expression-language-and.htm...
Created on 01-19-2016 02:10 AM
Thanks @Scott Shaw...updated
Created on 03-31-2016 08:24 PM
One caveat: if you reboot (reset) your VM/sandbox, you should enable the ntpd daemon to start on bootup. I had trouble with GetTwitter as mentioned in the post above even after following the steps to install and start ntpd, because I had to reboot in the meantime, which turned it off. To enable it on system bootup, run this command:
chkconfig ntpd on
To verify it took effect, you can run this command and check that ntpd is enabled in runlevels 2-5:
chkconfig --list | grep ntpd
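The output should look roughly like this (on for runlevels 2-5):

ntpd            0:off   1:off   2:on    3:on    4:on    5:on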
Created on 07-05-2016 07:26 AM - edited 08-17-2019 02:05 PM
@Ali Bajwa I am getting the below error when I try to replicate the same case in my sandbox on VMware Fusion.
Created on 07-05-2016 12:04 PM
@Ali Bajwa I have resolved it by adding proxy settings for the nifi user:
hadoop.proxyuser.nifi.groups=*
hadoop.proxyuser.nifi.hosts=*
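For anyone else hitting this: these are HDFS core-site.xml properties (in Ambari, under HDFS > Configs > Custom core-site; restart HDFS afterwards). The equivalent XML looks like:

<property>
  <name>hadoop.proxyuser.nifi.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nifi.hosts</name>
  <value>*</value>
</property>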
Created on 08-08-2016 11:39 PM
Has anybody seen the following error when trying to create the tweets collection? And is there a known solution?
Unable to create core [tweets_shard1_replica1] Caused by: XML document structures must start and end within the same entity.
Created on 10-21-2016 09:13 PM
The following tutorial also follows a very similar flow:
Created on 02-15-2017 11:11 AM
Thank you @Ali Bajwa for the good tutorial.

I am trying this example with one difference: my NiFi is local, and I am trying to put tweets into a remote Solr. Solr is in a VM that contains the Hortonworks sandbox. Unfortunately I am getting this error on the PutSolrContentStream processor:
PutSolrContentStream[id=f6327477-fb7d-4af0-ec32-afcdb184e545] Failed to send StandardFlowFileRecord[uuid=9bc39142-c02c-4fa2-a911-9a9572e885d0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1487148463852-14, container=default, section=14], offset=696096, length=2589],offset=0,name=103056151325300.json,size=2589] to Solr due to org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://172.17.0.2:8983/solr/tweets_shard1_replica1; routing to connection_failure: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://172.17.0.2:8983/solr/tweets_shard1_replica1;
Could you help me?
thanks,
Shanghoosh