Created on 04-02-2018 02:24 PM - edited 09-16-2022 01:42 AM
In the previous article, we saw how to stream tweets using NiFi, Kafka, Tranquility, Druid and Superset.
To follow along here, you need the Druid datasource and the NiFi flow from that previous article already in place.
But life is already hard enough, so why not simplify it?
The idea here is to perform the same streaming, but now integrating NiFi directly with Druid.
So, our new diagram would look like this:
As we saw in that article, Tranquility acted as the integration layer between Kafka and Druid. Some people asked me: why not use the Kafka Indexing Service instead of Tranquility?
My answer: because Tranquility, as a framework, can be used flexibly to integrate almost any component with Druid.
Thus, for this NiFi-Druid integration, we will build a custom NiFi processor that uses Tranquility to send data directly into Druid.
OK, it's time to get hands-on!
Let's divide this work into 3 parts:
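1. Build the Druid processor
You need Maven and Java to build the processor, whose source is in the GitHub repository listed under References. Here is how you can quickly check whether both are installed: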
$ mvn -version
$ java -version
If they are not installed, see:
https://maven.apache.org/install.html
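The same prerequisite check can be scripted. The following is a minimal sketch, assuming only that the tools are expected to be on the PATH under the names mvn and java; the helper name missing_tools is made up for this example.

```python
import shutil

def missing_tools(tools=("mvn", "java")):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Maven and Java are both on the PATH.")
```

This only confirms that the executables can be found; it does not check their versions.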
cd <Home Dir>/nifi-druid-integration/fieldeng-nifi-druid-integration-master
mvn install
Once the Maven install is done, you will find the NAR file in the target directory of the nifi-druid-bundle-nar module, named nifi-druid-bundle-nar-0.0.1-SNAPSHOT.nar:
cd <Home Dir>/nifi-druid-integration/fieldeng-nifi-druid-integration-master/nifi-druid-bundle-nar/target
ls
nifi-druid-bundle-nar-0.0.1-SNAPSHOT.nar
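If you prefer to confirm the build output from a script, a small sketch like the one below lists the .nar files in a directory. The function name find_nars and the relative path used in the demo are assumptions for illustration; point it at your own target directory.

```python
from pathlib import Path

def find_nars(target_dir):
    """Return the sorted .nar files directly inside target_dir (empty list if none)."""
    target = Path(target_dir)
    if not target.is_dir():
        return []
    return sorted(target.glob("*.nar"))

if __name__ == "__main__":
    # Hypothetical location: assumes the script runs from the project root.
    nars = find_nars("nifi-druid-bundle-nar/target")
    if not nars:
        print("No .nar file found; check that the build succeeded.")
    for nar in nars:
        print(nar.name, nar.stat().st_size, "bytes")
```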
2. Deploy it - It is a cinch.
Copy your nifi-druid-bundle-nar-0.0.1-SNAPSHOT.nar file to the NiFi lib directory. You can use something like this (adjust the key file, local path, and host to your environment):
sudo scp -i yourkeyfile.pem /Users/tsantiago/Desktop/fieldeng-nifi-druid-integration-master/nifi-druid-bundle-nar/target/nifi-druid-bundle-nar-0.0.1-SNAPSHOT.nar centos@thiago-6.field.hortonworks.com:/usr/hdf/current/nifi/lib/
Restart NiFi, and that's it!
3. Set it up in NiFi
After restarting NiFi, you will see a new processor:
Replace the last step of that flow (PutKafka) with PutDruidProcessor:
And then configure it:
You must fill in these properties on the Controller Service:
data_source: twitter_demo
zk_connect_string: thiago-2.field.hortonworks.com:2181,thiago-3.field.hortonworks.com:2181,thiago-4.field.hortonworks.com:2181
dimensions_list: tweet_id,created_unixtime,created_time,lang,location,displayname,time_zone,msg
aggregators_descriptor:
[
  { "type":"count", "name":"count" },
  { "name":"value_sum", "type":"doubleSum", "fieldName":"value" },
  { "fieldName":"value", "name":"value_min", "type":"doubleMin" },
  { "type":"doubleMax", "name":"value_max", "fieldName":"value" }
]
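Since the aggregators_descriptor is pasted in as raw JSON, it is worth parsing it once before saving the Controller Service. The sketch below does that and then imitates, in plain Python, what the four aggregator types compute over a handful of rows. It is not Druid code: the rollup function and the sample rows (including their value field) are made up for illustration.

```python
import json

# The aggregators_descriptor value from the Controller Service properties.
AGGREGATORS = """
[ { "type":"count", "name":"count" },
  { "name":"value_sum", "type":"doubleSum", "fieldName":"value" },
  { "fieldName":"value", "name":"value_min", "type":"doubleMin" },
  { "type":"doubleMax", "name":"value_max", "fieldName":"value" } ]
"""

def rollup(rows, aggregators):
    """Imitate what each aggregator type computes over a non-empty list of rows."""
    result = {}
    for agg in aggregators:
        kind, name = agg["type"], agg["name"]
        if kind == "count":
            result[name] = len(rows)
            continue
        values = [float(row[agg["fieldName"]]) for row in rows]
        if kind == "doubleSum":
            result[name] = sum(values)
        elif kind == "doubleMin":
            result[name] = min(values)
        elif kind == "doubleMax":
            result[name] = max(values)
        else:
            raise ValueError(f"unhandled aggregator type: {kind}")
    return result

# json.loads raises ValueError if the descriptor is not well-formed JSON.
aggregators = json.loads(AGGREGATORS)

# Made-up rows; the "value" field is an assumption for illustration only.
rows = [{"value": 1.5}, {"value": 4.0}, {"value": 2.5}]
print(rollup(rows, aggregators))
```

Each entry's name becomes a metric column in the datasource, while fieldName says which input field feeds it; the count aggregator needs no input field.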
Finally, start the new flow and watch your Superset dashboard fill up.
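If some columns look empty in Superset, one thing worth checking is whether the records reaching PutDruidProcessor carry every field named in dimensions_list. A minimal sketch of such a check follows; the helper name missing_dimensions and the sample record values are invented for the example.

```python
# The dimensions_list value from the Controller Service properties.
DIMENSIONS = (
    "tweet_id,created_unixtime,created_time,lang,"
    "location,displayname,time_zone,msg"
).split(",")

def missing_dimensions(record, dimensions=DIMENSIONS):
    """Return the configured dimension names that are absent from the record."""
    return [name for name in dimensions if name not in record]

# A made-up record; real ones come from the NiFi flow of the previous article.
sample = {
    "tweet_id": "1",
    "created_unixtime": "1522679040000",
    "created_time": "2018-04-02 14:24:00",
    "lang": "en",
    "location": "somewhere",
    "displayname": "someone",
    "msg": "hello druid",
}

print(missing_dimensions(sample))
```

Here the sample deliberately omits time_zone, so that is the only name reported.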
Conclusion:
Now you know not only how to build a custom NiFi processor, but also how to integrate NiFi with Druid.
It is important to note that by sending data from NiFi straight to Druid we can lose some scalability and application resilience: with a high volume of tweets, the integration between NiFi and Druid can become a bottleneck.
However, if your workload is not heavy, keeping it simple may be the best option.
References:
https://community.hortonworks.com/articles/4318/build-custom-nifi-processor.html
https://github.com/hortonworks/fieldeng-nifi-druid-integration
Created on 01-24-2019 02:15 PM
Hi, the DruidTranquilityController gets stuck at enabling and never gets enabled. Could you please let me know how to resolve that?
Created on 06-07-2019 06:40 AM
Can you provide the XML of this flow?