Member since: 09-14-2015
Posts: 47
Kudos Received: 89
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2244 | 09-22-2017 12:32 PM |
| | 11361 | 03-21-2017 11:48 AM |
| | 1024 | 11-16-2016 12:08 PM |
| | 1450 | 09-15-2016 09:22 AM |
| | 3328 | 09-13-2016 07:37 AM |
10-26-2017
11:40 PM
4 Kudos
Description
Learn how to consume real-time data from the Satori RTM platform using NiFi.

Background
Satori is a cloud-based live platform that provides a publish-subscribe messaging service called RTM, and also makes available a set of free real-time data feeds as part of their Open Data Channels initiative:
https://www.satori.com/docs/using-satori/overview
https://www.satori.com/opendata/channels
This article steps through how to consume from Satori's Open Data Channels in NiFi, using a custom NiFi processor. Note - the article assumes you already have a working version of NiFi up and running.
Link to code on github: https://github.com/laurencedaluz/nifi-satori-bundle

Installing the custom processor
To create the required nar file, simply clone and build the following repo with maven:
git clone https://github.com/laurencedaluz/nifi-satori-bundle.git
cd nifi-satori-bundle
mvn clean install
This will make the following .nar file under the nifi-satori-bundle-nar/target/ directory: nifi-satori-bundle-nar-<version>.nar
Copy this file into the lib directory of your NiFi instance (if using HDF, it is /usr/hdf/current/nifi/lib), then restart NiFi for the nar to be loaded.
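For reference, on an HDF install the copy and restart might look something like the following sketch (paths and the restart method will vary with your environment - you can also restart NiFi through Ambari instead):

# copy the freshly built nar into NiFi's lib directory (HDF default path)
cp nifi-satori-bundle-nar/target/nifi-satori-bundle-nar-*.nar /usr/hdf/current/nifi/lib/
# restart NiFi so the new nar is picked up
/usr/hdf/current/nifi/bin/nifi.sh restart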
Using the ConsumeSatoriRtm processor
The ConsumeSatoriRtm processor accepts a number of configurations. At a minimum, you will just need to provide the following (which you can get directly from the Satori Open Data Channels site):
Endpoint
Appkey
Channel
In this example, I've chosen to consume from the 'big-rss' feed using the configurations provided here: https://www.satori.com/opendata/channels/big-rss
That's it! After starting the ConsumeSatoriRtm processor you will see data flowing.

Additional Features
The processor also supports using Satori's Streamview filters, which allow you to provide SQL-like queries to select, transform, or aggregate messages from a subscribed channel:
In the 'big-rss' example above, the following filter configuration would limit the stream to messages whose feedURL contains the word "jobs":
select * from `big-rss` where feedURL like '%jobs%'
The NiFi processor also supports batching of multiple messages into a single FlowFile, which will provide a new-line delimited list of messages in each file (based on a 'minimum batch size' configuration).
09-22-2017
12:32 PM
2 Kudos
@Biswajit Chakraborty A simple way to do this is to use a ReplaceText processor after the TailFile. ReplaceText gives you the option to configure which Replacement Strategy to use, and using 'Prepend' will insert the replacement value at the start of each file or line (depending on what you've configured 'Evaluation Mode' to be). So a ReplaceText with the following configs will give you what you need:
Replacement Value: <set your hostname or filename>
Replacement Strategy: Prepend
Evaluation Mode: Line-by-Line
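As a rough command-line analogy for what the Line-by-Line Prepend strategy does to the tailed content (the hostname prefix and file name here are only illustrative values, not part of the NiFi configuration):

# each line of the tailed file gets the replacement value added to the front
sed 's/^/myhost01|/' application.log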
03-21-2017
11:48 AM
4 Kudos
@jpetro416 Another possible solution until the Wait and Notify processors are available: you can use MergeContent configured to only merge when it hits 2 files (which may work for you because you don't need the final flow files and you just want a trigger in this case). The first flow trigger will be held in queue before the MergeContent, and will only be merged once the second flow trigger has arrived. The merged flowfile that comes out of the MergeContent will be your final trigger. Note, I used the following configurations to get this to work:
Back pressure on both 'success' flows going into MergeContent (set to only allow 1 file in queue at a time before the merge)
MergeContent configured with:
Minimum Number of Entries = 2
Maximum Number of Entries = 2
Correlation Attribute Name = 'my-attribute' (I added this attribute to each UpdateAttribute processor - this would only be required if you wanted to have multiple triggers going into the MergeContent and you wanted to control the final trigger based on a correlation attribute)
I've attached the xml flow template for your reference: mergecontent-trigger-wait.xml
12-22-2016
03:09 AM
1 Kudo
@sprakash That is correct - NiFi LDAP authentication only works over HTTPS. If you want to secure user access via LDAP in NiFi, you need to configure NiFi in HTTPS mode and remove HTTP access. Check out the following article for details of configuring LDAP auth via NiFi: https://community.hortonworks.com/articles/7341/nifi-user-authentication-with-ldap.html
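As a rough illustration, switching to HTTPS-only access with the LDAP login provider involves nifi.properties settings along these lines (a minimal sketch only - hostnames, ports and keystore paths are placeholders, and the linked article covers the full steps, including login-identity-providers.xml):

# disable plain HTTP access by leaving the HTTP port empty
nifi.web.http.port=
# enable HTTPS access
nifi.web.https.host=nifi-node1.example.com
nifi.web.https.port=9443
# keystore/truststore used for the HTTPS listener
nifi.security.keystore=/etc/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.truststore=/etc/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
# point NiFi at the LDAP login identity provider
nifi.security.user.login.identity.provider=ldap-provider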
12-15-2016
03:11 PM
3 Kudos
Auto-recovery in Ambari is a useful way of getting cluster components restarted automatically in the event that a component fails, without the need for human intervention. Ambari 2.4.0 introduced dynamic auto-recovery, which allows auto-start properties to be configured without needing an ambari-agent / ambari-server restart. Currently, the simplest way to manage the auto-recovery features within Ambari is via the REST API (documented within this article), although on-going work in the community will bring the feature to the UI: https://issues.apache.org/jira/browse/AMBARI-2330

Check Auto-Recovery Settings
To check if auto-recovery is enabled for all components, run the following command on the Ambari server node:
curl -u admin:<password> -i -H 'X-Requested-By: ambari' -X GET 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?fields=ServiceComponentInfo/component_name,ServiceComponentInfo/service_name,ServiceComponentInfo/category,ServiceComponentInfo/recovery_enabled'
Note, you will need to replace <password> and <cluster_name> with your own values. The output of the above command will look something like this:
...
"items" : [
{
"href" : "http://localhost:8080/api/v1/clusters/horton/components/APP_TIMELINE_SERVER",
"ServiceComponentInfo" : {
"category" : "MASTER",
"cluster_name" : "horton",
"component_name" : "APP_TIMELINE_SERVER",
"recovery_enabled" : "false",
"service_name" : "YARN"
}
},
{
"href" : "http://localhost:8080/api/v1/clusters/horton/components/DATANODE",
"ServiceComponentInfo" : {
"category" : "SLAVE",
"cluster_name" : "horton",
"component_name" : "DATANODE",
"recovery_enabled" : "false",
"service_name" : "HDFS"
}
},
...
Notice the "recovery_enabled" : "false" flag on each component.

Enable Auto-Recovery for HDP Components
To enable auto-recovery for a single component (in this case HBASE_REGIONSERVER):
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(HBASE_REGIONSERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
To enable auto-recovery for multiple HDP components:
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(APP_TIMELINE_SERVER,DATANODE,HBASE_MASTER,HBASE_REGIONSERVER,HISTORYSERVER,HIVE_METASTORE,HIVE_SERVER,INFRA_SOLR,LIVY_SERVER,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR,MYSQL_SERVER,NAMENODE,NODEMANAGER,RESOURCEMANAGER,SECONDARY_NAMENODE,WEBHCAT_SERVER,ZOOKEEPER_SERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

Enable Auto-Recovery for HDF Components
The process is the same for an Ambari-managed HDF cluster; here is an example of enabling auto-recovery for the HDF services:
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(NIFI_MASTER,ZOOKEEPER_SERVER,KAFKA_BROKER,INFRA_SOLR,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

Using an older version of Ambari?
If you're using an older version of Ambari (older than 2.4.0), check out the following Ambari doc for details on how to enable auto-recovery via the ambari.properties file: https://cwiki.apache.org/confluence/display/AMBARI/Recovery%3A+auto+start+components
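If you later want to revert a component to the default behaviour, the same PUT call should work with the flag set back to "false" (a sketch using the same placeholders as above):

curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(HBASE_REGIONSERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"false"}}'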
11-16-2016
12:08 PM
1 Kudo
@Avijeet Dash I'd suggest using NiFi for this. You can read from the weather API using NiFi's GetHTTP processor, and use NiFi to process the data before loading it into another system (not sure what other components you're using, but NiFi integrates with most systems, so you can use it to directly load the data into HDFS, HBase, Kafka, Cassandra, relational DBs, etc.). Check out the "Fun_with_Hbase" template on the NiFi website to help you get started; it gets random data from an API call before loading it into HBase. https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates Also check out this article, which uses NiFi to pull data from the Google Finance API: https://community.hortonworks.com/content/kbentry/8422/visualize-near-real-time-stock-price-changes-using.html
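As a quick sanity check before building the flow, it can help to hit the API from the command line with the same URL you plan to configure in GetHTTP (the endpoint below is only a placeholder - substitute whichever weather API and key you're actually using):

curl -s 'https://api.example-weather.com/v1/current?city=Sydney&format=json'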
10-07-2016
02:22 PM
1 Kudo
@jbarnett To answer your question - no, if you have SSSD configured you do not need to also configure core-site group mapping with LDAP. Regarding your issue, it could be related to the space in your group name - could you try removing the space in 'domain users', or test with a group that doesn't contain any spaces?
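To confirm which groups are actually being resolved for the user, you can compare the OS/SSSD view with Hadoop's group mapping on a cluster node (the username below is just a placeholder):

# groups as resolved by the OS via SSSD
id someuser
# groups as resolved by Hadoop's group mapping
hdfs groups someuser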
09-15-2016
09:22 AM
6 Kudos
@Sunil Mukati Your question is very broad, so if you need more info please be specific. Based on the question tags I'm going to assume you're asking about NiFi integration with Spark, Kafka, Storm, and Solr. The short answer is yes - we can integrate NiFi with other Apache software 🙂 NiFi provides an easy way to stream data between different systems, and has in-built processors for dealing with most of the common Apache stack.

Kafka
NiFi has in-built processors to stream data into and read data from Kafka: PutKafka, GetKafka, PublishKafka, ConsumeKafka. I'd suggest checking out the following tutorial to get you started: http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm/

Solr
NiFi has a PutSolrContentStream processor that allows you to stream data directly into a Solr index. Check out the following tutorial, which uses NiFi to index twitter data directly into Solr and visualises it using Banana: https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html

Spark
Spark doesn't supply a mechanism to have data pushed to it, and instead pulls from other sources. You can integrate NiFi directly with Spark Streaming by exposing an Output Port in NiFi that Spark can consume from. The following article explains how to set up this integration: https://community.hortonworks.com/articles/12708/nifi-feeding-data-to-spark-streaming.html Note that a typical streaming architecture involves NiFi pushing data to Kafka, and Spark Streaming (or Storm) reading from Kafka.

Storm
As above, NiFi is typically integrated with Storm with Kafka acting as the message buffer. The real-time event processing tutorial linked above covers the details of building a streaming application with NiFi, Kafka & Storm.
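A quick way to confirm that the Kafka side of such a flow is working (for example, that a PutKafka/PublishKafka processor is actually producing messages) is to tail the topic with the console consumer that ships with Kafka. This is a sketch - the broker address, port, and topic name are placeholders, and older Kafka versions use --zookeeper instead of --bootstrap-server:

# read everything currently on the topic that NiFi is publishing to
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server broker1:6667 --topic nifi-demo-topic --from-beginning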
09-13-2016
12:46 PM
@Vijay Kumar J NiFi is definitely feasible for production use, and it is perfectly suited for your MongoDB to HBase data movement use case. NiFi is a tool for managing dataflow and integration between systems in an automated and configurable way. It allows you to stream, transform, and sort data, and uses a drag-and-drop UI.

Dealing with failures - NiFi is configurable: when you build your flow within NiFi you can determine how you want to handle failures. In your case, you could build a flow that retries on failure and sends out an email on failure (this is just an example - how you handle failures for fetching and storing data can be configured however you need).

Executing NiFi on an hourly basis - NiFi isn't like traditional data movement schedulers, and flows built using NiFi are treated as 'always-on', where data can be constantly streamed as it is received. That being said, NiFi provides the ability to schedule each processor if needed, so in your case you could have your GetMongo processor set to run once every hour, and your PutHBaseJSON processor push data to HBase as soon as it is received from the GetMongo processor (see the scheduling sketch below).

Check out this tutorial for getting started and building your first flow: http://hortonworks.com/hadoop-tutorial/learning-ropes-apache-nifi/
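For reference, the hourly scheduling described above is set on the GetMongo processor's Scheduling tab and might look something like this (a sketch - the exact values are up to you):

Scheduling Strategy: Timer driven, Run Schedule: 1 hour
or, to fire at the top of every hour with the CRON driven strategy, Run Schedule: 0 0 * * * ?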
09-13-2016
07:37 AM
3 Kudos
@Vijay Kumar J Have you considered using Apache NiFi for this? NiFi has in-built processors to work with data in both MongoDB and HBase. You could use NiFi's GetMongo processor followed by the PutHBaseJSON processor to move the data from MongoDB to HBase. Check out the following article for more info on using NiFi to interact with MongoDB: https://community.hortonworks.com/articles/53554/using-apache-nifi-100-with-mongodb.html