10-26-2017
11:40 PM
Description
Learn how to consume real-time data from the Satori RTM platform using NiFi.

Background
Satori is a cloud-based live platform that provides a publish-subscribe messaging service called RTM, and it also makes a set of free real-time data feeds available as part of its Open Data Channels initiative:

https://www.satori.com/docs/using-satori/overview
https://www.satori.com/opendata/channels

This article steps through how to consume from Satori's Open Data Channels in NiFi, using a custom NiFi processor. Note: the article assumes you already have a working NiFi instance up and running.

Link to the code on GitHub: https://github.com/laurencedaluz/nifi-satori-bundle

Installing the custom processor
To create the required nar file, simply clone and build the following repo with Maven:

git clone https://github.com/laurencedaluz/nifi-satori-bundle.git
cd nifi-satori-bundle
mvn clean install
This will produce the following .nar file under the nifi-satori-bundle-nar/target/ directory:

nifi-satori-bundle-nar-<version>.nar

Copy this file into the lib directory of your NiFi instance. If you are using HDF, it is located at /usr/hdf/current/nifi/lib. Restart NiFi for the nar to be loaded.

Using the ConsumeSatoriRtm processor
The ConsumeSatoriRtm processor accepts the following configurations. At a minimum, you will just need to provide the following (which you can get directly from the Satori open channels site):
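The deploy steps above can be sketched as a short script. This is a dry run that prints the commands to execute rather than running them; the VERSION value and the HDF lib path are assumptions to substitute with your own (a plain NiFi install uses <nifi_home>/lib instead):

```shell
# Dry-run sketch of the nar deployment steps -- prints the commands to run.
# VERSION is a hypothetical build version; NIFI_LIB assumes an HDF install.
VERSION="1.0.0"
NAR="nifi-satori-bundle-nar/target/nifi-satori-bundle-nar-${VERSION}.nar"
NIFI_LIB="/usr/hdf/current/nifi/lib"

echo "cp ${NAR} ${NIFI_LIB}/"
echo "/usr/hdf/current/nifi/bin/nifi.sh restart"   # or restart NiFi via Ambari
```

Drop the echo prefixes once the paths match your environment.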
Endpoint
Appkey
Channel

In this example, I've chosen to consume from the 'big-rss' feed using the configurations provided here: https://www.satori.com/opendata/channels/big-rss

That's it! After starting the ConsumeSatoriRtm processor you will see data flowing.

Additional Features
The processor also supports Satori's Streamview filters, which allow you to provide SQL-like queries to select, transform, or aggregate messages from a subscribed channel. In the 'big-rss' example above, the following filter configuration would limit the stream to messages whose feedURL contains the word "jobs":

select * from `big-rss` where feedURL like '%jobs%'
The NiFi processor also supports batching multiple messages into a single FlowFile, producing a newline-delimited list of messages in each file (based on a 'minimum batch size' configuration).
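With a minimum batch size of 2, for example, a single FlowFile would contain two messages separated by a newline. The fields below are illustrative only (feedURL appears in the big-rss filter above; the title field and URLs are hypothetical):

```json
{"feedURL": "http://example.com/jobs/rss", "title": "Hypothetical item 1"}
{"feedURL": "http://example.com/news/rss", "title": "Hypothetical item 2"}
```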
12-15-2016
03:11 PM
Auto-recovery in Ambari is a useful way of getting cluster components restarted automatically in the event that a component fails, without the need for human intervention. Ambari 2.4.0 introduced dynamic auto-recovery, which allows auto-start properties to be configured without requiring an ambari-agent or ambari-server restart.

Currently, the simplest way to manage the auto-recovery features within Ambari is via the REST API (documented within this article), although ongoing work in the community will bring the feature to the UI: https://issues.apache.org/jira/browse/AMBARI-2330

Check Auto-Recovery Settings
To check whether auto-recovery is enabled for all components, run the following command on the Ambari server node:

curl -u admin:<password> -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/clusters/<cluster_name>/components?fields=ServiceComponentInfo/component_name,ServiceComponentInfo/service_name,ServiceComponentInfo/category,ServiceComponentInfo/recovery_enabled

Note: you will need to substitute your own <password> and <cluster_name>. The output of the above command will look something like this:

...
"items" : [
{
"href" : "http://localhost:8080/api/v1/clusters/horton/components/APP_TIMELINE_SERVER",
"ServiceComponentInfo" : {
"category" : "MASTER",
"cluster_name" : "horton",
"component_name" : "APP_TIMELINE_SERVER",
"recovery_enabled" : "false",
"service_name" : "YARN"
}
},
{
"href" : "http://localhost:8080/api/v1/clusters/horton/components/DATANODE",
"ServiceComponentInfo" : {
"category" : "SLAVE",
"cluster_name" : "horton",
"component_name" : "DATANODE",
"recovery_enabled" : "false",
"service_name" : "HDFS"
}
},
...
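On a large cluster that response gets long, so it can help to filter it down to just the components with recovery disabled. A minimal sketch, assuming the GET response above was saved to components.json (a trimmed two-component sample is embedded here so the pipeline can be shown end to end); the grep relies on the field ordering and spacing shown in the output above:

```shell
# Embed a trimmed sample of the API response (normally: curl ... > components.json).
cat > components.json <<'EOF'
    {
      "ServiceComponentInfo" : {
        "category" : "MASTER",
        "component_name" : "APP_TIMELINE_SERVER",
        "recovery_enabled" : "false"
      }
    },
    {
      "ServiceComponentInfo" : {
        "category" : "SLAVE",
        "component_name" : "DATANODE",
        "recovery_enabled" : "true"
      }
    }
EOF

# List only the components whose recovery_enabled flag is "false".
grep -B 2 '"recovery_enabled" : "false"' components.json | grep component_name
```

Here only APP_TIMELINE_SERVER is printed, since DATANODE already has recovery enabled.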
Notice the "recovery_enabled" : "false" flag on each component.

Enable Auto-Recovery for HDP Components
To enable auto-recovery for a single component (in this case HBASE_REGIONSERVER):

curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(HBASE_REGIONSERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

To enable auto-recovery for multiple HDP components:

curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(APP_TIMELINE_SERVER,DATANODE,HBASE_MASTER,HBASE_REGIONSERVER,HISTORYSERVER,HIVE_METASTORE,HIVE_SERVER,INFRA_SOLR,LIVY_SERVER,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR,MYSQL_SERVER,NAMENODE,NODEMANAGER,RESOURCEMANAGER,SECONDARY_NAMENODE,WEBHCAT_SERVER,ZOOKEEPER_SERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

Enable Auto-Recovery for HDF Components
The process is the same for an Ambari-managed HDF cluster. Here is an example of enabling auto-recovery for the HDF services:

curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(NIFI_MASTER,ZOOKEEPER_SERVER,KAFKA_BROKER,INFRA_SOLR,LOGSEARCH_LOGFEEDER,LOGSEARCH_SERVER,METRICS_COLLECTOR,METRICS_GRAFANA,METRICS_MONITOR)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

Using an older version of Ambari?
If you're using a version of Ambari older than 2.4.0, check out the following Ambari doc for details on how to enable auto-recovery via the ambari.properties file: https://cwiki.apache.org/confluence/display/AMBARI/Recovery%3A+auto+start+components
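Since the multi-component commands only differ in the component list, it can be convenient to build the request from a variable. A dry-run sketch that prints the curl command to run — CLUSTER and the COMPONENTS list are placeholders to edit for your own cluster:

```shell
# Dry-run sketch: build the PUT that enables auto-recovery for a component list.
# CLUSTER and COMPONENTS are assumptions -- substitute your own values.
CLUSTER="<cluster_name>"
COMPONENTS="NAMENODE,DATANODE,RESOURCEMANAGER,NODEMANAGER"
URL="http://localhost:8080/api/v1/clusters/${CLUSTER}/components?ServiceComponentInfo/component_name.in(${COMPONENTS})"
BODY='{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

# Prints the command; remove the echo to actually send the request.
echo curl -u admin:'<password>' -H 'X-Requested-By: ambari' -X PUT "'${URL}'" -d "'${BODY}'"
```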