Highlights of integrating Apache NiFi with Apache Ambari/Ranger

Article credits: @Ali Bajwa, @Bryan Bende, @jluniya, @Yolanda M. Davis, @brosander

With the recently announced HDF 2.0, users can deploy an HDF cluster comprising Apache NiFi, Apache Storm, Apache Kafka and other components. The mechanics of setting this up using Apache Ambari’s Install Wizard are outlined in the official documentation here, and sample steps to automate the setup via Ambari blueprints are provided here. The goal of this article is to highlight some features NiFi administrators can leverage when using Ambari-managed HDF 2.0 clusters versus standalone NiFi.

The article is divided into sections on how the integration helps administrators with HDF:

  1. Deployment
  2. Configuration
  3. Monitoring
  4. Security
  • Ease of Deployment

    • Users can deploy NiFi either through the Ambari install wizard or by operationalizing the install via blueprint automation
      • (For detailed steps, see the links provided in the introduction above)
    • Using the wizard, users can choose which nodes NiFi should be installed on. Users can either:
      • choose the NiFi hosts at the time of cluster install
      • ...or add NiFi to an existing host after the cluster is already installed, and then start it. Note that in this case, ‘Zookeeper client’ must be installed on the host before NiFi can be added to it

  • Ambari also allows users to configure which user/group NiFi runs as. This is done via the Misc tab, which is editable either when the cluster is installed or when the NiFi service is added to an existing cluster for the first time.

  • Starting with Ambari 2.4, users can also remove the NiFi service from Ambari, but note that this does not remove the bits from the cluster.
  • NiFi can be stopped/started/configured across the cluster via both the Ambari UI and Ambari’s REST APIs

  • The same can be done on individual hosts

  • For easy access to the NiFi UI, quick links are available. The benefit of using these is that the URL is dynamically determined based on the user’s settings (e.g. which ports were specified and whether SSL is enabled)
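As a rough illustration of the REST-based stop/start mentioned above, the sketch below builds (but does not send) the PUT request Ambari uses to change a service's desired state. The host name, cluster name, credentials, and the assumption that the service is registered under the name NIFI are all hypothetical and should be checked against your own cluster.

```python
import base64
import json
import urllib.request


def build_service_state_request(ambari_url, cluster, service, state, user, password):
    """Build (but do not send) a PUT request that sets a service's desired state.

    In Ambari, state "INSTALLED" stops a service and "STARTED" starts it.
    """
    url = f"{ambari_url}/api/v1/clusters/{cluster}/services/{service}"
    body = {
        "RequestInfo": {"context": f"Set {service} to {state} via REST"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying calls that lack this header.
            "X-Requested-By": "ambari",
        },
    )


# Hypothetical host, cluster name, service name, and credentials.
stop_request = build_service_state_request(
    "http://ambari.example.com:8080", "HDF", "NIFI", "INSTALLED", "admin", "admin"
)
# To actually send it: urllib.request.urlopen(stop_request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing the script at a real cluster.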

  • Ease of Configuration

    • Ambari allows configuration to be done once for the whole cluster. This saves time because, when setting up NiFi standalone, users need to manage configuration files on each node NiFi is running on
    • The most important NiFi config files are exposed via Ambari and are managed there (e.g. nifi.properties, bootstrap.conf etc.)
    • When going through the configuration process, Ambari assists the admin in a number of ways:
      • Help text with property descriptions is displayed on hover
      • Checkboxes instead of true/false values
      • User-friendly labels and default values
      • ‘Computed’ values can be handled automatically (e.g. node address)
  • NiFi benefits from other standard Ambari config features like:
    • Updating configs via the Ambari REST API
    • Configuration history, meaning that users can diff versions, revert to an older version, etc.
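A config update through the Ambari REST API can be sketched as follows. This only builds the JSON body that would be sent with a PUT to /api/v1/clusters/<cluster>. The property names and values are hypothetical. Because a new config version is assumed to replace the full property set of that config type, the sketch merges the changes into the current properties first.

```python
import json
import time


def merge_properties(current, changes):
    """Return a copy of the current properties with the changes applied."""
    merged = dict(current)
    merged.update(changes)
    return merged


def build_desired_config(config_type, properties, tag=None):
    """Return the body that registers a new version of one config type."""
    if tag is None:
        # Tags must be unique per config type; a timestamp is a common choice.
        tag = f"version{int(time.time() * 1000)}"
    return {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                "tag": tag,
                "properties": properties,
            }
        }
    }


# Hypothetical current values, as they might be fetched from Ambari beforehand.
current = {"nifi.initial_mem": "512m", "nifi.max_mem": "512m"}
merged = merge_properties(current, {"nifi.max_mem": "2g"})
body = build_desired_config("nifi-ambari-config", merged, tag="version1")
payload = json.dumps(body)
```

Each such update shows up as a new version in the configuration history, so it can be diffed against or reverted like a change made in the UI.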

  • Host-specific configurations can be managed using the ‘Config groups’ feature, where users can:
    • ‘override’ a value (e.g. max memory) and
    • create a subset group of hosts that will use that value

  • ‘Common’ configs are grouped together and exposed in the first config section (‘Advanced nifi-ambari-config’) to allow configuration of commonly used properties:
    • Ports (non-SSL, SSL, protocol)
    • Initial and max memory (Xms, Xmx)
    • Repo default dir locations (provenance, content, db, flow file)
    • ‘Internal’ dir location - contains files NiFi will write to
      • ‘conf’ subdir for flow/tar.gz, authorizations.xml
      • ‘state’ subdir for internal state
      • Subdir names can be changed by prefixing the desired subdir name with ‘{nifi_internal_dir}/’
    • Sensitive property key (used to encrypt sensitive property values)
    • Zookeeper znode for NiFi

  • Contents of nifi.properties are exposed under ‘Advanced nifi-properties’ as key/value pairs with help text
    • Values that will be replaced by Ambari are shown surrounded by double braces, e.g. {{ }}, but can be overridden by the end user
    • Properties can be updated or added to nifi.properties via ‘Custom nifi-properties’ and will get written to all nodes
    • It also handles properties whose values need to be ‘computed’, e.g.
      • ‘Node address’ fields are populated with each host’s own FQDN
    • Conditional logic is handled:
      • When SSL is enabled, nifi.web.https.host/port are populated
      • When SSL is disabled, nifi.web.http.host/port are populated
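The conditional logic above can be pictured with a small sketch. This is not Ambari's actual implementation, just an illustration of the rule: only the host/port pair matching the SSL setting is filled in, and it is filled in per node. The function name, host name, and port numbers are hypothetical.

```python
def web_properties(node_fqdn, ssl_enabled, http_port, https_port):
    """Return the nifi.web.* host/port properties for a single node."""
    props = {
        "nifi.web.http.host": "",
        "nifi.web.http.port": "",
        "nifi.web.https.host": "",
        "nifi.web.https.port": "",
    }
    scheme = "https" if ssl_enabled else "http"
    port = https_port if ssl_enabled else http_port
    # Only the pair matching the SSL setting is populated; the other stays blank.
    props[f"nifi.web.{scheme}.host"] = node_fqdn
    props[f"nifi.web.{scheme}.port"] = str(port)
    return props


# Hypothetical node name and ports.
secured = web_properties("nifi1.example.com", True, 9090, 9091)
unsecured = web_properties("nifi1.example.com", False, 9090, 9091)
```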

  • Other property-based configuration files are exposed as Jinja templates (large text box)
    • Values that will be replaced by Ambari are shown surrounded by double braces, e.g. {{ }}, but can be overridden by the end user
    • Properties can be added/updated in the template and will get written to all nodes

  • Other XML-based config files are also exposed as Jinja templates
    • Values that will be replaced by Ambari are shown surrounded by double braces, e.g. {{ }}, but can be overridden
    • Elements can be updated/added and will get written to all nodes
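To make the double-brace substitution concrete, here is a minimal sketch that uses a regular expression as a stand-in for Jinja (Ambari uses Jinja itself, and this stand-in covers only plain {{ name }} placeholders). The placeholder names and values are hypothetical.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace every {{ name }} placeholder with its value from the mapping."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical fragment of a bootstrap.conf template.
template = "java.arg.2=-Xms{{nifi_initial_mem}}\njava.arg.3=-Xmx{{nifi_max_mem}}\n"
rendered = render(template, {"nifi_initial_mem": "512m", "nifi_max_mem": "2g"})
```

Overriding a value in Ambari amounts to replacing the placeholder in the template with a literal, which this kind of substitution then leaves untouched.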

  • Note that config files are written out with either 0400 or 0600 permissions
    • Why? Because some property files contain plaintext passwords
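For readers scripting their own file handling outside Ambari, the same precaution can be sketched as below. This is an illustration of writing a file with owner-only permissions on a Unix-like system, not the code Ambari runs. The file name and content are hypothetical.

```python
import os
import stat
import tempfile


def write_private_file(path, content, mode=0o600):
    """Create (or truncate) a file with owner-only permissions and write to it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "w") as handle:
        # Enforce the mode even if the file already existed or the umask differs.
        os.fchmod(handle.fileno(), mode)
        handle.write(content)


# Hypothetical file name and content, written to a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "example.properties")
write_private_file(demo_path, "example.password=changeme\n")
permissions = stat.S_IMODE(os.stat(demo_path).st_mode)
```

Setting the mode at creation time avoids a window in which the file briefly exists with wider permissions.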
  • Ease of Debugging

    • Logsearch integration is included for ease of visualizing/debugging NiFi logs without connecting to the system, e.g. nifi_app.log, nifi_user.log, nifi_bootstrap.log
      • Note: the Logsearch component is Tech Preview in HDF 2.0
    • By default, it monitors FATAL, ERROR and WARN messages (for all HDF services)
    • Can view/drill into errors at component level or host level
    • Can filter errors based on severity (fatal, error, warn, info, debug, trace)
    • Can exclude ‘noisy’ messages to find the needle in the haystack
    • Can ‘tail’ a log from the Logsearch UI
      • By clicking the ‘refresh’ button or the ‘play’ button (to auto-refresh every 10s)

  • Ease of Monitoring

    • NiFi Service check: Used to ensure that the NiFi UI has come up after restart. It can also be invoked via REST API for automation
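For automation, a service check is typically triggered by POSTing a request body to /api/v1/clusters/<cluster>/requests. The sketch below only builds that body. The command name is an assumption that follows Ambari's usual <SERVICE>_SERVICE_CHECK convention, and the service name NIFI is likewise assumed, so verify both against your cluster before relying on them.

```python
import json


def build_service_check(service_name, command=None):
    """Return the request body that asks Ambari to run a service check."""
    if command is None:
        # Assumed naming convention; a few services use a different command name.
        command = f"{service_name}_SERVICE_CHECK"
    return {
        "RequestInfo": {
            "context": f"{service_name} Service Check",
            "command": command,
        },
        "Requests/resource_filters": [{"service_name": service_name}],
    }


check_body = build_service_check("NIFI")
payload = json.dumps(check_body)
```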

  • NiFi alerts are host-level alerts that let admins know when a NiFi process goes down
    • Can temporarily be disabled by turning on maintenance mode

  • Alerts tab in Ambari allows users to disable or configure alerts (e.g. changing polling intervals)

  • Admins can choose to receive notifications via email or SNMP through the alerts framework

  • AMS (Ambari Metrics) integration
    • When NiFi is installed via Ambari, an Ambari reporting task is auto-created in NiFi, pointing to the cluster’s AMS collector host/port (autodetected)

  • How is the task auto-created? By providing a configurable initial flow.xml (which can also be used to deploy any flows you like when NiFi is deployed) and passing arguments (such as the AMS URL) via bootstrap.conf. The advantage of doing it this way: if the collector is ever moved to a different host in the cluster, Ambari will let NiFi know (the next time NiFi is restarted after the move)

  • As a result of the metrics integration, users get a dashboard for NiFi metrics in Ambari, such as:
    • Flowfiles sent/received
    • MBs read/written
    • JVM usage/thread counts

  • Dashboard widgets can:
    • be drilled into to see results from the last 1, 2 or 4 hours, day, week etc.
    • export metrics data to CSV or JSON

  • These same metrics can be viewed in Grafana dashboard:
    • Grafana can be accessed via quick link under ‘Ambari metrics’ service in Ambari
    • Pre-configured dashboards are available for each service but users can easily create custom dashboards for each component too

  • Ease of Security Setup

    • NiFi identity mappings

    • These are used to map identities in DN pattern format (e.g. CN=Tom, OU=NiFi) into common identity strings (e.g. Tom@NiFi)
    • The patterns can be configured via the ‘Advanced nifi-properties’ section of Ambari configs. Sample values are provided via help text
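The mapping from the example above (CN=Tom, OU=NiFi to Tom@NiFi) can be illustrated with a regular expression and a replacement string. The sketch below mimics the idea in Python. NiFi's own configuration uses Java regex syntax with $1-style group references, whereas Python writes them as \1, and the function name is hypothetical.

```python
import re


def map_identity(identity, mappings):
    """Apply the first mapping whose pattern matches the whole identity.

    Identities that match no pattern are returned unchanged.
    """
    for pattern, replacement in mappings:
        match = re.fullmatch(pattern, identity)
        if match:
            return match.expand(replacement)
    return identity


# One pattern/value pair, equivalent to the DN example in the text.
mappings = [(r"^CN=(.*?), OU=(.*?)$", r"\1@\2")]

mapped = map_identity("CN=Tom, OU=NiFi", mappings)
unmapped = map_identity("alice", mappings)
```

Normalizing identities this way means certificate DNs and LDAP or Kerberos logins can resolve to the same identity string in policies.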

  • ActiveDirectory/LDAP integration

    • To enable users to log in to NiFi using AD/LDAP credentials, the ‘Advanced nifi-login-identity-providers.xml’ section can be used to set up an ldap-provider for NiFi. Commented-out sample XML fields are provided for the relevant settings, e.g.
      • AD/LDAP URL, search base, search filter, manager credentials

  • SSL for NiFi

    • Detailed steps for enabling SSL/identity mappings for NiFi are available here
    • Options for SSL for NiFi:
      • 1. Use NiFi CA to generate self-signed certificates
        • Good for quick starts/demos
      • 2. Use your existing certificates
        • Usually done for production environments
  • SSL-related configs are combined together in the ‘Advanced nifi-ambari-ssl-config’ config panel
    • Checkbox for whether SSL is enabled
    • NiFi CA fields - to configure the certificate to be generated:
      • NiFi CA token (required)
      • NiFi CA DN prefix/suffix
      • NiFi CA Cert duration
      • NiFi CA host port
      • Checkbox for ‘NiFi CA Force Regenerate’
    • Keystore/truststore related fields - location/type of certificates:
      • Paths
      • Passwords
      • Types
    • Node identity fields:
      • Initial Admin Identity: long form of the identity of the NiFi admin user
      • Node Identities: long form of the identities of the nodes running NiFi
  • SSL Option 1 - using NiFi CA to generate new certificates through Ambari:
    • Just check the “Enable SSL?” box and make sure the CA token is set
    • Optionally update the fields below as needed:
      • NiFi CA DN prefix/suffix
      • NiFi CA Cert duration
      • NiFi CA port
      • Check the ‘NiFi CA Force Regenerate’ box (for changing certs after SSL is already enabled)
    • You can force regeneration of the certificates by either:
      • checking the “NiFi CA Force Regenerate” checkbox
      • or changing the passwords
    • You can also manually use tls-toolkit in standalone mode to generate new certificates outside of Ambari
  • SSL Option 2 - using your existing certificates:
    • Manually copy certificates to nodes
    • Populate keystore/truststore path/password/type fields
    • For keystore/truststore paths that contain an FQDN that needs resolving:
      • use {nifi_node_ssl_host} (this is useful for certs generated by nifi-toolkit, as they have the host’s FQDN in their name/path)
    • In both cases, while enabling SSL you will also need to populate the identity fields. This is so that you can log in to NiFi after enabling SSL (assuming the Ranger authorizer will not be used)
      • When setting these, first make sure that authorizations.xml does not contain any policies on any node. If it does, delete authorizations.xml from all nodes running NiFi. Otherwise, the identity-related changes will not take effect.
      • On initial install there will not be any policies, but they will get created the first time the identity fields are updated and NiFi is restarted (i.e. if you entered incorrect values the first time, you will need to delete the policies before re-entering the values)
    • Then save the config changes and restart NiFi from Ambari to enable SSL
      • If the NiFi CA option was used, this is the point at which certificates will get generated
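The "does authorizations.xml contain any policies?" check described above can be scripted. The sketch below counts policy elements in the file's XML. It assumes policies are stored as <policy> elements somewhere under the root element. The sample documents and identifiers are hypothetical stand-ins for a real file.

```python
import xml.etree.ElementTree as ET


def count_policies(authorizations_xml):
    """Return the number of <policy> elements in an authorizations.xml document."""
    root = ET.fromstring(authorizations_xml)
    return sum(1 for _ in root.iter("policy"))


# Hypothetical file contents: a fresh install versus one with a policy created.
empty_doc = "<authorizations><policies/></authorizations>"
populated_doc = """
<authorizations>
  <policies>
    <policy identifier="p1" resource="/flow" action="R">
      <user identifier="u1"/>
    </policy>
  </policies>
</authorizations>
"""

needs_cleanup = count_policies(populated_doc) > 0
fresh_install = count_policies(empty_doc) == 0
```

A non-zero count on any node is the signal that the file should be removed before the identity fields are changed.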
    • Ranger integration with NiFi

      • Before installing Ranger there are some manual prerequisite steps:
        • Set up an RDBMS to store Ranger policies
        • Install/set up Solr to store audits. In test/development environments, Ranger can re-use the Solr that comes with Logsearch/Ambari Infra services
    • Detailed steps for integrating NiFi with Ranger are available here
    • During Ranger install…
      • The backend RDBMS details are provided first via ‘Ranger Admin’ tab

    • The NiFi Ranger plugin can be enabled to manage NiFi authorization policies in Ranger via the ‘Ranger Plugin’ tab

    • Users/Groups can be synced from Active Directory/LDAP via ‘Ranger User Info’ tab

    • Ranger audits can be configured via ‘Ranger audit’ tab

    • After enabling Ranger and restarting NiFi, new config tabs appear under NiFi configs. NiFi/Ranger related configs can be accessed/updated here:

    • Ranger can be configured to communicate with and retrieve resources from NiFi using a keystore (that has been imported into NiFi’s truststore)
      • Using a NiFi REST client, Ranger is able to retrieve NiFi’s API endpoint information that can be secured via authorization
      • This list of resources is made available as auto-complete options when users are configuring policies in Ranger
      • To communicate with NiFi over SSL, a keystore and truststore should be available (with Ranger’s certificate imported into the NiFi node truststores) for identification. The Owner for Certificate field should be populated as well.
      • Once Ranger is identified, NiFi will authorize Ranger to perform its resource lookup

    • Ranger policies can be created for NiFi (either via Ranger UI or API)
      • Create users in Ranger for NiFi users (either from certificate DNs, or imported using AD/LDAP sync)
      • Decide which user has what type of access on what identifier
      • A default policy is automatically created on first setup
      • Policy updates will be picked up by NiFi after 30 seconds (by default)
      • Recommended approach:
        • Grant users access to modify the NiFi flow with a policy for /process-groups/<root-group-id> with RW
        • Create a separate policy for /provenance/process-groups/<root-group-id> (with each of the cluster node DNs) for read access
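The two recommended policies can be sketched as JSON bodies of the kind Ranger's public REST API accepts (POST to /service/public/v2/api/policy). This is a sketch under assumptions: the resource key nifi-resource and the access types READ and WRITE are taken to match Ranger's NiFi service definition, and the service name, policy names, user, and node identities are hypothetical. The <root-group-id> placeholder is left as in the text and must be replaced with the real ID.

```python
import json


def build_nifi_policy(service, name, resource, users, access_types):
    """Return a Ranger policy body granting the users access to one NiFi resource."""
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {
            # Assumed resource key from Ranger's NiFi service definition.
            "nifi-resource": {
                "values": [resource],
                "isExcludes": False,
                "isRecursive": False,
            }
        },
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
            }
        ],
    }


# Hypothetical service name, policy names, user, and node identities.
flow_policy = build_nifi_policy(
    "HDF_nifi", "modify-root-flow", "/process-groups/<root-group-id>",
    ["tom"], ["READ", "WRITE"],
)
provenance_policy = build_nifi_policy(
    "HDF_nifi", "read-root-provenance", "/provenance/process-groups/<root-group-id>",
    ["node1.example.com", "node2.example.com"], ["READ"],
)
payloads = [json.dumps(flow_policy), json.dumps(provenance_policy)]
```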

    • Ranger now tracks audits for NiFi (stored in standalone Solr or Logsearch Solr)
      • For example: What user attempted what kind of NiFi access from what IP at what time?

    • Ranger also audits user actions related to NiFi in Ranger
      • For example: Which user created/updated NiFi policy at what time?

    • Kerberos for NiFi

      • An HDF cluster with NiFi can be kerberized via the standard Ambari security wizard (via MIT KDC or AD)
      • Also supported: NiFi installation on an already kerberized HDF cluster
    • Detailed steps for enabling Kerberos for HDF are available here

    • The wizard allows configuration of the principal name and keytab path

    • NiFi principals and keytabs will automatically be created/distributed across the cluster where needed by Ambari
    • During the security wizard, nifi.properties will automatically be updated:
      • nifi.kerberos.service.principal
      • nifi.kerberos.keytab.location
      • nifi.kerberos.krb5.file
      • nifi.kerberos.authentication.expiration
    • After enabling Kerberos, the login provider will also be switched to Kerberos under the covers
      • Allows users to log in via KDC credentials instead of importing certificates into the browser
    • Writing audits to kerberized Solr is supported
    • After the security wizard completes, NiFi’s Kerberos details will appear alongside other components (under Admin > Kerberos)

    • Try it out yourself!

      • Installation using official documentation: link
      • Automation to deploy clusters using Ambari blueprints: link
      • Enable SSL/identity mappings for NiFi via Ambari: link
      • Enable Ranger authorization for NiFi: link
      • Enable Kerberos for HDF via Ambari: link