09-27-2017
06:36 AM
Hi @Rashi Khanna, Is this functionality still missing? If so, can you please suggest a workaround? Thanks
09-26-2016
06:47 AM
8 Kudos
Summary:
Automation/AMI to install HDP 2.5.x with Nifi 1.1.0 on any cloud and deploy commonly used demos via Ambari blueprints
Currently supported demos:
Nifi-Twitter
IoT (trucking) demo
Zeppelin notebooks
Vanilla HDF 2.1 (w/o any demos)
Option 1: Deploy single node instances using AMIs
1. For deploying the above on single node setups on Amazon, AMI images are also available. To launch an instance using one of the AMIs, refer to steps below. A video that shows using these steps to launch the HDP 2.5.3 AMI is available here.
Log in to the EC2 dashboard using your credentials
Change your region to "N. California"
Click 'Launch instance'
Choose AMI: search for 081339556850 under Community AMIs (as shown in screenshot), select the desired AMI. For the HDP 2.5.x version of the AMI that has the demos pre-installed, select "HDP 2.5 Demo kit cluster"
Choose instance type: select m4.2xlarge for HDP AMIs or m4.xlarge for HDF
Configure instance: leave defaults
Add storage: 100gb or larger (500gb preferred)
Tag: name your instance and add any tags you like
Configure Security Group: choose security group that opens all the ports (e.g. sg-1c53d279summit2015) or create new
While deploying choose an SSH key you have the .pem file for or create new
2. Once the instance comes up and Ambari server/agent are fully up, it will automatically start the services. You can monitor this by connecting to your instance via SSH as ec2-user and tailing /var/log/hdp_startup.log
3. Once the service start call was made, you can login to Ambari UI (port 8080) to monitor progress. Note: if Ambari is not accessible make sure a) the security group you used has a policy for 8080 b) you waited enough time for Ambari to come up.
The password for 'admin' user of Ambari and Zeppelin is defaulted to your AWS account number. You can look this up using your EC2 dashboard as below
4. About 15-20 minutes after AWS shows the instance as running, you should see a fully started cluster. Note: if any service does not come up, you can start it using the 'Service Actions' menu in Ambari
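The waiting described in the steps above can be scripted. The following is a minimal sketch, not part of the AMI: the function names are hypothetical, and only the port (8080) comes from the steps themselves. It polls the Ambari UI until it answers, which also surfaces the "security group does not allow 8080" case early.

```shell
# Hypothetical helpers; only port 8080 is taken from the steps above.

# Build the Ambari base URL for an instance hostname.
ambari_url() {
  echo "http://$1:8080"
}

# Poll the Ambari UI until it answers or the attempts run out.
# Usage: wait_for_ambari <host> [attempts] [sleep_seconds]
wait_for_ambari() {
  wfa_host="$1"
  wfa_attempts="${2:-40}"
  wfa_pause="${3:-30}"
  wfa_i=1
  while [ "$wfa_i" -le "$wfa_attempts" ]; do
    # curl exits 0 as soon as an HTTP response is received
    if curl -s -o /dev/null --max-time 5 "$(ambari_url "$wfa_host")"; then
      echo "Ambari is reachable on $wfa_host"
      return 0
    fi
    sleep "$wfa_pause"
    wfa_i=$((wfa_i + 1))
  done
  echo "Ambari did not respond on $wfa_host; check the security group rule for 8080" >&2
  return 1
}
```

For example, `wait_for_ambari <public DNS of your instance>` would retry every 30 seconds for about 20 minutes, which matches the startup time mentioned above.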
Notes:
Once the cluster is up, it is recommended that you change the Ambari and Zeppelin admin passwords
The instance launched is EBS backed, so the VM can be stopped when not in use and restarted when needed. Just make sure to stop all HDP/HDF services via Ambari before stopping the instance via EC2 dashboard.
What gets installed?
HDP 2.5.x with below vanilla components
IotDemo demo service - allows users to stop/start Iot Demo, open webUI and generate events
Demo Ambari service for Solr
This service will pre-configure Solr/Banana for Twitter demo
Demo Ambari service for Nifi 1.1
The script auto-deploys the specified flow - by default, it deploys the Twitter flow but this is overridable
Even though the flow is deployed, you will need to set processors that contain env-specific details e.g. you will need to enter Twitter key/secret in GetTwitter processor
IoT Trucking demo steps
Once the instance is up, you can follow the steps below to start the trucking demo. Video here
- In Ambari, open 'IotDemo UI' using quicklink:
- In IotDemo UI, click "Deploy the Storm Topology"
- After 30-60 seconds, the topology will be deployed. Confirm using the Storm View in Ambari:
- Click "Truck Monitoring Application" link in 'IotDemo UI' to open the monitoring app showing an empty map.
- Click 'Nifi Data Flow' in IotDemo UI to launch Nifi and then double click on 'Iot Trucking demo' processor group. Then right click on both PublishKafka_0_10 processors > Configure > Properties. Confirm that the 'Kafka Broker' hostname/port is correctly populated. The flow should already be started so no other action is needed.
- In Ambari, click "Generate Events" to simulate 50 events (this can be configured)
- Switch back to "Truck Monitoring Application" in IotDemo UI and after 30s the trucking events will appear on screen
- Explore Storm topology using Storm View in Ambari
Nifi Sentiment demo
Next you can follow the steps below to start the Nifi sentiment demo. Video of these steps available here
- Open Nifi UI using Quicklinks in Ambari
- Double click "Twitter Dashboard" to open this process group:
- Right click "Grab Garden Hose" > Properties and enter your Twitter Consumer key/secret and Access token/secret. Optionally change the 'Terms to filter on' as desired. Once complete, start the flow.
- Use Banana UI quicklink from Ambari to open Twitter dashboard
- An empty dashboard will initially appear. After a minute, you should start seeing charts appear
Zeppelin demos
- Open Zeppelin UI via Quicklink
- Login as admin. Password is same as Ambari password
- Demo notebooks will appear. Open the first notebook and walk through each cell.
Option 2: To install HDP (including demos) or HDF using scripts
Pre-reqs:
One or more freshly installed CentOS/RHEL 6 or 7 VMs on your cloud of choice
Do not run this script on VMs running an existing HDP cluster or sandbox
If planning to install ‘IoT Demo’ make sure you allocate enough memory - especially if also deploying other demos
16GB or more of RAM is recommended if using single node setup
The sample script should only be used to create test/demo clusters
Default password for Ambari and Zeppelin admin users is BadPass#1
Override by exporting ambari_password prior to running the script
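A small pre-flight check can catch the most common pre-req problems before the install script is run. This is only a sketch under stated assumptions: the function names are hypothetical, the check reads `/proc/meminfo` (Linux only), and the 15 GiB threshold is an assumption that leaves slack because `MemTotal` reports slightly less than the installed RAM on a nominal 16GB VM.

```shell
# Hypothetical pre-flight helpers; not part of the install script.

# Succeed when the given memory size (in kB) is roughly 16GB or more.
# 15 GiB is used as the cut-off because MemTotal is a bit below installed RAM.
has_recommended_ram() {
  [ "$1" -ge $((15 * 1024 * 1024)) ]
}

# Read total memory in kB from /proc/meminfo (Linux only).
total_ram_kb() {
  awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Print warnings for the pre-reqs listed above; never aborts.
preflight() {
  if ! has_recommended_ram "$(total_ram_kb)"; then
    echo "Warning: less than ~16GB of RAM; the IoT demo may not fit" >&2
  fi
  if [ -z "${ambari_password:-}" ]; then
    echo "Note: ambari_password is not exported; the default password will be used" >&2
  fi
}
```

Running `export ambari_password='<your password>'` followed by `preflight` on the target VM would then show only the memory warning, if any.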
Steps:
1. This step is only needed if installing a multi-node cluster. After choosing a host where you would like Ambari-server to run, first prepare the other hosts. Run the commands below on all hosts where Ambari-server will not be running; they run the pre-requisite steps, install Ambari-agents and point them to the Ambari-server host:
export ambari_server=<FQDN of ambari-server host>
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
2. Run the remaining steps on the host where Ambari-server is to be installed. These run the pre-reqs, install Ambari-server and deploy the requested demos
a) To install HDP 2.5.x (Ambari 2.4.1/Java 8) - including Solr/Nifi 1.0.0 via Ambari and deploy a Nifi flow:
export host_count=1 #set to number of nodes in your cluster (including Ambari-server node)
export hdp_ver=2.5
export install_nifidemo=true
export install_iotdemo=true
curl -sSL https://gist.github.com/abajwa-hw/3f2e211d252bba6cad6a6735f78a4a93/raw | sudo -E sh
After 5-10 min, you should get a message saying the blueprint was deployed. At this point you can open Ambari UI (port 8080) and monitor the cluster install
Note: if you installed iotdemo on a multi-node cluster, there may be some manual steps required (e.g. moving storm jars or setting up latest Storm view). See here for more info: https://github.com/hortonworks-gallery/iotdemo-service/tree/hdp25#post-install-manual-steps
b) To install HDP 2.4 (Ambari 2.4.1/Java 8) - including IoTDemo, plus Solr/Nifi 1.0.0 via Ambari and deploy Nifi Twitter flow, run below:
export host_count=1 #set to number of nodes in your cluster (including Ambari-server node)
export hdp_ver=2.4
export install_iotdemo=true
export install_nifidemo=true
curl -sSL https://gist.github.com/abajwa-hw/3f2e211d252bba6cad6a6735f78a4a93/raw | sudo -E sh
c) To install vanilla HDF 2.1 cluster, you can use the script/steps below:
https://community.hortonworks.com/articles/56849/automate-deployment-of-hdf-20-clusters-using-ambar.html
Note this does not install any of the demos, just a vanilla HDF 2.1 cluster
Deployment
After 5-10min, you should get a message saying the blueprint was deployed. At this point you can open Ambari UI (port 8080) and monitor the cluster install. (Note make sure the port was opened). Default password is BadPass#1
What gets installed?
refer to previous 'What gets installed' section
10-30-2018
12:35 AM
CLOUD.HORTONWORKS.COM was just an example...you can change this to whatever you like. If you are using AD, you would probably want to set it to your AD domain
09-23-2016
02:53 AM
15 Kudos
Highlights of integrating Apache NiFi with Apache Ambari/Ranger
Article credits: @Ali Bajwa, @Bryan Bende, @jluniya, @Yolanda M. Davis, @brosander
With the recently announced HDF 2.0, users are able to deploy an HDF cluster comprised of Apache NiFi, Apache Storm, Apache Kafka and other components. The mechanics of setting this up using Apache Ambari’s Install Wizard are outlined in the official documentation here and sample steps to automate the setup via Ambari blueprints are provided here. The goal of this article is to highlight some features NiFi administrators can leverage when using Ambari managed HDF 2.0 clusters vs using NiFi standalone
The article is divided into sections on how the integration helps administrators with HDF:
Deployment
Configuration
Monitoring
Security
Ease of Deployment
Users have the choice of deploying NiFi through the Ambari install wizard or operationalizing the deployment via blueprint automation
(For detailed steps, see links provided on above line)
Using the wizard, users can choose which nodes NiFi should be installed on. So users can:
Either choose NiFi hosts at time of cluster install
...OR Add NiFi to existing host after the cluster is already installed and then start it. Note that in this case, ‘Zookeeper client’ must be installed on a host first before NiFi can be added to it
Ambari also allows users to configure which user/group NiFi runs as. This is done via the Misc tab, which is editable either when the cluster is installed or when the NiFi service is added to an existing cluster for the first time.
Starting with Ambari 2.4, users can also remove the NiFi service from Ambari, but note that this does not remove the bits from the cluster.
NiFi can be stopped/started/configured across the cluster via both the Ambari UI and Ambari’s REST APIs
The same can be done on individual hosts:
For easy access to NiFi UI, quick links are available. The benefit of using these is that the url is dynamically determined based on the user's settings (e.g. what ports were specified and whether SSL is enabled)
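Stopping and starting NiFi through Ambari's REST API, as mentioned above, can be sketched as follows. The function names are hypothetical, and the host, cluster name and `admin` credentials are placeholders you would supply; the request itself uses Ambari's standard service-state call, where `STARTED` starts a service and `INSTALLED` stops it.

```shell
# Hypothetical helpers around Ambari's service-state REST call.

# JSON body asking Ambari to move a service to the given state.
# STARTED starts the service, INSTALLED stops it.
service_state_body() {
  printf '{"RequestInfo":{"context":"Set NIFI to %s via REST"},"Body":{"ServiceInfo":{"state":"%s"}}}' "$1" "$1"
}

# Usage: set_nifi_state <ambari_host> <cluster_name> <STARTED|INSTALLED>
# Assumes the Ambari admin password is exported as ambari_password.
set_nifi_state() {
  curl -s -u "admin:${ambari_password}" -H 'X-Requested-By: ambari' -X PUT \
    -d "$(service_state_body "$3")" \
    "http://$1:8080/api/v1/clusters/$2/services/NIFI"
}
```

For example, `set_nifi_state ambari.example.com mycluster INSTALLED` would request a stop of NiFi on all hosts; both arguments are illustrative values.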
Ease of Configuration
Ambari allows configurations to be done once across the cluster. This is time saving because when setting up NiFi standalone, users need to manage configuration files on each node NiFi is running on
The most important NiFi config files are exposed via Ambari and are managed there (e.g. nifi.properties, bootstrap.conf etc)
When going through the configuration process, there are a number of ways Ambari provides assistance for the admin:
Help text displayed, on hover, with property descriptions
Checkboxes instead of true/false values
User friendly labels and default values
‘Computed’ values can be automatically handled (e.g. node address)
NiFi benefits from other standard Ambari config features like:
Update configs via Ambari REST API
Configuration history is available meaning that users can diff versions and revert to older version etc
Host-specific configurations can be managed using ‘Config groups’ feature where users can:
‘override’ a value (e.g. max mem in the screenshot) and
create a subset group of hosts that will use that value
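Reading NiFi configs through the Ambari REST API can be sketched with two URLs: one that lists the currently desired config versions (tags), and one that fetches a specific version of a config type. The function names are hypothetical, and the config type `nifi-properties` is an assumption based on the section names shown in the Ambari UI.

```shell
# Hypothetical URL builders for Ambari's configuration endpoints.

# Lists the desired (current) tag for every config type of the cluster.
# Usage: desired_configs_url <ambari_host> <cluster_name>
desired_configs_url() {
  echo "http://$1:8080/api/v1/clusters/$2?fields=Clusters/desired_configs"
}

# Fetches one version of one config type.
# Usage: config_url <ambari_host> <cluster_name> <config_type> <tag>
config_url() {
  echo "http://$1:8080/api/v1/clusters/$2/configurations?type=$3&tag=$4"
}
```

A typical flow would be `curl -s -u admin:<password> "$(desired_configs_url <host> <cluster>)"` to find the tag, then a second `curl` against `config_url` with that tag to dump the key/value pairs.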
‘Common’ configs are grouped together and exposed in the first config section (‘Advanced NiFi-ambari-config’) to allow configuration of commonly used properties:
Ports (nonSSL, SSL, protocol)
Initial and max memory (Xms, Xmx)
Repo default dir locations (provenance, content, db, flow file)
‘Internal’ dir location - contains files NiFi will write to
‘conf’ subdir for flow/tar.gz, authorizations.xml
‘state’ subdir for internal state
Can change subdir names by prefixing the desired subdir name with ‘{NiFi_internal_dir}/’
Sensitive property key (used to encrypt sensitive property values)
Zookeeper znode for NiFi
Contents of NiFi.properties are exposed under ‘Advanced NiFi-properties’ as key/value pairs with helptext
Values replaced by Ambari shown surrounded by double braces e.g.{{ }} but can be overridden by end user
Properties can be updated or added to NiFi.properties via ‘Custom NiFi-properties’ and will get written to all nodes
It also handles properties whose values need to be ‘computed’ e.g.
‘Node address’ fields are populated with each hosts own FQDN
Conditional logic handled:
When SSL is enabled, populates nifi.web.https.host/port
When SSL is disabled, populates nifi.web.http.host/port
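The conditional logic above can be illustrated with a short sketch. This is not Ambari's actual template code; `web_props` is a hypothetical function that only shows which pair of properties ends up populated for a given SSL setting.

```shell
# Illustration only: mimics which nifi.web.* properties get populated.
# Usage: web_props <true|false> <node_fqdn> <port>
web_props() {
  if [ "$1" = "true" ]; then
    # SSL enabled: the https host/port pair carries the values
    printf 'nifi.web.https.host=%s\nnifi.web.https.port=%s\n' "$2" "$3"
  else
    # SSL disabled: the http host/port pair carries the values
    printf 'nifi.web.http.host=%s\nnifi.web.http.port=%s\n' "$2" "$3"
  fi
}
```

For instance, `web_props true n1.example.com 9091` prints the two https properties for that (illustrative) host and port.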
Other property-based configuration files exposed as jinja templates (large text box)
Values that will be replaced by Ambari shown surrounded by double braces e.g. {{ }} but can be overridden by end user
Properties can be added/updated in the template and will get written to all nodes
Other xml based config files also exposed as jinja templates
Values replaced by Ambari shown surrounded by double braces e.g. {{ }} but can be overridden
Elements can be updated/added and will get written to all nodes
Note that config files are written out with either 0400 or 0600 permissions
Why? Because some property files contain plaintext passwords
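The effect of those restrictive permissions can be reproduced and verified by hand. The sketch below is independent of Ambari: `write_restricted` and `file_mode` are hypothetical helpers, and `stat -c` assumes GNU coreutils (Linux).

```shell
# Hypothetical helpers showing owner-only file permissions.

# Write content to a file that only the owner can read or write (0600).
# Usage: write_restricted <path> <content>
write_restricted() {
  # umask 077 ensures the file is never briefly world-readable
  ( umask 077 && printf '%s\n' "$2" > "$1" )
  chmod 0600 "$1"
}

# Print the octal permission bits of a file (GNU stat).
file_mode() {
  stat -c '%a' "$1"
}
```

Running `file_mode` against a config file on a NiFi node is a quick way to confirm that a file holding plaintext passwords is not readable by other users.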
Ease of Debugging
Logsearch integration is included for ease of visualizing/debugging NiFi logs w/o connecting to system e.g. NiFi_app.log, NiFi_user.log, NiFi_bootstrap.log
Note: Logsearch component is Tech Preview in HDF 2.0
By default, monitors FATAL,ERROR,WARN messages (for all HDF services)
Can view/drill into errors at component level or host level
Can filter errors based on severity (fatal, error, warn, info, debug, trace)
Can exclude ‘noisy’ messages to find the needle in the haystack
Can ‘tail’ log from Logsearch UI
By clicking the ‘refresh’ button or ‘play’ button (to auto refresh every 10s)
Ease of Monitoring
NiFi Service check: Used to ensure that the NiFi UI has come up after restart. It can also be invoked via REST API for automation
NiFi alerts are host-level alerts that let admins know when a NiFi process goes down
Can temporarily be disabled by turning on maintenance mode
Alerts tab in Ambari allows users to disable or configure alerts (e.g. changing polling intervals)
Admins can choose to receive notifications via email or SNMP through the alerts framework
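Invoking the NiFi service check over REST, as mentioned above, could look like the sketch below. The function names are hypothetical. The command name `NIFI_SERVICE_CHECK` is an assumption that follows Ambari's usual `<SERVICE>_SERVICE_CHECK` naming; verify it against your Ambari version before relying on it.

```shell
# Hypothetical helpers; the command name is an assumption (see lead-in).

# JSON body for a service-check request against the NIFI service.
service_check_body() {
  printf '{"RequestInfo":{"context":"NiFi Service Check","command":"NIFI_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"NIFI"}]}'
}

# Usage: run_nifi_service_check <ambari_host> <cluster_name>
# Assumes the Ambari admin password is exported as ambari_password.
run_nifi_service_check() {
  curl -s -u "admin:${ambari_password}" -H 'X-Requested-By: ambari' -X POST \
    -d "$(service_check_body)" \
    "http://$1:8080/api/v1/clusters/$2/requests"
}
```

The response contains a request id whose progress can then be followed in the Ambari UI.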
AMS (Ambari Metrics) integration
When NiFi is installed via Ambari, an Ambari reporting task is auto-created in NiFi, pointing to the cluster’s AMS collector host/port (autodetected)
How is the task autocreated? By providing a configurable initial flow.xml (which can also be used to deploy any flows you like when NiFi is deployed) and passing arguments (like AMS url) via bootstrap.conf. Advantage of doing it this way: if the collector is ever moved to a different host in the cluster, Ambari will let NiFi know (next time NiFi is restarted after the move)
As a result of the metrics integration, users get a dashboard for NiFi metrics in Ambari, such as:
Flowfiles sent/received
MBs read/written
JVM usage/thread counts
Dashboard widgets can:
be drilled into to see results from last 1,2,4 hours, day, week etc
export metrics data to csv or JSON
These same metrics can be viewed in Grafana dashboard:
Grafana can be accessed via quick link under ‘Ambari metrics’ service in Ambari
Pre-configured dashboards are available for each service but users can easily create custom dashboards for each component too
Ease of Security Setup
NiFi Identity mappings
These are used to map identities in DN pattern format (e.g. CN=Tom, OU=NiFi) into common identity strings (e.g. Tom@NiFi)
The patterns can be configured via ‘Advanced NiFi-properties’ section of Ambari configs. Sample values are provided via helptext
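To see what such a mapping does, the transformation can be tried outside NiFi. In NiFi the mapping is expressed as a pattern/value pair of properties (`nifi.security.identity.mapping.pattern.dn` and `nifi.security.identity.mapping.value.dn`); the sketch below only imitates it with `sed`, and `map_dn` is a hypothetical helper, not something NiFi or Ambari provides.

```shell
# Imitates an identity mapping such as:
#   nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
#   nifi.security.identity.mapping.value.dn=$1@$2
# sed has no lazy quantifiers, so plain (.*) groups are used instead.
map_dn() {
  printf '%s\n' "$1" | sed -E 's/^CN=(.*), OU=(.*)$/\1@\2/'
}
```

With the example from the text, `map_dn 'CN=Tom, OU=NiFi'` prints `Tom@NiFi`. A DN that does not match the pattern is printed unchanged.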
ActiveDirectory/LDAP integration
To enable users to login to NiFi using AD/LDAP credentials the ‘Advanced NiFi-login-identity-providers.xml’ section can be used to setup an ldap-provider for NiFi. Commented out sample xml fields are provided for the relevant settings e.g.
AD/LDAP url, search base, search filter, manager credentials
SSL for NiFi
Detailed steps for enabling SSL/identity mappings for Nifi available here
Options for SSL for NiFi:
1. Use NiFi CA to generate self-signed certificates
good for quick start/demos
2. Use your existing certificates
Usually done for production envs
SSL related configs are combined together in ‘Advanced NiFi-ambari-ssl-config’ config panel
Checkbox for whether SSL is enabled
NiFi CA fields - to configure certificate to be generated:
NiFi CA token(required)
NiFi CA DN prefix/suffix
NiFi CA Cert duration
NiFi CA host port
Checkbox for ‘NiFi CA Force Regenerate’
Keystore/truststore related fields - location/type of certificates:
Paths
Passwords
Types
Node identity fields:
Initial Admin Identity: long form of identity of Nifi admin user
Node Identities: long form of identities of nodes running Nifi
SSL Option 1 - using NiFi CA to generate new certificates through Ambari:
Just check “Enable SSL?” box and make sure CA token is set
Optionally update below as needed:
NiFi CA DN prefix/suffix
NiFi CA Cert duration
NiFi CA port
Check ‘NiFi CA Force Regenerate’ box
For changing certs after SSL already enabled
You can force regeneration of the certificates by either:
checking “NiFi CA Force Regenerate” checkbox
Or changing the passwords
You can also manually use tls-toolkit in standalone mode to generate new certificates outside of Ambari
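Generating certificates manually with the toolkit in standalone mode can be sketched as below. The wrapper name `generate_certs` and the `TOOLKIT_BIN` variable are hypothetical; the default path assumes you run it from the extracted toolkit directory. The `-n` (hostnames) and `-o` (output directory) options are the toolkit's standalone-mode flags.

```shell
# Hypothetical wrapper around the NiFi TLS toolkit's standalone mode.
# TOOLKIT_BIN can point at tls-toolkit.sh; the default path is an assumption.
# Usage: generate_certs <comma-separated FQDNs> <output_dir>
generate_certs() {
  "${TOOLKIT_BIN:-./bin/tls-toolkit.sh}" standalone -n "$1" -o "$2"
}
```

For example, `generate_certs 'n1.example.com,n2.example.com' ./certs` would create one sub-directory per host under `./certs`, each named after the host's FQDN, which is why the paths of toolkit-generated certificates contain the FQDN.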
SSL Option 2 - using your existing certificates:
Manually copy certificates to nodes
Populate keystore/truststore path/password/type fields
For keystore/trust paths that contain FQDN that need resolving:
use {NiFi_node_ssl_host} (This is useful for certs generated by NiFi-toolkit as they have the host’s FQDN in their name/path)
In both cases while enabling SSL, you will also need to populate the identity fields. This is to be able to login to NiFi after enabling SSL (assuming Ranger authorizer will not be used)
When setting these, first make sure that on all the nodes, authorizations.xml does not contain any policies. If it does, delete authorizations.xml from all nodes running NiFi. Otherwise, the identity related changes will not take effect.
On initial install there will not be any policies, but they will get created the first time the Identity fields are updated and NiFi restarted (i.e. if you entered incorrect values the first time, you will need to delete policies before re-entering the values)
Then save config changes and restart NiFi from Ambari to enable SSL
If NiFi CA option was used, this is the point at which certificates will get generated
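The check for existing policies described above can be scripted per node. This is a sketch with hypothetical function names; the location of authorizations.xml depends on your NiFi ‘conf’ directory and is passed in as an argument. Note one deliberate difference from the text: the file is moved to a `.bak` copy rather than deleted, so it can be restored if needed.

```shell
# Hypothetical helpers for clearing stale NiFi policies on one node.

# Succeed if the file contains at least one <policy ...> element.
has_policies() {
  grep -q '<policy ' "$1"
}

# Move authorizations.xml aside when it already contains policies.
# Usage: reset_authorizations <path to authorizations.xml>
reset_authorizations() {
  if [ -f "$1" ] && has_policies "$1"; then
    mv "$1" "$1.bak"   # backup instead of the plain delete described above
    echo "moved"
  else
    echo "nothing to do"
  fi
}
```

Run it on every NiFi node before updating the identity fields, then save the config changes and restart NiFi from Ambari.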
Ranger integration with NiFi
Before installing Ranger there are some manual prerequisite steps:
Setup RDBMs to store Ranger policies
Install/setup Solr to store audits. In test/development environments, Ranger can re-use the Solr that comes with Logsearch/Ambari Infra services
Detailed steps for integrating Nifi with Ranger here
During Ranger install…
The backend RDBMS details are provided first via ‘Ranger Admin’ tab
The NiFi Ranger plugin can be enabled to manage NiFi authorization policies in Ranger via ‘Ranger Plugin’ tab
Users/Groups can be synced from Active Directory/LDAP via ‘Ranger User Info’ tab
Ranger audits can be configured via ‘Ranger audit’ tab
After enabling Ranger and restarting NiFi, new config tabs appear under NiFi configs. NiFi/Ranger related configs can be accessed/updated here:
Ranger can be configured to communicate and retrieve resources from NiFi using a keystore (that has been imported into NiFi’s truststore)
Using a NiFi REST Client, Ranger is able to retrieve NiFi’s API endpoint information that can be secured via authorization
This list of resources is made available as auto-complete options when users are attempting to configure policies in Ranger
To communicate with NiFi over SSL a keystore and truststore should be available (with Ranger’s certificate imported into NiFi node truststores) for identification. The Owner for Certificate should be populated as well.
Once Ranger is identified NiFi will authorize Ranger to perform its resource lookup
Ranger policies can be created for NiFi (either via Ranger UI or API)
Create users in Ranger for NiFi users (either from certificate DNs, or import using AD/LDAP sync)
Decide which user has what type of access on what identifier
Default policy automatically created on first setup
Policy updates will be picked up by NiFi after 30 seconds (by default)
Recommended approach:
Grant user access to modify the NiFi flow with a policy for /process-groups/<root-group-id> with RW
Create a separate policy for /provenance/process-groups/<root-group-id> (with each of the cluster node DNs) for read access
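Creating such a policy through the Ranger API rather than the UI might look like the sketch below. Several things here are assumptions to verify against your Ranger version: the resource key `nifi-resource` and the `READ`/`WRITE` access types (taken from Ranger's NiFi service definition), the public v2 policy endpoint, and Ranger's default admin port 6080. The function names, service name and user are hypothetical placeholders.

```shell
# Hypothetical helpers; resource key, access types and endpoint are assumptions.

# Build a policy granting one user read/write on one NiFi resource.
# Usage: ranger_policy_json <ranger_service> <policy_name> <resource> <user>
ranger_policy_json() {
  printf '{"service":"%s","name":"%s","resources":{"nifi-resource":{"values":["%s"]}},"policyItems":[{"users":["%s"],"accesses":[{"type":"READ","isAllowed":true},{"type":"WRITE","isAllowed":true}]}]}' \
    "$1" "$2" "$3" "$4"
}

# Usage: create_policy <ranger_host> <ranger_admin_password> <json>
create_policy() {
  curl -s -u "admin:$2" -H 'Content-Type: application/json' -X POST \
    -d "$3" "http://$1:6080/service/public/v2/api/policy"
}
```

The `<root-group-id>` in the resource path is the id of your root process group and has to be looked up in NiFi first.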
Ranger now tracks audits for NiFi (stored in standalone Solr or Logsearch Solr)
For example: What user attempted what kind of NiFi access from what IP at what time?
Ranger also audits user actions related to NiFi in Ranger
For example: Which user created/updated NiFi policy at what time?
Kerberos for NiFi
HDF cluster with NiFi can be kerberized via standard Ambari security wizard (via MIT KDC or AD)
Also supported: NiFi installation on already kerberized HDF cluster
Detailed steps for enabling kerberos for HDF available here
Wizard will allow configuration of principal name and keytab path
NiFi principal and keytabs will automatically be created/distributed across the cluster where needed by Ambari
During the security wizard, nifi.properties will automatically be updated:
nifi.kerberos.service.principal
nifi.kerberos.keytab.location
nifi.kerberos.krb5.file
nifi.kerberos.authentication.expiration
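After the wizard completes, the Kerberos-related entries can be inspected directly on a NiFi node. The helper below is a hypothetical one-liner; the path to nifi.properties depends on your install and is passed in as an argument.

```shell
# Print every Kerberos-related property from a nifi.properties file.
# Usage: kerberos_props <path to nifi.properties>
kerberos_props() {
  grep '^nifi\.kerberos\.' "$1"
}
```

An empty result on a node suggests that the wizard has not yet written its changes there, or that NiFi's configs have not been refreshed since.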
After enabling kerberos, login provider will also be switched to kerberos under the covers
Allows users to login via KDC credentials instead of importing certificates into the browser
Writing audits to kerberized Solr supported
After security wizard completes, NiFi’s kerberos details will appear alongside other components (under Admin > Kerberos)
Try it out yourself!
Installation using official documentation: link
Automation to deploy clusters using Ambari blueprints: link
Enable SSL/Identity mappings for Nifi via Ambari: link
Enable Ranger authorization for Nifi: link
Enable Kerberos for HDF via Ambari: link
10-20-2017
05:31 PM
At Step 2 (with the new ambari-bootstrap.sh), we need to add an additional line to the blueprint steps: export install_ambari_agent=false
11-11-2016
10:04 PM
Awesome! This worked for me. The timing could not have been better. I was working on setting up Zeppelin with OpenLDAP and Livy today (HDP 2.5) and this was one of the issues I had to solve. Thank you!
02-13-2017
02:36 PM
Good point. For my Sandbox testing, I decided to just use the steps provided in http://stackoverflow.com/questions/40550011/zeppelin-how-to-restart-sparkcontext-in-zeppelin to stop the SparkContext when I need to do something outside of Zeppelin. Not ideal, but working good enough for some multi-framework prototyping I'm doing.
07-27-2016
07:24 PM
1 Kudo
@bigdata.neophyte: We have a single node HDP 2.3 VM with Kerberos, Ranger and Ranger KMS enabled, available for download here. This was done as part of a security workshop/webinar we did: https://github.com/abajwa-hw/security-workshops#current-release
01-13-2017
05:57 PM
I've had zero success trying to get NIFI installed via Ambari. When it throws the recommended settings screen, it won't let me proceed. Additionally, I tried the above steps to rm the files, I consistently get cannot remove `/var/lib/ambari-server/resources/stacks/HDP/2.5/services/NIFI/package/scripts': Invalid argument I seem to get the invalid argument "feature" whenever I try and remove stuff from this docker package. Any guidance would be appreciated. Thanks,
06-10-2016
11:36 PM
Could you provide the github link or upload the template xml before we publish this? Also would be good to show what the tweet looks like before/after processing