Member since
02-08-2016
80
Posts
89
Kudos Received
13
Solutions
01-31-2018
11:41 PM
6 Kudos
In this article I will explain an easy way to automate some basic tasks in NiFi as an introduction to NiPyApi, an automation package for Apache NiFi and its sub-projects.

Environment Setup

Requirements:
- A Python environment: 2.7 and 3.6 are tested. Most computers come with one of these, or you can create a virtualenv, or install Python on OSX using Homebrew.
- A recent internet browser - given you're reading this I assume you have one; I'm using Chrome.
- NiFi services to test against. If you have Docker installed, you can use the following commands to prepare NiFi and NiFi Registry services:

user$: curl https://raw.githubusercontent.com/Chaffelson/nipyapi/master/test_env_config/docker_compose_latest/docker-compose.yml | docker-compose -f - up -d
user$: docker ps

Docker will download and start the NiFi containers and show you their details. You should be able to browse to both NiFi and NiFi-Registry at the following URLs:

http://localhost:8080/nifi/
http://localhost:18080/nifi-registry/

Installing NiPyApi

Installing NiPyApi is very easy, using the usual Python package manager, Pip:

user$: pip install nipyapi

NiPyApi will install along with its package dependencies, much like a Linux package - don't worry about the dependencies. Once that completes, start an interactive Python session on your command line and run a test command:

user$: python
>>> from nipyapi import config, canvas
>>> config.nifi_config.host
'http://localhost:8080/nifi-api'
>>> canvas.get_root_pg_id()
'4e8d8f99-0161-1000-fa6f-724e5873aebc'

NiPyApi will look for a NiFi environment on the usual port, or you can change this in nipyapi.config as shown above. Congratulations! You have just commanded the NiFi API in less than five lines of code.

Investigating the Package

Now we can try a few of the NiPyApi commands to interact with the NiFi environment. While the entire NiFi and NiFi-Registry APIs are implemented, only some of the calls are surfaced for common use - you can find out about them in detail through the online documentation at ReadTheDocs, or by investigating the GitHub repo. For now, try looking at the console documentation of the nipyapi.canvas functions using the help() command:

>>> help(canvas)
Help on module nipyapi.canvas in nipyapi:
NAME
nipyapi.canvas
FILE
/Users/dchaffey/.virtualenvs/tmp-167d86bd91b19b09/lib/python2.7/site-packages/nipyapi/canvas.py
DESCRIPTION
For interactions with the NiFi Canvas
STATUS: Work in Progress to determine pythonic datamodel
FUNCTIONS
create_process_group(parent_pg, new_pg_name, location)
Creates a new PG with a given name under the provided parent PG
:param parent_pg: ProcessGroupEntity object of the parent PG
:param new_pg_name: String to name the new PG
:param location: Tuple of (x,y) coordinates to place the new PG
:return: ProcessGroupEntity of the new PG
...

You can see there are a lot of functions here that you can use to complete tasks against NiFi, and there are even more in the nipyapi.templates and nipyapi.versioning modules.

Trying an Automation Script

There is a handy interactive demo built into NiPyApi, and this time we're also going to use the new NiFi-Registry. It will procedurally generate a Process Group containing a Processor in NiFi, then put them under version control in the NiFi Registry. It will then clone the version from one Registry bucket to another, simulating code promotion. Note that if you did not use the supplied Docker configuration above, you may have to modify the script to connect to your NiFi and NiFi-Registry environments.

>>> from nipyapi.demo.console import *
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'bucket_0', 'bucket_1', 'canvas', 'config', 'process_group_0', 'processor_0', 'reg_client_0', 'ver_flow_0', 'ver_flow_1', 'ver_flow_info_0', 'ver_flow_snapshot_0', 'ver_flow_snapshot_1']
You can see here that a number of NiFi and Registry objects have been created for you by the automation script, as described. You can take a look at the script and how it uses the NiPyApi functions on GitHub. If you head over to your NiFi and NiFi-Registry GUIs, you can explore the objects and try the new features out for yourself. Happy Coding!
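As a closing aside, calls like canvas.get_root_pg_id() are thin wrappers over the NiFi REST API. The raw equivalent looks roughly like this; the endpoint is from the public NiFi REST API documentation, and this is a sketch of the idea rather than nipyapi's actual client code:

```python
import urllib.request

# Default nipyapi endpoint, as shown earlier via config.nifi_config.host
base = "http://localhost:8080/nifi-api"

# GET /process-groups/root returns the root Process Group entity;
# get_root_pg_id() essentially extracts the id field from that response.
req = urllib.request.Request(f"{base}/process-groups/root")
print(req.full_url)

# Actually sending the request requires a running NiFi, e.g.:
# import json
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["id"])
```

Wrapping these calls in generated client classes, plus the convenience functions in nipyapi.canvas, is what saves you from hand-building requests like this.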
09-08-2017
09:57 AM
Hi @smanjee - did you set up any automation to push your nar files into the nifi docker container on build?
05-31-2017
10:20 AM
@James Dong @Hans Pointner
This stylesheet is incomplete for arbitrary XML conversion; I suggest you try https://github.com/bramstein/xsltjson
I've documented use of it with the TransformXML processor in a new article here: https://community.hortonworks.com/content/kbentry/105547/nifi-xml-to-json-shredding-a-generalised-solution-3.html
05-31-2017
09:52 AM
9 Kudos
I'm going to cover a simple OSS solution to shredding XML in NiFi, and demonstrate how you can chain simple steps together to achieve common data shredding tasks. Feel free to get in touch if you need to achieve something more complex than these basic steps will allow. We will be covering:

- Procedurally converting XML to JSON using a fast XSLT 2.0 template
- Constructing Jolt transforms to extract nested subsections of JSON documents
- Constructing JsonPath expressions to split multi-record JSON documents
- Procedurally flattening complex nested JSON for easy querying

This process is shown on NiFi-1.2.0, and tested on a variety of XML documents, most notably a broad collection of GuideWire sample XMLs as part of a client PoC. The XML examples below retain the nested structure but anonymise the fields.

XML to JSON

Here we combine the NiFi TransformXML processor with the excellent BSD-licensed xsltjson procedural converter found at https://github.com/bramstein/xsltjson. Simply check out the repo and set the XSLT filename in the processor to xsltjson/conf/xml-to-json.xsl. There are several conversion options available; I suggest the Badgerfish notation if you want an easier time validating your conversion accuracy, but the default conversion is suitably compact for uncomplicated XMLs.
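To get a feel for the shape of such a conversion without standing up NiFi, here is a minimal element-to-dict sketch in Python. It is deliberately naive - the real xsltjson stylesheet handles attributes, namespaces, and number/boolean coercion, which this does not:

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Naive XML-to-JSON conversion: repeated child tags become lists,
    leaf elements become strings (xsltjson also coerces numbers/booleans)."""
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:  # repeated tag -> promote to a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

xml = "<quote><brandID>AB</brandID><annualPremium>271.45</annualPremium></quote>"
root = ET.fromstring(xml)
print(json.dumps({root.tag: element_to_dict(root)}))
# {"quote": {"brandID": "AB", "annualPremium": "271.45"}}
```

Note how annualPremium stays a string here; typed output is one of the reasons to use the proper stylesheet rather than rolling your own.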
So your input XML might look something like this:

<BrokerResponse>
<aggsId>3423897f9w8v89yb99873r</aggsId>
<quote>
<brandID>AB</brandID>
<brandDescription>Corp</brandDescription>
<quoteReference>0023400010050105</quoteReference>
<annualPremium>271.45</annualPremium>
<totalPremiumPNCD>304.56</totalPremiumPNCD>
<pncdIndicator>true</pncdIndicator>
<productType>Private Car</productType>
<insurerName>SomeRandom Insurance Company Limited</insurerName>
<coverType>comprehensive</coverType>
<instalments>
<instalmentScheme>12 at 13.9% (qr:27)</instalmentScheme>
<instalmentType>Monthly</instalmentType>
<downPayment>29.18</downPayment>
<downPaymentPercentage>8.3385725</downPaymentPercentage>
<totalInstalmentPremium>349.94</totalInstalmentPremium>
<paymentAmount>29.16</paymentAmount>
<noOfPayments>11</noOfPayments>
<interestAmount>45.38</interestAmount>
<apr>42.8</apr>
</instalments>
<vehicle>
<excess>
<name>PCAccidentalDamageCov_Ext</name>
<amount>95.0</amount>
</excess>
... etc.

And your output would look something like this (the strings aren't identical due to my data anonymisation):

{
"BrokerResponse" : {
"aggsId" : "4598e79g8798f298f",
"quote" : [ {
"brandID" : "AB",
"brandDescription" : "Corp",
"quoteReference" : "0000120404010",
"annualPremium" : 271.45,
"totalPremiumPNCD" : 304.56,
"pncdIndicator" : true,
"productType" : "Private Car",
"insurerName" : "SomeRandom Insurance Company Limited",
"coverType" : "comprehensive",
"instalments" : {
"instalmentScheme" : "12 at 12.3% (qr:33)",
"instalmentType" : "Monthly",
"downPayment" : 29.18,
"downPaymentPercentage" : 8.3385725,
"totalInstalmentPremium" : 349.94,
"paymentAmount" : 29.16,
"noOfPayments" : 11,
"interestAmount" : 45.38,
"apr" : 29.9
}, {
"brandID" : "BC",
"brandDescription" : "Acme Essential",
"quoteReference" : "NA",
"isDeclined" : true,
"quoteErrors" : {
"errorCode" : "QUOTE_DECLINED",
"errorDescription" : "Quote Declined"
}
}
}
]
}
}

Using Jolt to extract sections

Coming to both XSLT and Jolt as a new user, I found Jolt far easier to learn and use. Relying on the ever-popular StackExchange, Jolt answers tended to teach you to fish, whereas XSLT answers were usually selling you a fish. Handily, NiFi has a built-in editor if you use the Advanced button on the JoltTransformJSON processor; this mimics the behaviour of the popular http://jolt-demo.appspot.com/ site for building your transforms.

A key thing to note is setting the Jolt DSL to 'Chain' in the NiFi processor, and then using your various 'spec' settings within the transforms specified. This aligns the NiFi processor behaviour with the jolt-demo site. Building a Jolt spec is about defining steps from the root of the document; there are excellent guides elsewhere on the internet, but here is a simple and useful example.

Given the previous example of XML converted to JSON, this Jolt transform checks each quote subsection of the BrokerResponse: if a quote contains an instalments section it is returned in an array called quoteOffers, and any quotes that don't contain an instalments section, such as the declined offers, are dropped:

[
{
"operation": "shift",
"spec": {
"BrokerResponse": {
"quote": {
"*": {
"instalments": {
"@1": "quoteOffers[]"
}
}
}
}
}
}
]

This next Jolt transform selects just the instalments section from the previous output of quoteOffers, and drops the rest of the details:

[
{
"operation": "shift",
"spec": {
"quoteOffers": {
"*": {
"instalments": {
"@0": "instalments[]"
}
}
}
}
}
]

Much simpler than XSLT!

Using JsonPath to split documents

This is a very simple process, again with good examples available on the wider internet.
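The effect of the two Jolt shifts above, plus the per-document split this section covers, can be mimicked in a few lines of plain Python. This is a sketch of the logic only, using sample data abbreviated from the quote document above, not how NiFi actually executes Jolt or JsonPath:

```python
import json

broker_response = {
    "BrokerResponse": {
        "aggsId": "4598e79g8798f298f",
        "quote": [
            {"brandID": "AB", "instalments": {"instalmentType": "Monthly", "apr": 29.9}},
            {"brandID": "BC", "isDeclined": True},  # declined quote, no instalments
        ],
    }
}

# First shift: keep only quotes that contain an instalments section,
# collecting them into a quoteOffers array (declined quotes are dropped).
quote_offers = {
    "quoteOffers": [q for q in broker_response["BrokerResponse"]["quote"]
                    if "instalments" in q]
}

# Second shift: select just the instalments section from each offer.
instalments = {"instalments": [q["instalments"] for q in quote_offers["quoteOffers"]]}

# Split: emit one document (FlowFile) per instalment record,
# much as splitting on the instalments array would.
flowfiles = [json.dumps(i) for i in instalments["instalments"]]
print(len(flowfiles))  # 1
```

The declined quote falls out at the first step, so only one FlowFile is emitted; NiFi does the same work declaratively via the Jolt spec and a JsonPath split expression.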
Using the above example again, if we received multiple quote responses in a single document we'd have multiple instalment responses, and we might want to split them out into one quote per document. This is as simple as using the following:

$.instalments.*

This specifies the root of the document using $, then the instalments array, and emits each child item as a separate FlowFile.

Flattening JSON

Something else you might want to do is flatten your complex nested structures into simple iterables without having to specify a schema. This can be really useful if you just want to load the shredded XML for further analysis in Python without having to traverse the structure to get at the bits you're interested in. I came across the excellent Apache-licensed Java lib at https://github.com/wnameless/json-flattener, which I have wrapped into a NiFi-1.2.0-compatible processor at https://github.com/Chaffelson/nifi-flatjson-bundle. There are many more options within the lib that I have not taken the time to expose yet, including making the flattening reversible! Again using our example XML document from above, the flattened output might look a bit like this:

{
"quoteOffers[0].brandID" : "AB",
"quoteOffers[0].brandDescription" : "Corp",
"quoteOffers[0].quoteReference" : "004050025001001",
"quoteOffers[0].annualPremium" : 271.45,
"quoteOffers[0].totalPremiumPNCD" : 304.56,
"quoteOffers[0].pncdIndicator" : true,
"quoteOffers[0].productType" : "Private Car",
"quoteOffers[0].insurerName" : "SomeRandom Insurance Company Limited",
"quoteOffers[0].coverType" : "comprehensive",
"quoteOffers[0].instalments.instalmentScheme" : "12 at 13.9% (qr:2)2",
"quoteOffers[0].instalments.instalmentType" : "Monthly",
"quoteOffers[0].instalments.downPayment" : 29.18,
"quoteOffers[0].instalments.downPaymentPercentage" : 8.3385725,
"quoteOffers[0].instalments.totalInstalmentPremium" : 349.94,
"quoteOffers[0].instalments.paymentAmount" : 29.16,
"quoteOffers[0].instalments.noOfPayments" : 11,
"quoteOffers[0].instalments.interestAmount" : 45.38,
"quoteOffers[0].instalments.apr" : 23.9,
"quoteOffers[0].vehicle.excess[0].name" : "PCAccidentalDamageCov_Ext",
"quoteOffers[0].vehicle.excess[0].amount" : 95.0,
"quoteOffers[0].vehicle.excess[1].name" : "PCLossFireTheftCov_Ext",
"quoteOffers[0].vehicle.excess[1].amount" : 95.0,
"quoteOffers[0].vehicle.excess[2].name" : "PCTheftKeysTransmitterCov_Ext",
"quoteOffers[0].vehicle.excess[2].amount" : 95.0,
"quoteOffers[0].vehicle.excess[3].name" : "PCGlassDmgWrepairdmgCT_Ext",
"quoteOffers[0].vehicle.excess[3].amount" : 25.0,
"quoteOffers[0].vehicle.excess[4].name" : "PCGlassDmgWreplacementdmgCT_Ext",
"quoteOffers[0].vehicle.excess[4].amount" : 85.0,
"quoteOffers[0].vehicle.excess[5].name" : "Voluntary Excess",
"quoteOffers[0].vehicle.excess[5].amount" : 100.0,
... etc.

Conclusion

So there you have it: with just a few simple steps we've converted arbitrary nested XML into JSON, filtered out the bits of the document we don't want (declined quotes), extracted the section of the quotes we want to process (quoteOffers), split each quote into a single document (instalments), and flattened the rest of the quote response into a flat JSON document for further analysis. Feel free to contact me if you have a shredding challenge we might be able to help you with.
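The flattening behaviour itself is easy to sketch in plain Python. This is an illustrative reimplementation of the bracketed-key naming shown above, not the json-flattener library's actual code:

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a single dict with
    json-flattener style keys, e.g. quoteOffers[0].brandID."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}[{i}]"))
    else:
        flat[prefix] = obj  # leaf value
    return flat

quote = {"quoteOffers": [{"brandID": "AB",
                          "instalments": {"downPayment": 29.18, "noOfPayments": 11}}]}
print(flatten(quote))
# {'quoteOffers[0].brandID': 'AB',
#  'quoteOffers[0].instalments.downPayment': 29.18,
#  'quoteOffers[0].instalments.noOfPayments': 11}
```

The flat form makes each value addressable by a single key, which is exactly what makes the output so convenient to load into Python for analysis.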
05-15-2017
09:39 AM
Hi @Raghav Ramakrishann - sorry, I only just saw this comment as I've been away on paternity leave.
Can you share the version of CDH you're connecting to, and your service parameters? I might be able to troubleshoot a bit.
04-12-2017
01:24 AM
6 Kudos
I recently did a PoC with a customer to integrate NiFi with CDH; part of this was creating external tables in Hive on the newly loaded data. In this article I will share the approaches, useful workarounds, how to customise your own NiFi build for backwards compatibility, and provide a pre-built CDH-compatible Hive bundle for you to download and try.

So first, why is this necessary? The short answer is that NiFi 1.x's minimum supported version of Hive is 1.2.x, but CDH uses a fork of Hive 1.1.x, which introduces two common backwards-compatibility challenges. The first is that it uses an older version of Thrift, so we need to configure NiFi to use this same version if we want to talk directly. The second is that new features introduced after version 1.1.0 aren't available in the CDH release, so we have to stop NiFi from looking for them. The obvious other option is to work with CDH Hive indirectly, which brings us to the workarounds.

Workarounds

It is very common in PoCs not to have all the software and configuration parameters exactly as you would like them, and to have no time to wait for change control to allow installs and firewall modifications. One of the great things about NiFi is the flexibility to quickly work around roadblocks, so here's the list of workarounds investigated:

- The WebHCat service provides a REST API to run Hive queries, which we could have accessed using the NiFi HTTP processors; unfortunately the port was blocked at the firewall.
- The Beeline client could have been run via the NiFi Execute processors; however, the NiFi server was outside the test CDH cluster, and there was no available license for installing another gateway, nor time for the change control.
- Stream the Hive queries in a bash runner via an SSH tunnel into an existing edge node on the test CDH cluster using NiFi ExecuteStream processors; this works, but breaks various rules.
- Modify the NiFi Hive processors to be Cloudera compatible, if not officially supported...

A pre-built NiFi-Hive bundle for CDH 5.10.0

Note that I have only tested the Hive bundle functionality against CDH 5.10.0, not any of the other processors such as HDFS or Kafka, nor other versions. Neither I nor Hortonworks guarantee that this or other services will work against CDH, and you should thoroughly test things before trusting them with important data. Here is a Hive bundle I've built for CDH 5.10.0; just copy it into your nifi/lib directory and restart the service, and you should be able to connect PutHiveQL and SelectHiveQL to your Hive2 service. (dropbox link to file)

How to create your own Cloudera-compatible NiFi Hive bundle

The following instructions were tested on a CentOS 7 VM.
ssh <build server FQDN>
sudo su -
yum update -y
yum install -y wget
# Install Maven, Java1.8, Git, to meet minimum NiFi build requirements.
wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
yum install -y git java-1.8.0-openjdk apache-maven
logout
git clone https://github.com/Chaffelson/nifi.git
cd nifi
git checkout nifi-1.1.x-cdhHiveBundle
mvn -T 2.0C clean install -Pcloudera -Dhive.version=1.1.0-cdh5.10.0 -Dhive.hadoop.version=2.6.0-cdh5.10.0 -Dhadoop.version=2.6.0-cdh5.10.0 -DskipTests
nifi-assembly/target/nifi-1.1.1-SNAPSHOT-bin/nifi-1.1.1-SNAPSHOT/bin/nifi.sh start
# browse to http://<build server FQDN>:8080/nifi to test your new hive bundle

I have created a branch of NiFi-1.1.x and modified it so the Hive bundle is backwards compatible with CDH, and rolled in an updated fix or two for your convenience; here's a link to the diff. You may need to change the listed CDH versions to match your environment; I suggest you use the CDH Maven Repository documentation pages.
03-23-2017
04:05 PM
7 Kudos
HDF Version: 2.1.2

Test Environment: Single AWS VPC with multiple subnets, EC2 Security Groups with port rules controlling access between subnets, VPC Flow Logs to track accepted/rejected traffic, 1 EC2 instance for management, 3 EC2 instances for worker nodes, and a laptop on a public IP for remote access.

Deployment / Testing Method:
- Installed a baseline of Ambari, Ambari Infra, Ambari Metrics, NiFi Certificate Authority, ZooKeeper cluster, NiFi cluster, and Clients
- Added Ranger
- Enabled SSL across all services
- Tested all commonly used interfaces, and checked for rejected traffic

Not Tested:
- Using external LDAP/AD services
- Using an external Certificate Authority
- Connecting to auxiliary services

Deployment Topology

| Zone | Members | Services |
|---|---|---|
| M(anagement) | Management Node | Ambari Infra (Infra Solr), Ranger, Metrics Collector, Grafana, Metrics Monitor, NiFi Certificate Authority, Clients (Infra Solr, ZooKeeper) |
| D(ata) | Worker Nodes 1,2,3 | NiFi, ZooKeeper, Metrics Monitor, Clients (ZooKeeper) |
| R(emote) | Other NiFi Cluster | NiFi |
| P(ublic) | Users | Browser, ssh client |

Firewall Rules:

| Source Zone | Dest Zone | Port | Notes |
|---|---|---|---|
| M | D | 22 | ssh, if used for deployment |
| M | D | 8670 | Ambari Agent |
| M | D | 3000, 61300, 61310, 61330, 61320, 61388, 61288, 61181, 2181, 60200, 6188 | Ambari Metrics Service |
| M | D | 2181, 61181 | ZooKeeper |
| D | M | 8080 (http), 8443 (https) | Ambari Interface |
| D | M | 8440, 8441 | Ambari Agents |
| D | M | 6182, 6080, 8886 | Ranger Services |
| D | M | 3000, 61300, 61310, 61330, 61320, 61388, 61288, 61181, 2181, 60200, 6188 | Ambari Metrics Service |
| P | M | 8080 (http), 8443 (https) | Ambari |
| P | M | 6080 | Ranger |
| P | M | 3000 | Grafana |
| P | M | 8886 | Solr Admin |
| P | M | 22 | ssh |
| P | D | 22 | ssh (optional) |
| P | D | 9090 (http), 9091 (https) | NiFi Interface |
| D | R | 9090 (http), 9091 (https) | NiFi Interface & Data Transfer |

Additional Port Considerations:
- Ports for all zones to connect to LDAP/AD, if used
- Ports for all zones to send logging and alerts (smtp etc.) to other systems
- Ports for NiFi to connect to target systems, e.g. HDFS, Hive, Kafka, etc.
- You will require access to your CA to generate and move certificates; it is probably not necessary to open a port for direct connection
03-14-2017
02:09 PM
2 Kudos
Introduction

What we're doing:
- Committing to under 2 hours setup time
- Deploying a matched pair of 3-node HDF clusters into two different regions on AWS EC2
- Configuring some basic performance optimisations in EC2/NiFi
- Setting up Site-to-Site to work with EC2's public/private FQDNs

What you will need:
- An AWS account with access / credits to deploy 3x EC2 machines in each of two different regions
- Approx. 80GB of SSD disk per machine, 480GB total
- 3x Elastic IPs per region
- (Preferable) iTerm or a similar ssh client with broadcast capability

You should use two different AWS regions for your clusters; I'll refer to them as regionA and regionB.

Caveats:
- We're going to use a script to set up OS prerequisites and deploy HDF on the server nodes to save time, as this article is specific to AWS EC2 setup rather than generic HDF deployment
- We're not setting up HDF security, though we will restrict access to our nodes to specific IPs
- This is not best practice for all use cases, particularly on security; you are advised to take what you learn here and apply it intelligently to your own environment needs and use cases
- You don't have to use Elastic IPs, but they'll persist through environment reboots and therefore prevent FQDN changes causing your services to need reconfiguration

Process Part 1: EC2 Public IPs, Instance & OS setup, and HDF package deployment

Create Public IPs
1. Login to your AWS EC2 account and select regionA
2. Select the 'Elastic IPs' interface
3. Allocate 3x new addresses, and note them down
4. Switch to regionB and repeat

Launch EC2 instances
1. Run the 'Launch Instance' wizard
2. From 'AWS Marketplace', select 'CentOS 7 (x86_64) - with Updates HVM'
3. Select 't2.xlarge' (4 vCPU x 16GB). This is the minimum that will reasonably run the HDF stack; choose bigger if you prefer, and likewise for node count and disk size below
4. Set Number of Instances to '3'
5. Set the root volume to 20GB General Purpose SSD
6. Add 3x new volumes of the same configuration
7. Set the 'Name' tag to something meaningful
8. Create a new security group called 'HDF Performance Test':
   - Set a rule to allow 'All traffic' to 'My IP'
   - Add 3x new rules to allow 'All traffic' to the Elastic IPs in regionB (the other region) that you created earlier
   - Add the local subnet for internode communication
   - Optional: you could create rules for the specific ports required, at the cost of a much longer configuration. Ports 22 (ssh), 8080 (Ambari), and 9090-9092 (NiFi) should be sufficient for a no-security install
9. Review your configuration options, and hit Launch
10. Select your ssh key preference; either use an existing key or create a new one and download it
11. Once the launcher completes, go to Elastic IPs
12. Associate each instance with an Elastic IP
13. Note down the Private and Public DNS (FQDN) for each instance; the Public DNS should have values similar to the Elastic IPs you allocated

While the deployment finishes, repeat steps 1-13 in the other region to create your matching set of EC2 instances.

OS Setup and package installs
(optional) Launch iTerm, and open a broadcast tab for every node
Otherwise issue commands to each node sequentially (much slower)
yum update -y
mkfs -t ext4 /dev/xvdb
mkfs -t ext4 /dev/xvdc
mkfs -t ext4 /dev/xvdd
mkdir /mnt/nifi_content
mkdir /mnt/nifi_flowfile
mkdir /mnt/nifi_prov
mount /dev/xvdb /mnt/nifi_content/
mount /dev/xvdc /mnt/nifi_flowfile/
mount /dev/xvdd /mnt/nifi_prov/
echo "/dev/xvdb /mnt/nifi_content ext4 errors=remount-ro 0 1" >> /etc/fstab
echo "/dev/xvdc /mnt/nifi_flowfile ext4 errors=remount-ro 0 1" >> /etc/fstab
echo "/dev/xvdd /mnt/nifi_prov ext4 errors=remount-ro 0 1" >> /etc/fstab
We're going to use a script to do a default install of Ambari for HDF, as we're interested in looking at NiFi rather than overall HDF setup:

https://community.hortonworks.com/articles/56849/automate-deployment-of-hdf-20-clusters-using-ambar.html

Follow steps 1-3 in that guide, reproduced here for convenience. Tip: the command 'curl icanhazptr.com' will give you the FQDN of the current ssh session.

Run these commands as root on the first node, which I assume will be running the Ambari server:

export hdf_ambari_mpack_url=http://public-repo-1.hortonworks.com/HDF/centos7/2.x/updates/2.1.2.0/tars/hdf_ambari_mp/hdf-ambari-mpack-2.1.2.0-10.tar.gz
yum install -y git python-argparse
git clone https://github.com/seanorama/ambari-bootstrap.git
export install_ambari_server=true
~/ambari-bootstrap/ambari-bootstrap.sh
ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --purge --verbose #enter 'yes' to purge at prompt
ambari-server restart
Assuming Ambari will run on the first node in the cluster, run these commands as root on every other node:

export ambari_server=<FQDN of host where ambari-server will be installed>
export install_ambari_server=false
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh

Deploy and Configure multi-node HDF Clusters on AWS EC2

HDF Cluster Deployment
- Open a browser to port 8080 of the public FQDN of the Ambari server
- Login using the default credentials of admin/admin
- Select 'Launch Install Wizard'
- Name your cluster
- Accept the default versions and repos
- Fill in these details:
- Provide the list of Private FQDNs in the 'Target Hosts' panel
- Select 'Perform manual registration on hosts' and accept the warning

Wait while hosts are confirmed, then hit Next. If this step fails, check that you provided the Private FQDNs, and not the Public FQDNs.

Select the following services: ZooKeeper, Ambari Infra, Ambari Metrics, NiFi

Service layout
- Accept the default Master Assignment
- Use the '+' key next to the NiFi row to add NiFi instances until you have one on each node
- Unselect the NiFi Certificate Service and continue

Customize Services
- Provide a Grafana Admin Password in the 'Ambari Metrics' tab
- Provide Encryption Passphrases in the NiFi tab; they must be at least 12 characters
- When you hit Next you may get Configuration Warnings from Ambari; resolve any Errors and continue
- Hit Deploy and monitor the process

Repeat the deployment steps above on the other cluster.

NiFi Service Configuration for a multi-node cluster on AWS EC2

Login to the cluster. In the NiFi service panel, go to the Configs tab. Enter 'repository' in the top right filter box, and change the following:
- NiFi content repository default dir = /mnt/nifi_content
- NiFi flowfile repository dir = /mnt/nifi_flowfile
- NiFi provenance repository default dir = /mnt/nifi_prov

Enter 'mem' in the filter box:
- Set Initial memory allocation = 2048m
- Set Max memory allocation = 8096m

Enter 'port' in the filter box:
- Note down the NiFi HTTP port (Non-SSL); the default is 9090
- Set nifi.remote.input.socket.port = 9092
- Save your changes

Enter 'nifi.remote.input.host' in the filter box:
Note that we must use host-specific config groups to work around EC2's NAT configuration:

- Set nifi.remote.input.host = <Public FQDN of first NiFi node>
- Save this value
- Click the '+' icon next to this field to override the field
- Select to create a new NiFi Configuration Group, and name it host02
- Set nifi.remote.input.host = <Public FQDN of the second NiFi node>
- Save this value
- Repeat for each NiFi node in the cluster
- When all values are set in config groups, go to the 'Manage Config Groups' link near the filter box
- Select each config group and use the plus key to assign a single host to it. The goal is that each host has its specific public FQDN assigned to this parameter
- Check your settings and restart the NiFi service

You can watch the NiFi service startup and cluster voting process by running 'tail -f /var/log/nifi/nifi-app.log' in an ssh session on one of the hosts. NiFi is up when the Jetty server reports the URLs it is listening on in the log; by default this is http://<public fqdn>:9090/nifi

Summary

In this article we have deployed sets of AWS EC2 instances for HDF clusters, prepared and deployed the necessary packages, and set the configuration parameters needed to allow NiFi Site-to-Site to operate behind the AWS EC2 NAT implementation. In the next article I will outline how to build the dataflow for generating a small-files performance test, and how to push that data efficiently via Site-to-Site.
01-23-2017
10:41 AM
Doesn't work on RHEL7, can't pass to customers as an easy process to follow