Member since: 02-08-2016
Posts: 80
Kudos Received: 88
Solutions: 13
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 3197 | 12-15-2018 08:40 PM |
|  | 2665 | 03-29-2018 10:15 AM |
|  | 1020 | 02-01-2018 08:08 AM |
|  | 1856 | 01-24-2018 09:23 PM |
|  | 939 | 11-05-2017 03:24 PM |
04-27-2023
04:55 AM
I'm receiving an array of JSON objects like [{"key1":"value1", "key2":"value2"},{...},{...}]; all I'm doing is using SplitJson with the following expression. If your JSON has a different structure, your expression could be different, or you may need to apply a transform first (e.g. JoltTransform).
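The actual JsonPath expression isn't reproduced above, so as a general illustration only (my own sketch, not from the original reply), here is the pure-Python equivalent of what SplitJson produces for that kind of input: one fragment per element of the top-level array.

import json

# Input shaped like the array described above
payload = '[{"key1":"value1","key2":"value2"},{"key1":"value3","key2":"value4"}]'

# SplitJson emits one FlowFile per array element; this is the same split in plain Python
fragments = [json.dumps(element) for element in json.loads(payload)]
for fragment in fragments:
    print(fragment)  # each fragment is a standalone JSON object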
07-01-2020
09:43 AM
Great article! I faced the following error while trying to add data to LDAP (Step 13):

# ldapadd -x -W -D "cn=Manager,dc=example,dc=com" -f /root/ldap/base.ldif
Enter LDAP Password:
adding new entry "dc=example,dc=com"
ldap_add: Invalid syntax (21)
        additional info: objectClass: value #1 invalid per syntax

After some research, I found that we need to add the cosine and nis LDAP schemas before running the preceding command:

# ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/cosine.ldif
# ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/nis.ldif
# ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/inetorgperson.ldif
01-31-2018
11:41 PM
6 Kudos
In this article I will explain an easy way to automate some basic tasks in NiFi as an introduction to NiPyApi, an automation package for Apache NiFi and its sub-projects.

Environment Setup

Requirements:
A Python environment: 2.7 and 3.6 are tested, and most computers come with one of these; you can also create a virtualenv, or install Python on OSX using Homebrew
A recent internet browser: given you're reading this I assume you have one - I'm using Chrome
NiFi services to test against: if you have Docker installed you can use the following commands to prepare NiFi & NiFi Registry services:

user$: curl https://raw.githubusercontent.com/Chaffelson/nipyapi/master/test_env_config/docker_compose_latest/docker-compose.yml | docker-compose -f - up -d
user$: docker ps

Docker will download and start the NiFi containers and show you their details. You should be able to browse to both NiFi and NiFi-Registry at the following URLs:

http://localhost:8080/nifi/
http://localhost:18080/nifi-registry/

Installing NiPyApi

Installing NiPyApi is very easy, using the usual Python package manager, Pip:

user$: pip install nipyapi

NiPyApi will install along with its package dependencies, much like a Linux package - don't worry about the dependencies. Once that completes, start an interactive Python session on your command line and run a test command:

user$: python
>>> from nipyapi import config, canvas
>>> config.nifi_config.host
'http://localhost:8080/nifi-api'
>>> canvas.get_root_pg_id()
'4e8d8f99-0161-1000-fa6f-724e5873aebc'

NiPyApi will look for a NiFi environment on the usual port, or you can change this in nipyapi.config as shown above. Congratulations! You have just commanded the NiFi API in less than 5 lines of code.

Investigating the Package

Now we can try using a few of the NiPyApi commands to interact with the NiFi environment. While the entire NiFi and NiFi-Registry APIs are implemented, only some of the calls are surfaced for common use; you can find out about them in great detail either through the online documentation at ReadTheDocs, or by investigating the GitHub repo. For now, try looking at the console documentation of the nipyapi.canvas functions using the help() command:

>>> help(canvas)
Help on module nipyapi.canvas in nipyapi:
NAME
nipyapi.canvas
FILE
/Users/dchaffey/.virtualenvs/tmp-167d86bd91b19b09/lib/python2.7/site-packages/nipyapi/canvas.py
DESCRIPTION
For interactions with the NiFi Canvas
STATUS: Work in Progress to determine pythonic datamodel
FUNCTIONS
create_process_group(parent_pg, new_pg_name, location)
Creates a new PG with a given name under the provided parent PG
:param parent_pg: ProcessGroupEntity object of the parent PG
:param new_pg_name: String to name the new PG
:param location: Tuple of (x,y) coordinates to place the new PG
:return: ProcessGroupEntity of the new PG
...

You can see there are a lot of functions here that you can use to complete tasks against NiFi, and there are even more in the nipyapi.templates and nipyapi.versioning modules.

Trying an Automation Script

There is a handy interactive demo built into NiPyApi, and this time we're also going to use the new NiFi-Registry. It will procedurally generate a Process Group containing a Processor in NiFi, and then put them under version control in the NiFi Registry. It will then clone the version from one Registry bucket to another, simulating code promotion. Note that if you did not use the supplied Docker configuration above, you may have to modify the script to connect to your NiFi and NiFi-Registry environments.

>>> from nipyapi.demo.console import *
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'bucket_0', 'bucket_1', 'canvas', 'config', 'process_group_0', 'processor_0', 'reg_client_0', 'ver_flow_0', 'ver_flow_1', 'ver_flow_info_0', 'ver_flow_snapshot_0', 'ver_flow_snapshot_1']
You can see here that a number of NiFi and Registry objects have been created for you by the automation script as described. You can take a look at the script and how it uses the NiPyApi functions on GitHub. If you head over to your NiFi and NiFi-Registry GUI, you can explore the objects and try the new features out for yourself. Happy Coding!
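As a small addendum (my own sketch, not part of the original article), here is how the canvas functions documented above can be combined to create a process group programmatically. The lookup helper canvas.get_process_group() and the exact argument values are assumptions based on current NiPyApi documentation, so check help(canvas) in your own environment:

from nipyapi import canvas

# Id of the root process group, as returned in the session earlier in this article
root_pg_id = canvas.get_root_pg_id()

# Assumption: canvas.get_process_group() resolves an id to a ProcessGroupEntity;
# verify the name and signature with help(canvas) in your install
root_pg = canvas.get_process_group(root_pg_id, 'id')

# create_process_group(parent_pg, new_pg_name, location) as shown in the help output above
new_pg = canvas.create_process_group(root_pg, 'nipyapi_example_pg', (400.0, 400.0))
print(new_pg.id, new_pg.component.name)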
01-24-2018
09:29 PM
This error often comes up when the file being uploaded is not a valid template. Open the file with a plain text editor and check that it looks something like this:

<?xml version="1.0" ?>
<template encoding-version="1.1">
<description></description>
<groupId>4d5dcf9a-015e-1000-097e-e505ed0f7fd2</groupId>
<name>nipyapi_testTemplate_00</name>
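If you want to automate that check (my own sketch, not from the original answer; the filename is a placeholder and the template XML continues beyond the snippet shown), you can parse the file and confirm it has a template root element:

import xml.etree.ElementTree as ET

# Path is a placeholder; point it at the template file you are trying to upload
tree = ET.parse("nipyapi_testTemplate_00.xml")
root = tree.getroot()

print(root.tag)                  # expect 'template' for a valid NiFi template
print(root.findtext("name"))     # the template name, if present
print(root.findtext("groupId"))  # the process group the template was exported from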
04-13-2018
06:35 PM
I am also getting this error. I have checked the config in Ambari, and the DB port and HTTP ports are both set. It almost seems as if, when Ambari runs Superset, the config file in /etc/superset/conf/ isn't being used.
07-07-2019
04:16 AM
Got the same error; your hint worked.

2019-07-07 13:10:17,764 ERROR [main] org.apache.nifi.web.server.JettyServer Unable to load flow due to: org.apache.nifi.lifecycle.LifeCycleStartException: Failed to start Flow Service due to: java.net.SocketException: Address already in use (Listen failed)
org.apache.nifi.lifecycle.LifeCycleStartException: Failed to start Flow Service due to: java.net.SocketException: Address already in use (Listen failed)
at org.apache.nifi.controller.StandardFlowService.start(StandardFlowService.java:323)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1008)
at org.apache.nifi.NiFi.<init>(NiFi.java:158)
at org.apache.nifi.NiFi.<init>(NiFi.java:72)
at org.apache.nifi.NiFi.main(NiFi.java:297)
Caused by: java.net.SocketException: Address already in use (Listen failed)
at java.net.PlainSocketImpl.socketListen(Native Method)
at java.net.AbstractPlainSocketImpl.listen(AbstractPlainSocketImpl.java:399)
at java.net.ServerSocket.bind(ServerSocket.java:376)
at java.net.ServerSocket.<init>(ServerSocket.java:237)
at java.net.ServerSocket.<init>(ServerSocket.java:128)
at org.apache.nifi.io.socket.SocketUtils.createServerSocket(SocketUtils.java:108)
at org.apache.nifi.io.socket.SocketListener.start(SocketListener.java:85)
at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.start(SocketProtocolListener.java:97)
at org.apache.nifi.cluster.protocol.impl.NodeProtocolSenderListener.start(NodeProtocolSenderListener.java:64)
at org.apache.nifi.controller.StandardFlowService.start(StandardFlowService.java:314)
... 4 common frames omitted
2019-07-07 13:10:17,766 WARN [main] org.apache.nifi.web.server.JettyServer Failed to start web server... shutting down.
08-09-2019
02:03 PM
You can fully automate the process with third-party tools; for example, have a look at this blog post on converting Guidewire XML to a relational database, Hive, ORC, or Parquet: https://sonra.io/2019/08/08/masking-and-converting-guidewire-xml-to-oracle/
08-21-2018
01:20 PM
Can confirm the DBCPConnectionPool approach suggested here by @Rudolf Schimmel works. We did run into issues when using Java 10 (uncaught Exception: java.lang.NoClassDefFoundError: org/apache/thrift/TException even though libthrift was specified). Using Java 8 worked.
03-23-2017
04:05 PM
7 Kudos
HDF Version: 2.1.2

Test Environment: Single AWS VPC with multiple subnets, EC2 Security Groups with port rules controlling access between subnets, AWS VPC Flow Logs to track accepted/rejected traffic, 1 EC2 instance for management, 3 EC2 instances for worker nodes, and a laptop on a public IP for remote access.

Deployment / Testing Method:
Installed a baseline of Ambari, Ambari Infra, Ambari Metrics, NiFi Certificate Authority, ZooKeeper cluster, NiFi cluster, and clients
Added Ranger
Enabled SSL across all services
Tested all commonly used interfaces and checked for rejected traffic

Not Tested:
Using external LDAP/AD services
Using an external Certificate Authority
Connecting to auxiliary services

Deployment Topology

| Zone | Members | Services |
| --- | --- | --- |
| M(anagement) | Management Node | Ambari Infra (Infra Solr), Ranger, Metrics Collector, Grafana, Metrics Monitor, NiFi Certificate Authority, Clients (Infra Solr, ZooKeeper) |
| D(ata) | Worker Nodes 1, 2, 3 | NiFi, ZooKeeper, Metrics Monitor, Clients (ZooKeeper) |
| R(emote) | Other NiFi Cluster |  |
| P(ublic) | Users | NiFi browser, ssh client |

Firewall Rules:

| Source Zone | Dest Zone | Port | Notes |
| --- | --- | --- | --- |
| M | D | 22 | ssh, if used for deployment |
| M | D | 8670 | Ambari Agent |
| M | D | 3000, 61300, 61310, 61330, 61320, 61388, 61288, 61181, 2181, 60200, 6188 | Ambari Metrics Service |
| M | D | 2181, 61181 | ZooKeeper |
| D | M | 8080 (http), 8443 (https) | Ambari Interface |
| D | M | 8440, 8441 | Ambari Agents |
| D | M | 6182, 6080, 8886 | Ranger Services |
| D | M | 3000, 61300, 61310, 61330, 61320, 61388, 61288, 61181, 2181, 60200, 6188 | Ambari Metrics Service |
| P | M | 8080 (http), 8443 (https) | Ambari |
| P | M | 6080 | Ranger |
| P | M | 3000 | Grafana |
| P | M | 8886 | Solr Admin |
| P | M | 22 | ssh |
| P | D | 22 | ssh (optional) |
| P | D | 9090 (http), 9091 (https) | NiFi Interface |
| D | R | 9090 (http), 9091 (https) | NiFi Interface & Data Transfer |

Additional Port Considerations:
Ports for all zones to connect to LDAP/AD, if used
Ports for all zones to send logging and alerts (smtp etc.) to other systems
Ports for NiFi to connect to target systems, e.g. HDFS, Hive, Kafka, etc.
You will require access to your CA to generate and move certificates; it is probably not necessary to open a port for direct connection
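To make the table above concrete, here is a minimal boto3 sketch (my own addition, not from the original article; the security group IDs and region are placeholders) showing how one row, D to M on ports 8080/8443 for the Ambari interface, could be expressed as EC2 security group ingress rules:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# Placeholder security group IDs for the Management (M) and Data (D) zones
MGMT_SG = "sg-0123456789abcdef0"
DATA_SG = "sg-0fedcba9876543210"

# Allow the Data zone to reach the Ambari interface in the Management zone
# (the "D | M | 8080 (http), 8443 (https) | Ambari Interface" row above)
for port in (8080, 8443):
    ec2.authorize_security_group_ingress(
        GroupId=MGMT_SG,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": DATA_SG}],
        }],
    )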
03-14-2017
02:09 PM
2 Kudos
Introduction

What we're doing:
Committing to <2 hours setup time
Deploying a matched pair of 3-node HDF clusters into two different regions on AWS EC2
Configuring some basic performance optimisations in EC2/NiFi
Setting up site-to-site to work with EC2's public/private FQDNs

What you will need:
An AWS account with access / credits to deploy 3x EC2 machines in each of two different regions
Approx 80GB of SSD disk per machine, for 480GB total
3x Elastic IPs per region
(Preferable) iTerm or a similar ssh client with broadcast capability

You should use two different AWS regions for your clusters; I'll refer to them as regionA and regionB.

Caveats:
We're going to use a script to set up OS prerequisites and deploy HDF on the server nodes to save time, as this article is specific to AWS EC2 setup and not generic HDF deployment
We're not setting up HDF security, though we will restrict access to our nodes to specific IPs
This is not best practice for all use cases, particularly on security; you are advised to take what you learn here and apply it intelligently to your own environment needs and use cases
You don't have to use Elastic IPs, but they'll persist through environment reboots and therefore prevent FQDN changes from forcing your services to be reconfigured

Process

Part 1: EC2 Public IPs, Instance & OS setup, and HDF packages deployment

Create Public IPs
Login to your AWS EC2 account and select regionA
Select the 'Elastic IPs' interface
Allocate 3x new addresses and note them down
Switch to regionB and repeat

Launch EC2 instances
Run the 'Launch Instance' wizard
From 'AWS Marketplace', select 'CentOS 7 (x86_64) - with Updates HVM'
Select 't2.xlarge' (4 vCPU x 16GB); this is the minimum that will reasonably run the HDF stack, so choose bigger if you prefer (the same applies to node count and disk size below)
Set Number of Instances to '3'
Set the root volume to 20GB General Purpose SSD
Add 3x new volumes of the same configuration
Set the 'Name' tag to something meaningful
Create a new security group called 'HDF Performance Test':
Set a rule to allow 'All traffic' from 'My IP'
Add 3x new rules to allow 'All Traffic' from the Elastic IPs in regionB (the other region) that you created earlier
Add the local subnet for internode communication
Optional: you could create rules for only the specific ports required, at the cost of a much longer configuration; ports 22 (ssh), 8080 (Ambari), and 9090-9092 (NiFi) should be sufficient for a no-security install
Review your configuration options and hit Launch
Select your ssh key preference; either use an existing key or create a new one and download it
Once the launcher completes, go to Elastic IPs
Associate each instance with an Elastic IP
Note down the Private and Public DNS (FQDN) for each instance; the Public values should correspond to the Elastic IPs you allocated
While the deployment finishes, repeat steps 1-13 in the other region to create your matching set of EC2 instances

OS Setup and package installs
(Optional) Launch iTerm and open a broadcast tab for every node; otherwise issue the following commands to each node sequentially (much slower):
yum update -y
mkfs -t ext4 /dev/xvdb
mkfs -t ext4 /dev/xvdc
mkfs -t ext4 /dev/xvdd
mkdir /mnt/nifi_content
mkdir /mnt/nifi_flowfile
mkdir /mnt/nifi_prov
mount /dev/xvdb /mnt/nifi_content/
mount /dev/xvdc /mnt/nifi_flowfile/
mount /dev/xvdd /mnt/nifi_prov/
echo "/dev/xvdb /mnt/nifi_content ext4 errors=remount-ro 0 1" >> /etc/fstab
echo "/dev/xvdc /mnt/nifi_flowfile ext4 errors=remount-ro 0 1" >> /etc/fstab
echo "/dev/xvdd /mnt/nifi_prov ext4 errors=remount-ro 0 1" >> /etc/fstab
We’re going to use a script to do a default install of Ambari for HDF, as we’re interested in looking at NiFi rather than overall HDF setup
https://community.hortonworks.com/articles/56849/automate-deployment-of-hdf-20-clusters-using-ambar.html

Follow steps 1-3 in that guide, reproduced here for convenience. Tip: running 'curl icanhazptr.com' will give you the FQDN of the current ssh session.

Run these commands as root on the first node, which I assume will be running the Ambari server:

export hdf_ambari_mpack_url=http://public-repo-1.hortonworks.com/HDF/centos7/2.x/updates/2.1.2.0/tars/hdf_ambari_mp/hdf-ambari-mpack-2.1.2.0-10.tar.gz
yum install -y git python-argparse
git clone https://github.com/seanorama/ambari-bootstrap.git
export install_ambari_server=true
~/ambari-bootstrap/ambari-bootstrap.sh
ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --purge --verbose #enter 'yes' to purge at prompt
ambari-server restart
Assuming Ambari will run on the first node in the cluster, run these commands as root on every other node:

export ambari_server=<FQDN of host where ambari-server will be installed>
export install_ambari_server=false
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh

Deploy and Configure multi-node HDF Clusters on AWS EC2

HDF Cluster Deployment
Open a browser to port 8080 of the public FQDN of the Ambari server
Log in using the default credentials of admin/admin
Select 'Launch Install Wizard'
Name your cluster
Accept the default versions and repos
Fill in these details:
Provide the list of Private FQDNs in the 'Target Hosts' panel
Select 'Perform manual registration on hosts' and accept the warning
Wait while the hosts are confirmed, then hit Next
If this step fails, check that you provided the Private FQDNs and not the Public FQDNs
Select the following services: ZooKeeper, Ambari Infra, Ambari Metrics, NiFi

Service layout
Accept the default Master Assignment
Use the '+' key next to the NiFi row to add NiFi instances until you have one on each node
Unselect the NiFi Certificate Service and continue

Customize Services
Provide a Grafana Admin Password in the 'Ambari Metrics' tab
Provide Encryption Passphrases in the NiFi tab; they must be at least 12 characters
When you hit Next you may get Configuration Warnings from Ambari; resolve any Errors and continue
Hit Deploy and monitor the process
Repeat steps 1-13 on the other cluster

NiFi Service Configuration for a multi-node cluster on AWS EC2
Log in to the cluster
In the NiFi service panel, go to the Configs tab
Enter 'repository' in the top right filter box, and change the following:
Nifi content repository default dir = /mnt/nifi_content
Nifi flowfile repository dir = /mnt/nifi_flowfile
Nifi provenance repository default dir = /mnt/nifi_prov
Enter 'mem' in the filter box:
Set Initial memory allocation = 2048m
Set Max memory allocation = 8096m
Enter 'port' in the filter box:
Note down the NiFi HTTP port (non-SSL); the default is 9090
Set nifi.remote.input.socket.port = 9092
Save your changes
Enter 'nifi.remote.input.host' in the filter box:
Note that we must use host-specific config groups to work around EC2's NAT configuration
Set nifi.remote.input.host = <Public FQDN of the first NiFi node>
Save this value
Click the '+' icon next to this field to override it
Select to create a new NiFi Configuration Group and name it host02
Set nifi.remote.input.host = <Public FQDN of the second NiFi node>
Save this value
Repeat for each NiFi node in the cluster
When all values are set in config groups, go to the 'Manage Config Groups' link near the filter box
Select each config group and use the plus key to assign a single host to it; the goal is that each host has its own public FQDN assigned to this parameter
Check your settings and restart the NiFi service

You can watch the NiFi service startup and cluster voting process by running 'tail -f /var/log/nifi/nifi-app.log' in an ssh session on one of the hosts. NiFi is up when the Jetty server reports the URLs it is listening on in the log; by default this is http://<public fqdn>:9090/nifi

Summary

In this article we have deployed sets of AWS EC2 instances for HDF clusters, prepared and deployed the necessary packages, and set the configuration parameters needed to allow NiFi Site-to-Site to operate behind the AWS EC2 NAT implementation. In the next article I will outline how to build the dataflow for generating a small-files performance test and pushing that data efficiently via Site-to-Site.
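As a postscript (my own sketch, not part of the original walkthrough), one quick way to confirm that each node is advertising its public FQDN for site-to-site is to query the NiFi REST API on each host. The endpoint and field names below are assumptions based on the standard NiFi Site-to-Site resource, and the hostnames are placeholders:

import requests

# Placeholder public FQDNs of the three NiFi nodes in one cluster
nodes = [
    "ec2-203-0-113-10.compute-1.amazonaws.com",
    "ec2-203-0-113-11.compute-1.amazonaws.com",
    "ec2-203-0-113-12.compute-1.amazonaws.com",
]

for host in nodes:
    # /nifi-api/site-to-site is what remote Site-to-Site clients use for peer discovery
    resp = requests.get("http://{0}:9090/nifi-api/site-to-site".format(host), timeout=10)
    controller = resp.json().get("controller", {})
    print(host, controller.get("remoteSiteListeningPort"))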