Member since 07-19-2018
613 Posts
100 Kudos Received
117 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3857 | 01-11-2021 05:54 AM
 | 2689 | 01-11-2021 05:52 AM
 | 7440 | 01-08-2021 05:23 AM
 | 6771 | 01-04-2021 04:08 AM
 | 30763 | 12-18-2020 05:42 AM
11-12-2019
11:58 AM
I recently found NiFi 1.10.0 was released:
https://nifi.apache.org/download.html
Does anyone have an ETA on a public repo and management pack?
I started building a management pack for 1.10.0, but once I get into the HDP/HDF stack it is still using the public repos, which report only 1.9.x. I am going to start working on my own repos, but wanted to see if anyone was already working on the next public repo and mpack.
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
11-07-2019
08:25 AM
1 Kudo
I believe it is possible. You will need to create three separate installs of NiFi on the same node, each listening on different ports, and do some custom configuration to make sure they communicate with each other as a cluster. This is a very advanced setup, and putting all nodes on the same machine is not recommended. I would highly recommend building a three-node cluster using Ambari and installing NiFi that way, as it will make things much easier.
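To illustrate the "each with different ports" part, here is a minimal sketch that renders per-instance port overrides for three co-located instances. The property names follow nifi.properties, but the base port numbers are illustrative assumptions, not recommendations:

```python
# Sketch: per-instance nifi.properties port overrides for three NiFi
# instances on one host. Base port values below are illustrative only.
BASE_PORTS = {
    "nifi.web.http.port": 8080,
    "nifi.cluster.node.protocol.port": 11443,
    "nifi.remote.input.socket.port": 10443,
}

def instance_overrides(instance_index):
    """Offset every port by the instance index so no two instances collide."""
    return {prop: port + instance_index for prop, port in BASE_PORTS.items()}

overrides = [instance_overrides(i) for i in range(3)]
for i, props in enumerate(overrides):
    print("# instance %d" % i)
    for prop, port in sorted(props.items()):
        print("%s=%s" % (prop, port))
```

The point is simply that every instance needs its own, non-overlapping set of ports before cluster communication can be configured.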
11-06-2019
07:06 AM
Can you provide information about the spec and configuration of NiFi? Number of nodes, memory settings, disk size, whether partitions are configured separately, etc.
11-04-2019
10:02 AM
1 Kudo
Very good question. Let me share some thoughts, as I have installed Ambari both from source and from the Hortonworks repos.

Before I get started, you should know that Hortonworks was a major contributor to the Ambari project, and as such their documentation is very detailed on how to install Ambari and its components. In my opinion this is the preferred documentation. The Hortonworks repos are THE public repos for Ambari, and using them is much easier than building from source.

The Ambari project page at ambari.apache.org is just the project page. Its documentation is specifically for Ambari, not necessarily for "Hadoop", and does not include all the screenshots and deeper information you will find in the Hortonworks/Cloudera documentation for the same steps. Although the project page does not go into much detail, it does have the required artifacts and enough information to set up nodes and get into the Cluster Install Wizard. For organizations that are required to use private repos or to build their own, the Ambari project page is very important.
11-04-2019
09:52 AM
Here is a working processor configuration. Your values would be $.busId, $.speed, and $.location. The nested values are $.location.lat and $.location.long. Also make sure the sample JSON quotes "location" (a " is missing in your sample above).
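To see what each JSONPath expression would resolve to, here is a small sketch against a hypothetical bus-telemetry payload (the field values are invented for illustration):

```python
import json

# Invented sample payload matching the paths discussed in the post.
raw = '{"busId": "42A", "speed": 37.5, "location": {"lat": 40.71, "long": -74.0}}'
record = json.loads(raw)

# What each JSONPath expression resolves to:
# $.busId          -> record["busId"]
# $.speed          -> record["speed"]
# $.location.lat   -> record["location"]["lat"]
# $.location.long  -> record["location"]["long"]
extracted = {
    "busId": record["busId"],
    "speed": record["speed"],
    "location.lat": record["location"]["lat"],
    "location.long": record["location"]["long"],
}
print(extracted)
```

Note that if the quote around "location" is missing, json.loads raises an error immediately, which is the same reason the processor rejects the malformed sample.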
10-02-2019
11:37 AM
MergeContent is what you are looking for: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.MergeContent/index.html
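Outside NiFi, the core idea MergeContent implements, binning many small payloads into fewer large ones, can be sketched as follows; the bin size mirrors the processor's "Maximum Number of Entries" property, and the record contents are invented:

```python
def merge_by_count(flowfiles, max_entries):
    """Group incoming payloads into bins of at most max_entries,
    analogous to MergeContent's 'Maximum Number of Entries' property."""
    bins = []
    for i in range(0, len(flowfiles), max_entries):
        # Join each bin with a newline demarcator, like the processor's
        # binary-concatenation merge format with a newline delimiter.
        bins.append(b"\n".join(flowfiles[i:i + max_entries]))
    return bins

small = [b"rec-%d" % i for i in range(10)]
merged = merge_by_count(small, max_entries=4)
print(len(merged))  # 10 records in bins of 4 -> 3 merged payloads
```

The real processor also supports size- and age-based bin limits, so in practice you tune several thresholds rather than just a count.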
10-02-2019
07:52 AM
It is absolutely possible to do this. However, some things need to be considered. With a multi-node NiFi cluster, the local storage must be a single location, usually on the primary node. This data will not be local to the rest of the cluster nodes. The location should be separate from the OS partition and the required NiFi repository partitions, to avoid corrupting those partitions if local storage consumes all available space.

In past projects I have used the primary node with a separate partition for storing files local to the NiFi primary node. These files are then used outside of NiFi for other purposes. In some projects these files are picked up by NiFi in separate flows and then re-distributed into the cluster for processing across all nodes. The primary use case here was auditing received files directly to disk by Team 1; some time later, Team 2 accesses the files for processing. In this example Team 1 and Team 2 are completely separate, with Security Group based access to NiFi (they cannot see each other's flows).
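Since the main risk called out above is local storage filling a shared partition, a simple headroom check can run alongside the flow. This is a sketch; the mount point and the 10% threshold are assumptions, not values from the post:

```python
import shutil

def partition_has_headroom(mount_point, min_free_fraction=0.10):
    """Return True if the partition still has at least min_free_fraction
    of its capacity free. Threshold is an illustrative default."""
    usage = shutil.disk_usage(mount_point)
    return usage.free / usage.total >= min_free_fraction

# Checking the root partition here only as a stand-in for a dedicated
# local-storage mount such as /data/nifi-local.
if partition_has_headroom("/"):
    print("safe to write to local storage")
else:
    print("local storage partition nearly full; stop writing")
```

Keeping the local-storage mount separate means this check can fail without ever threatening the NiFi repository partitions.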
09-20-2019
08:53 AM
To accomplish what you describe, I would create an external script/process, or maybe even a NiFi flow, that reads the NiFi logs on all NiFi nodes and triggers a notification for ERROR events. This process would be decoupled from the original data flow and the actual processor, but it would provide the visibility you need to monitor ERRORs in the logs.
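The log-scanning part of that external process can be sketched as below. The sample log lines are invented; real nifi-app.log entries carry more fields, and a production version would tail the file and deliver real notifications:

```python
import re

ERROR_PATTERN = re.compile(r"\bERROR\b")

def find_error_events(log_lines):
    """Return the log lines that should trigger a notification."""
    return [line for line in log_lines if ERROR_PATTERN.search(line)]

# Invented examples in the general shape of nifi-app.log entries.
sample = [
    "2019-09-20 08:53:01,000 INFO  [Timer-Driven Process Thread-1] processed ok",
    "2019-09-20 08:53:02,000 ERROR [Timer-Driven Process Thread-2] failed to connect",
]
for event in find_error_events(sample):
    print("notify:", event)
```

Running a copy of this on each node (or shipping all logs to one place first) keeps the monitoring fully decoupled from the flow being monitored.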
06-11-2019
12:15 PM
Check out the following processor: CaptureChangeMySQL. It retrieves Change Data Capture (CDC) events from a MySQL database. CDC events include INSERT, UPDATE, and DELETE operations. Events are output as individual flow files, ordered by the time at which the operation occurred.
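The output contract described above, one event per flow file, ordered by operation time, looks like this in miniature (the event payloads are invented for illustration):

```python
# Invented CDC-style events, deliberately out of order.
events = [
    {"op": "UPDATE", "table": "bus", "ts": 1560275700},
    {"op": "INSERT", "table": "bus", "ts": 1560275400},
    {"op": "DELETE", "table": "bus", "ts": 1560275900},
]

# CaptureChangeMySQL emits events ordered by when the operation occurred:
ordered = sorted(events, key=lambda e: e["ts"])
print([e["op"] for e in ordered])  # ['INSERT', 'UPDATE', 'DELETE']
```

Downstream processors can therefore rely on replaying the operations in commit order rather than re-sorting them.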
02-23-2019
04:08 PM
This is a Work In Progress article.

Downloads:
- HDP/HDF 3, Ambari 2.7 Mpack, ELK 6.3.2 with sudo (for non-root user install): elasticsearch_mpack-3.0.0.0-1.tar.gz
- HDP/HDF 3, Ambari 2.7 Mpack, ELK 6.3.2 without sudo: elasticsearch_mpack-3.0.0.0-0.tar.gz

In the parent article I introduce the process of taking an existing Hortonworks ELK Mpack through a series of versions (up to 2.6) that allow me to install the ELK version and components I want (ElasticSearch, Logstash, Kibana, FileBeats, MetricBeats) in Ambari 2.6. In this article I am going to install the previous article's ELK 2.6 version on my local machine using Vagrant. With this working test base I will version the Mpack up to 3.0 as I change the files to allow installation into HDP & HDF 3.0. I am also going to move to the current version of ELK, 6.6.1.

Starting with the 2.6 Mpack and the Ambari Quick Start Guide, I am able to get a cluster installed very easily on my local machine. For my test base install I chose a single node, c7401.ambari.apache.org. For the purposes of this test base I only want to complete the most minimal install that supports the Mpack services without any issues during the Install Wizard. On this single node I install Ambari Server, Ambari Agent, and the following components:
- Zookeeper
- Ambari Metrics
- ElasticSearch
- LogStash
- Kibana
- FileBeats
- MetricBeats

Terminal commands required on the local machine:

git clone https://github.com/u39kun/ambari-vagrant.git
sudo -s 'cat ambari-vagrant/append-to-etc-hosts.txt >> /etc/hosts'
cd ambari-vagrant/centos7.4
cp ../insecure_private_key .
cp ~/Downloads/elasticsearch_mpack-2.6.0.0-9.tar.gz .
vagrant up c7401
vagrant ssh c7401

Terminal commands required in the Vagrant node:

wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.5.1.0/ambari.repo
yum --enablerepo=extras install epel-release -y
yum install java java-devel ambari-server ambari-agent -y
ambari-server setup -s
ambari-server install-mpack --mpack=/vagrant/elasticsearch_mpack-2.6.0.0-9.tar.gz --verbose
ambari-server start
ambari-agent start

After install there were some issues with starting the services. This is okay for now; the ELK Mpack really needs 4 nodes. Most of the versioning work will be making sure the new stack versions are included, and I can test all of those changes without expecting any services to work.

With the test base complete I can now quickly spin up fresh clusters, working my way from HDP 2.6 to HDP 3.1.0 and from ELK 6.3.2 to 6.6.1. The ELK versions will likely require some additional configuration changes, so this version will be completed in a final test base that includes HDP 3.1.0 and 4 nodes, and expects the services to start.

Results from using this test base (HDP):

It took me a few sessions working with this base to figure out that the out-of-the-box install issues for HDP 3 were related to just a few conflicts in the original Mpack parameters:

- A conflict with config settings for user management. The following python command was necessary:

python /var/lib/ambari-server/resources/scripts/configs.py -u admin -p admin -n DFHZ_ELK -l c7401.ambari.apache.org -t 8080 -a set -c cluster-env -k ignore_groupsusers_create -v true

- Adjustments to the Mpack services' params.py to get hostname and java_home from slightly different paths in the config object:

hostname = config['agentLevelParams']['hostname']
java64_home = config['ambariLevelParams']['java_home']

With a working install of this new Mpack in HDP 3, I can now start working on a 2-node cluster to make sure nothing else is required to get the original 4-node ELK stack installed on HDP 3.
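If you want one params.py to work across Ambari versions, the lookup can try both layouts. This is a sketch; the fallback key paths for older stacks are assumptions, so verify them against the actual command JSON your Ambari version hands the script:

```python
def lookup(config, *paths):
    """Return the value at the first key path that resolves in the
    nested config dict; raise KeyError if none do."""
    for path in paths:
        node = config
        try:
            for key in path:
                node = node[key]
            return node
        except (KeyError, TypeError):
            continue
    raise KeyError("none of the paths resolved: %r" % (paths,))

# Shape of the newer config object seen in this article's HDP 3 testing.
config = {"agentLevelParams": {"hostname": "c7401.ambari.apache.org"}}
hostname = lookup(config, ("agentLevelParams", "hostname"), ("hostname",))
print(hostname)
```

With this, the same script survives the path shuffle between stack versions instead of hard-coding one layout.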
After the 2-node cluster install wizard completed, I did have to manually start some ELK services.

HDF 2-Node Test Base

Next I need to create an HDF cluster and make sure this Mpack works there. This requires a fully new test base and some changes to the mpack.json file to include the HDF stack_version.

Terminal command required in the Vagrant node for the Ambari master:

wget http://public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.3.1.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.3.1.0-10.tar.gz && wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.3.0/ambari.repo && yum --enablerepo=extras install epel-release -y && yum install nano java java-devel ambari-server ambari-agent -y && ambari-server setup -s && ambari-server install-mpack --mpack=/root/hdf-ambari-mpack-3.3.1.0-10.tar.gz --verbose && ambari-server install-mpack --mpack=/vagrant/elasticsearch_mpack-3.0.0.0-0.tar.gz --verbose && ambari-server start && ambari-agent start

Terminal command required in the Vagrant node for an Ambari agent:

wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.3.0/ambari.repo && yum --enablerepo=extras install epel-release -y && yum install nano java java-devel ambari-agent -y && ambari-agent start

Results from using the HDF test base:

It took me quite a few attempts to identify a solution to a great many symptoms. Somewhere in the config object for HDF are some non-UTF-8 characters. In some places this threw an error; in other places it silently created empty files across the ELK stack during install. I added these lines to the python scripts:

# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')

Once I identified the solution for the component python scripts, I was able to get the stack installed without errors and running in HDF, with only some startup issues. After starting the components manually, Logstash & Kibana report as stopped but are actually running.
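The reload/setdefaultencoding workaround above only exists on Python 2. On Python 3 the same symptom, stray non-UTF-8 bytes in config data, is better handled by decoding explicitly with an error policy; a sketch, with an invented sample byte string:

```python
# Invented example: 0xA0 on its own is not valid UTF-8.
raw = b"elasticsearch.yml template \xa0 with a stray non-UTF-8 byte"

def safe_decode(data, encoding="utf-8"):
    """Decode bytes, replacing undecodable sequences with U+FFFD
    instead of raising UnicodeDecodeError."""
    return data.decode(encoding, errors="replace")

text = safe_decode(raw)
print(text)
```

This trades silent corruption (or a hard crash) for a visible replacement character, which makes the bad bytes easy to locate in the generated files.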
During my next sitting I focused on the stopped services. I set the ambari-agent log level to DEBUG and noticed some additional terminal output in the command status output from "sudo service start". After changing Kibana and Logstash to use just "service start", I was able to manually stop the services on the node and then start them from Ambari. I am not 100% sure the sudo was related, but at any rate the ELK Mpack is now installed with all services running in HDF.

I am going to go through a few more complete tests to make sure I can get the cluster stable right after install without any additional work. I completed my final test with "sudo service" replaced with "service". During the Cluster Install Wizard everything installed without errors. The only issue was a warning for Check ElasticSearch, which runs faster than ElasticSearch starts up. I came back to Ambari with Elasticsearch, Kibana, and Logstash running; I just had to manually start FileBeats and MetricBeats. Now I will be able to focus on the last part of this article: upgrading ELK 6.3.2 to 6.6.1.