Created on 11-22-2016 12:57 PM - edited 08-17-2019 07:53 AM
This article shows how to ingest log data from multiple servers, each running MiNiFi, into a NiFi cluster. The NiFi cluster listens for the log data on an input port and routes each file to an HDFS directory determined by the host name of the originating server. The article assumes you are using Ambari for NiFi installation/administration.
Reason: Allows the NiFi cluster to receive data on an Input Port. MiNiFi will push to a Remote Process Group, and NiFi will listen using the Input Port.
a) In Ambari, go to NiFi
b) Choose the Configs tab
c) Choose the Advanced nifi-properties
d) Set the nifi.remote.input.host to your NiFi hostname and nifi.remote.input.socket.port to 10000
e) Restart NiFi using Ambari
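After the restart, the site-to-site settings from step d) end up in nifi.properties. The following is a minimal sketch, not part of the original procedure, that parses such a file and reports any setting that differs from what this step expects. The hostname nifi-host.example.com and the helper names are hypothetical, and nifi.remote.input.secure=false assumes an unsecured cluster.

```python
# Sample nifi.properties lines for this step. The hostname is a placeholder;
# nifi.remote.input.secure=false assumes an unsecured (non-TLS) cluster.
SAMPLE_PROPERTIES = """\
# Site to Site properties
nifi.remote.input.host=nifi-host.example.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=10000
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_site_to_site(props, expected_host, expected_port="10000"):
    """Return a list of problems; an empty list means the step looks complete."""
    problems = []
    if props.get("nifi.remote.input.host") != expected_host:
        problems.append("nifi.remote.input.host is not " + expected_host)
    if props.get("nifi.remote.input.socket.port") != expected_port:
        problems.append("nifi.remote.input.socket.port is not " + expected_port)
    return problems


if __name__ == "__main__":
    props = parse_properties(SAMPLE_PROPERTIES)
    print(check_site_to_site(props, "nifi-host.example.com"))
```

To check a real installation, read the text of your own nifi.properties file instead of SAMPLE_PROPERTIES; its location depends on how Ambari laid out the install.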
Reason: Listen for incoming log data and route the log data to an HDFS directory
a) On the NiFi Flow Canvas, drag-and-drop an Input port (Name your Input Port – I named mine “listen_for_minifi_logs”)
b) Drag-and-drop processor RouteOnAttribute
c) Create a connection between Input Port and RouteOnAttribute
d) Configure RouteOnAttribute - Properties (this clears the red caution), adding one property per server you’re installing MiNiFi on
I have two servers, rcicak0.field.hortonworks.com and rcicak1.field.hortonworks.com, so I added two properties. Each incoming flowfile carries an attribute called “host_name”, and RouteOnAttribute routes the flowfile according to the value of that attribute
e) Drag-and-drop three PutHDFS processors and connect RouteOnAttribute to each of them, using the relationships hostname_rcicak0, hostname_rcicak1 and unmatched respectively
f) Each PutHDFS processor writes to a different HDFS directory: /tmp/rcicak0/, /tmp/rcicak1/ and /tmp/unmatched
g) Configure the properties of each PutHDFS processor
(add a core-site.xml under Hadoop Configuration Resources and set Directory to the path that matches the incoming connection)
h) Start (play) the processors. At this point, the NiFi flow is ready to receive log data from MiNiFi
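The routing built in steps d) to g) can be modelled in a few lines. This is only a sketch of the decision the flow makes, not NiFi code: the relationship names and directories come from the steps above, while the matching rule (host_name contains the short server name, roughly what an expression such as ${host_name:contains('rcicak0')} would do) is an assumption, because the expressions used in the original flow are not shown.

```python
# Relationship name -> HDFS directory, taken from steps e) and f).
HDFS_DIRECTORIES = {
    "hostname_rcicak0": "/tmp/rcicak0/",
    "hostname_rcicak1": "/tmp/rcicak1/",
    "unmatched": "/tmp/unmatched",
}

# Assumed routing rule: a relationship matches when host_name contains the
# short server name. The article's actual expressions are not shown.
ROUTES = {
    "hostname_rcicak0": "rcicak0",
    "hostname_rcicak1": "rcicak1",
}


def route(attributes):
    """Return (relationship, hdfs_directory) for a flowfile's attributes."""
    host_name = attributes.get("host_name", "")
    for relationship, short_name in ROUTES.items():
        if short_name in host_name:
            return relationship, HDFS_DIRECTORIES[relationship]
    # No property matched, so the flowfile follows the unmatched relationship.
    return "unmatched", HDFS_DIRECTORIES["unmatched"]


if __name__ == "__main__":
    for host in ("rcicak0.field.hortonworks.com", "some-other-host"):
        print(host, route({"host_name": host}))
```

A flowfile that arrives without a host_name attribute lands in /tmp/unmatched, which makes that directory a useful place to look when a MiNiFi agent is misconfigured.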
Reason: MiNiFi needs to push the log data to a Remote Process Group and delete the log file
a) Download MiNiFi (from http://hortonworks.com/downloads/ ) on each of the servers (that contain log data)
b) Unzip minifi-0.0.1-bin.zip to a directory
c) Complete step 4 below (building the MiNiFi flow and its config.yml) before continuing to d
d) Using an account that has read/write permission on the log data directory (needed to read the file and then delete it), run *location/minifi-0.0.1/bin/minifi.sh start, where *location is the directory you unzipped MiNiFi into
Reason: Push the log data to a Remote Process Group and delete the log file
a) In the NiFi UI, create a process group (call the group “minifi_flow”)
b) Go into the process group “minifi_flow”
c) Drag-and-drop the processor GetFile
d) Configure the processor GetFile – Properties (IMPORTANT: any file under the input directory whose name matches the file filter’s regular expression, including files in subdirectories when Recurse Subdirectories is set to true, will be deleted once it is stored in MiNiFi’s content repository)
In this example, the file filter matches files named hdfs-audit.log.Archive, where Archive is in this case a date
e) Drag-and-drop the UpdateAttribute processor and connect GetFile to UpdateAttribute using the success relationship
f) Configure the UpdateAttribute processor – Properties: add a host_name attribute whose value is a NiFi Expression Language expression that returns the hostname (for example, ${hostname()})
g) Drag-and-drop a remote process group – use the nifi.remote.input.host from above for the URL
Wait for the connection to establish before continuing to h
h) Add a connection between UpdateAttribute and the Remote Process Group – under To Input choose listen_for_minifi_logs
i) Select all processors and relationships to create a template (Download the template’s xml file)
j) Use the minifi-toolkit (https://www.apache.org/dyn/closer.lua?path=/nifi/minifi/0.0.1/minifi-toolkit-0.0.1-bin.zip ) and run the config.sh tool, “config.sh transform theminifi_flow_template.xml config.yml”, which converts the template XML into the YML file that MiNiFi reads
k) Copy the config.yml file into the minifi-0.0.1/conf directory on each of the MiNiFi servers (if you already have your MiNiFi agent started, restart the agent)
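Because GetFile deletes whatever it picks up (step d), it is worth testing the file filter before starting the agent. The sketch below mimics the GetFile and UpdateAttribute steps in plain Python. The regular expression (hdfs-audit.log. followed by a yyyy-MM-dd date) is an assumption, since the article's actual filter is not shown, and the function names are hypothetical. It also assumes the filter must match the whole file name.

```python
import re
import socket

# Assumed file filter: "hdfs-audit.log." followed by a yyyy-MM-dd date.
# The article's actual regular expression is not shown.
FILE_FILTER = re.compile(r"hdfs-audit\.log\.\d{4}-\d{2}-\d{2}")


def select_files(filenames):
    """Keep only names that match the filter in full (assumed GetFile behaviour)."""
    return [name for name in filenames if FILE_FILTER.fullmatch(name)]


def tag_with_host(filenames, host_name=None):
    """Attach a host_name attribute to each selected file, like UpdateAttribute."""
    if host_name is None:
        # Stand-in for the Expression Language hostname lookup on the agent.
        host_name = socket.getfqdn()
    return [{"filename": name, "host_name": host_name} for name in filenames]


if __name__ == "__main__":
    candidates = [
        "hdfs-audit.log",             # active log: not matched, so not deleted
        "hdfs-audit.log.2016-11-21",  # rolled-over archive: matched
        "other.log",
    ]
    for flowfile in tag_with_host(select_files(candidates)):
        print(flowfile)
```

With a filter of this shape, only rolled-over archives are picked up, so the log file that is still being written stays in place.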
Created on 04-12-2017 12:51 PM
I followed the procedure, but I am getting a HandshakeException. I set "nifi.remote.input.secure" to false, but that still didn't help. Please help: did I miss anything?
2017-04-12 06:13:50,696 INFO [StandardProcessScheduler Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=616d3e3e-015b-1000-0000-000000000000] to run with 1 threads
2017-04-12 06:13:51,006 ERROR [Timer-Driven Process Thread-1] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:13:51,008 ERROR [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:13:51,020 INFO [NiFi Site-to-Site Connection Pool Maintenance] o.apache.nifi.remote.client.PeerSelector org.apache.nifi.remote.client.PeerSelector@1656d5bc Successfully refreshed Peer Status; remote instance consists of 1 peers
2017-04-12 06:14:01,023 ERROR [Timer-Driven Process Thread-3] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:01,023 ERROR [Timer-Driven Process Thread-3] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:11,036 ERROR [Timer-Driven Process Thread-2] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:11,036 ERROR [Timer-Driven Process Thread-2] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:21,049 ERROR [Timer-Driven Process Thread-4] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
Thanks
Chaitanya