This article shows how to ingest log data from multiple servers, each running MiNiFi, into a NiFi cluster. The NiFi cluster listens for the log data on an input port and routes each file to an HDFS directory determined by the host name it came from. The article assumes you are using Ambari for NiFi installation and administration.

1) Set nifi.remote.input.host and nifi.remote.input.socket.port on the NiFi cluster

Reason: These settings let the NiFi cluster receive data on an Input Port. MiNiFi pushes to a Remote Process Group, and NiFi listens using the Input Port.

a) In Ambari, go to NiFi

9700-1-a.png

b) Choose the Configs tab

9711-1-b.png

c) Expand the Advanced nifi-properties section

9712-1-c.png

d) Set the nifi.remote.input.host to your NiFi hostname and nifi.remote.input.socket.port to 10000

9713-1-d.png

e) Restart NiFi using Ambari

9714-1-e.png
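After the restart, it can be worth confirming that both values actually landed in nifi.properties. The snippet below is a minimal sketch, assuming a plain key=value properties file. The helper names and the sample host nifi-host.example.com are hypothetical, and where the file lives on your nodes depends on your installation.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def site_to_site_settings(text):
    """Return (host, port) for the remote input settings, or raise if unset."""
    props = parse_properties(text)
    host = props.get("nifi.remote.input.host", "")
    port = props.get("nifi.remote.input.socket.port", "")
    if not host or not port:
        raise ValueError("remote input host and socket port must both be set")
    return host, int(port)


# Hypothetical sample content; nifi-host.example.com is a placeholder.
SAMPLE = """
# Site to Site properties
nifi.remote.input.host=nifi-host.example.com
nifi.remote.input.socket.port=10000
"""

if __name__ == "__main__":
    print(site_to_site_settings(SAMPLE))
```

An empty host or port raises an error here, which is the situation step d is meant to rule out.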

2) In NiFi, create a flow for incoming log data (an input port that listens for MiNiFi data)

Reason: Listen for incoming log data and route the log data to an HDFS directory

a) On the NiFi flow canvas, drag and drop an Input Port and give it a name (I named mine “listen_for_minifi_logs”)

9715-2-a.png

b) Drag and drop a RouteOnAttribute processor

c) Create a connection between Input Port and RouteOnAttribute

9716-2-c.png

d) Configure the RouteOnAttribute processor's Properties (this removes the red caution icon), adding one property per server you’re installing MiNiFi on (two in my case)

9717-2-d.png

I have two servers, rcicak0.field.hortonworks.com and rcicak1.field.hortonworks.com. The incoming log data (flowfile) will contain an attribute called “host_name”, and RouteOnAttribute routes each flowfile according to the value of that attribute.
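The routing decision can be pictured outside NiFi. The sketch below mimics RouteOnAttribute with its default "Route to Property name" strategy, where each property name becomes a relationship and anything that matches no property goes to unmatched. The relationship names are the ones used in this flow; the exact expressions live only in the screenshot, so comparing the full host name for equality (in NiFi terms, something like ${host_name:equals('rcicak0.field.hortonworks.com')}) is an assumption.

```python
# Assumed rules: relationship name -> host name it should match.
# Equality on the full host name is an assumption, not the article's
# confirmed expression.
RULES = {
    "hostname_rcicak0": "rcicak0.field.hortonworks.com",
    "hostname_rcicak1": "rcicak1.field.hortonworks.com",
}


def route(attributes, rules=RULES):
    """Return the relationship names a flowfile with these attributes matches."""
    host = attributes.get("host_name")
    matched = [name for name, expected in rules.items() if host == expected]
    # A flowfile that matches no rule is sent to "unmatched".
    return matched or ["unmatched"]


if __name__ == "__main__":
    print(route({"host_name": "rcicak1.field.hortonworks.com"}))
    print(route({"host_name": "some-other-host"}))
```

A flowfile with no host_name attribute at all also ends up in unmatched, which is why that relationship gets its own destination.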

e) Drag and drop three PutHDFS processors and connect RouteOnAttribute to each of them, one connection per relationship: hostname_rcicak0, hostname_rcicak1 and unmatched

9718-2-ef.png

f) Each PutHDFS processor writes to a different HDFS directory. Configure the Directory property of each PutHDFS processor: /tmp/rcicak0/, /tmp/rcicak1/ and /tmp/unmatched

9719-2-ef.png
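Put together, each relationship maps to exactly one directory. The following is a small sketch of that mapping, assuming PutHDFS names the written file after the flowfile's filename attribute; the function name target_path and the sample file name are hypothetical.

```python
import posixpath

# Relationship -> HDFS directory, as listed in step f.
DIRECTORIES = {
    "hostname_rcicak0": "/tmp/rcicak0/",
    "hostname_rcicak1": "/tmp/rcicak1/",
    "unmatched": "/tmp/unmatched",
}


def target_path(relationship, filename):
    """Build the HDFS path for a flowfile arriving on `relationship`."""
    return posixpath.join(DIRECTORIES[relationship], filename)


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(target_path("hostname_rcicak0", "hdfs-audit.log.2017-04-11"))
```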

g) Configure the remaining properties of each PutHDFS processor

9720-2-g.png

(Set Hadoop Configuration Resources to the path of a core-site.xml, and set the directory that matches the incoming connection.)

h) Start the processors. At this point, the NiFi flow is ready to receive log data from MiNiFi

9721-2-h.png

3) Set up MiNiFi on at least one server

Reason: MiNiFi needs to push the log data to a remote process group and delete the log file

a) Download MiNiFi (from http://hortonworks.com/downloads/ ) on each server that contains log data

9722-3-a.png

b) Unzip minifi-0.0.1-bin.zip to a directory

c) Complete step 4 below before continuing to d

d) Using an account that has read/write permission on the log data directory (needed to read each file and then delete it), run *location/minifi-0.0.1/bin/minifi.sh start
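Before starting the agent, the permission requirement can be checked for the account that will run MiNiFi. This is a minimal sketch, assuming a POSIX-style system where deleting a file requires write and execute permission on its parent directory; the function name is hypothetical, and the individual log files still need to be readable by the same account.

```python
import os
import tempfile


def can_read_and_delete(directory):
    """True if the current user may list, read, and remove entries in `directory`.

    Deleting a file needs write and execute permission on its parent
    directory, so the directory itself is what gets checked here.
    """
    return os.path.isdir(directory) and os.access(
        directory, os.R_OK | os.W_OK | os.X_OK
    )


if __name__ == "__main__":
    # A temporary directory stands in for the real log directory.
    with tempfile.TemporaryDirectory() as log_dir:
        print(can_read_and_delete(log_dir))
```

Run it as the same user that will execute minifi.sh, since the result depends on who is asking.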

4) Using a process group in NiFi, create a MiNiFi flow (pushing log data to a remote process group)

Reason: Push the log data to a remote process group and delete the log file

a) Create a process group (call the group “minifi_flow”)

b) Go into the process group “minifi_flow”

c) Drag and drop a GetFile processor

d) Configure the GetFile processor's Properties. IMPORTANT: any file under the Input Directory (and its subdirectories, when Recurse Subdirectories is set to true) whose name matches the File Filter regular expression will be deleted once it is stored in MiNiFi’s content repository.

9723-4-d.png

In the example above, the file filter looks for archived files named hdfs-audit.log. followed by an archive suffix, which in this case is a date.
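Because matching files are deleted from disk, it is worth testing the filter against a few file names first. The sketch below is only an illustration: the real pattern is visible only in the screenshot, so the YYYY-MM-DD suffix is an assumption, and it uses Python's re module rather than the Java regex engine NiFi runs on (for a pattern this simple the syntax is the same).

```python
import re

# Hypothetical filter: assumes archives are named hdfs-audit.log.YYYY-MM-DD.
FILE_FILTER = re.compile(r"hdfs-audit\.log\.\d{4}-\d{2}-\d{2}")


def picked_up(filename):
    """True if the whole file name matches the filter."""
    return FILE_FILTER.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["hdfs-audit.log.2017-04-11", "hdfs-audit.log", "other.log"]:
        print(name, picked_up(name))
```

With a pattern like this, the active hdfs-audit.log is not matched, so the file still being written is left alone and only dated archives are picked up.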

e) Drag and drop an UpdateAttribute processor and connect GetFile to it using the success relationship

f) Configure the UpdateAttribute processor's Properties, adding a host_name attribute whose value is a NiFi Expression Language expression that returns the hostname

9724-4-f.png
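NiFi Expression Language has a hostname() function, and ${hostname(true)} asks for the fully qualified name. The Python sketch below shows the equivalent idea of tagging a flowfile's attributes with the local host name. The function name add_host_name is hypothetical, and which form the screenshot uses is not stated in the text.

```python
import socket


def add_host_name(attributes, fully_qualified=False):
    """Return a copy of `attributes` with a host_name entry added."""
    name = socket.getfqdn() if fully_qualified else socket.gethostname()
    updated = dict(attributes)  # leave the caller's dict untouched
    updated["host_name"] = name
    return updated


if __name__ == "__main__":
    print(add_host_name({"filename": "hdfs-audit.log.2017-04-11"}))
```

Whichever form you use, the value has to match what the RouteOnAttribute properties in step 2 expect; a short name will not equal a fully qualified one, and the flowfile would land in unmatched.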

g) Drag and drop a Remote Process Group and use the nifi.remote.input.host value from step 1 as the host in the URL

9725-4-g.png

Wait for the connection to establish before continuing to h
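If the connection never establishes, a quick reachability test from the sending machine can narrow things down. This is a minimal sketch that only checks whether a TCP connection can be opened; it does not perform the Site-to-Site handshake. The host name in the example is a placeholder, and 10000 is the socket port chosen in step 1.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host; substitute your nifi.remote.input.host value.
    print(port_open("nifi-host.example.com", 10000))
```

The same check is useful later from each MiNiFi server, since those machines also need to reach the socket port.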

h) Add a connection between UpdateAttribute and the Remote Process Group – under To Input choose listen_for_minifi_logs

9726-4-h.png

i) Select all processors and connections, create a template, and download the template’s XML file

9727-4-i.png

j) Download the minifi-toolkit (https://www.apache.org/dyn/closer.lua?path=/nifi/minifi/0.0.1/minifi-toolkit-0.0.1-bin.zip ) and run its config.sh tool: “config.sh transform theminifi_flow_template.xml config.yml”. This converts the template XML into the YML file that MiNiFi reads
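A quick sanity check on the downloaded file can catch a wrong or truncated export before it reaches the transform. The sketch below assumes a NiFi template export has a root element named template with a name child, which matches the exports I have seen but is not spelled out in this article. The function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def template_name(xml_text):
    """Return the <name> of a NiFi template, or raise ValueError if it is not one.

    Malformed XML raises xml.etree.ElementTree.ParseError instead.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "template":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root.findtext("name", default="")


# Hypothetical, heavily trimmed stand-in for a real template export.
SAMPLE = "<template><description></description><name>minifi_flow</name><snippet/></template>"

if __name__ == "__main__":
    print(template_name(SAMPLE))
```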

k) Copy the config.yml file into the minifi-0.0.1/conf directory on each of the MiNiFi servers (if you already have your MiNiFi agent started, restart the agent)

Comments

@Ryan Cicak

I followed the procedure, but I am getting a HandshakeException. I set "nifi.remote.input.secure" to false, but that still didn't help. Did I miss anything? Please help.

2017-04-12 06:13:50,696 INFO [StandardProcessScheduler Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=616d3e3e-015b-1000-0000-000000000000] to run with 1 threads

2017-04-12 06:13:51,006 ERROR [Timer-Driven Process Thread-1] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode

2017-04-12 06:13:51,008 ERROR [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode

2017-04-12 06:13:51,020 INFO [NiFi Site-to-Site Connection Pool Maintenance] o.apache.nifi.remote.client.PeerSelector org.apache.nifi.remote.client.PeerSelector@1656d5bc Successfully refreshed Peer Status; remote instance consists of 1 peers

2017-04-12 06:14:01,023 ERROR [Timer-Driven Process Thread-3] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode

2017-04-12 06:14:01,023 ERROR [Timer-Driven Process Thread-3] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode

2017-04-12 06:14:11,036 ERROR [Timer-Driven Process Thread-2] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode

2017-04-12 06:14:11,036 ERROR [Timer-Driven Process Thread-2] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode

2017-04-12 06:14:21,049 ERROR [Timer-Driven Process Thread-4] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode

Thanks

Chaitanya