Created 12-14-2017 09:56 PM
Hello all,
I want to ingest application server logs into HDFS using Flume version 1.5. Do I need to install a Flume agent (client) on these application servers? These servers are not part of the Hadoop cluster. How can I pull the application logs without installing a Flume agent? Can you please help?
Thanks
JN
Created 12-14-2017 11:00 PM
Hi @JT Ng,
Yes, that is possible with the "Netcat TCP Source", without installing an agent on the application server. However, you will need to tail the log on the server where it is produced and pipe it to the listener on the Flume agent host.
That means starting the log push process on the source server (the application server) with:
tail -f <application_log_file>.log | nc <flume_agent_host> <configured_netcat_source_port>
Before you trigger this, make sure you have started the Flume agent in the HDP cluster (or wherever the Flume agent can be installed) with a Netcat source, for example:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
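For reference, the usual way to launch the agent once the properties are in place is the standard flume-ng command (the file name netcat-hdfs.conf below is just a placeholder for wherever you save the configuration; the agent name must match the a1 prefix used above):

flume-ng agent --conf conf --conf-file netcat-hdfs.conf --name a1 -Dflume.root.logger=INFO,console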
On the other side, you can configure an HDFS Sink to write the events into the HDFS file system with the following configuration:
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
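Putting the two snippets together, the agent also needs the channel itself defined. A minimal end-to-end sketch could look like the lines below; the memory channel and its capacity values are assumptions you should tune for your environment, and hdfs.useLocalTimeStamp is added because the HDFS path uses time escape sequences while the Netcat source does not put a timestamp header on events.

# source
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
# channel (memory channel chosen here as an example; sizes are assumptions)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
# sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# needed because the path escapes (%y, %m, ...) require a timestamp on each event
a1.sinks.k1.hdfs.useLocalTimeStamp = true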
NB: Make sure you handle the tail and nc processes when the server or the application stops or shuts down completely; you can manage the tail process with a proper shell wrapper that provides restartability as a service on the Linux host, as sketched below.
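As a rough illustration of that restartability point, a minimal wrapper sketch could be the script below (the log path, Flume host, and port are placeholders to adapt; -F is used so tail survives log rotation). You could then run it under systemd, supervisord, or nohup so it comes back up with the host.

#!/bin/bash
# push_logs.sh - keep the tail | nc pipeline running (placeholder names throughout)
LOG_FILE=/var/log/myapp/application.log    # application log path (assumption)
FLUME_HOST=flume-agent.example.com         # host running the Flume netcat source (assumption)
FLUME_PORT=6666                            # must match a1.sources.r1.port

while true; do
  # -F re-opens the file after rotation; the loop restarts the pipeline if nc or tail exits
  tail -F "$LOG_FILE" | nc "$FLUME_HOST" "$FLUME_PORT"
  echo "$(date) pipeline exited, retrying in 5s" >&2
  sleep 5
done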
Hope this helps !!
Created 12-15-2017 06:51 PM
Thank you, bkosaraju.