I have the following questions related to Flume.
Thank you for the solution.
In my case, I am reading logs from a web server and writing them to HDFS.
Currently I am running one agent on the web server and one on an edge node to push data to HDFS. The edge node is not part of the cluster, but all the clients are installed on it, so I can start a Flume agent there manually with the flume-ng command.
What is the difference between running Flume on an edge node (as I am doing now) and running Flume on one of the cluster nodes (as you suggested)?
Also, I don’t know where to find the start and stop scripts. Do I need to write my own?
We are using CDH 5.3.3 and Flume 1.5.0.
Any help is appreciated.
If your edge node is part of the cluster and you are using parcels, then you won't have start and stop scripts. The recommended way to run Flume in that case is to set up a Flume service in CM that runs on the edge node.
The only difference between an edge node and a cluster node is that edge nodes generally don't run Hadoop services.
Have you installed the Flume RPMs on this edge node, or are you using parcels? Where are you running the flume-ng command from? You can check with:
which flume-ng
alternatives --display flume-ng
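If it helps, the same check can be scripted. This is only a rough sketch: it assumes parcels are installed under the default /opt/cloudera/parcels directory, and the function name classify_flume_install is made up for illustration. It resolves the flume-ng symlink and reports whether the binary lives inside a parcel.

```shell
#!/bin/sh
# Classify a resolved flume-ng path.
# Assumption: parcels live under the default /opt/cloudera/parcels directory.
# classify_flume_install is a hypothetical helper name, not part of Flume or CM.
classify_flume_install() {
  case "$1" in
    "") echo "not found" ;;
    /opt/cloudera/parcels/*) echo "parcel" ;;
    *) echo "package or other" ;;
  esac
}

# alternatives manages flume-ng through symlinks, so resolve them first.
flume_path="$(readlink -f "$(command -v flume-ng 2>/dev/null)" 2>/dev/null || true)"
classify_flume_install "$flume_path"
```

A result of "parcel" would suggest there are no init scripts on the node, so the agent would be started either manually or through a Flume service in CM.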
I believe Flume was installed using parcels. I am running the flume-ng commands on the edge node.
The details are below:
[@ ~]$ which flume-ng
[@ ~]$ alternatives --display flume-ng
flume-ng - status is auto.
link currently points to /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/bin/flume-ng
/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/bin/flume-ng - priority 10
Current `best' version is /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/bin/flume-ng.
Also, it would be very helpful if you could provide details about setting up a Flume service in CM.