Flume agent on edge node

Hi,

I have the following questions about Flume:

  1. On which node should the Flume agent run? On an edge node or on one of the Hadoop cluster nodes?
  2. Do I need to run the Flume agent with nohup in production, since it has to keep running until it is interrupted?

If you are using Flume to deliver to HDFS, it is recommended to run that Flume agent on a node in your cluster.

If you are using Flume to collect events from other applications and send them downstream to another agent, which then delivers them to their final destination (HDFS, Solr, etc.), you can run that agent either on a cluster node or on the machine where the events are generated.
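As a sketch of the upstream tier described above, the function below prints a Flume properties file for an agent on the machine generating the events: an exec source tails a log, a memory channel buffers events, and an Avro sink forwards them to the downstream agent. The agent name `webagent`, the component names, the log path and the port are hypothetical placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Sketch: print a Flume config for an upstream (web server) agent.
# Hypothetical names: agent "webagent", source "weblog", channel "mem",
# sink "avro-out". Arguments: collector host, collector port, log to tail.
webserver_agent_conf() {
  local collector_host="$1" collector_port="$2" log_path="$3"
  cat <<EOF
webagent.sources = weblog
webagent.channels = mem
webagent.sinks = avro-out

# Tail the log file; the exec source does not guarantee delivery if the agent fails.
webagent.sources.weblog.type = exec
webagent.sources.weblog.command = tail -F ${log_path}
webagent.sources.weblog.channels = mem

# In-memory buffer between source and sink.
webagent.channels.mem.type = memory
webagent.channels.mem.capacity = 10000
webagent.channels.mem.transactionCapacity = 1000

# Forward events to the next-tier agent over Avro RPC.
webagent.sinks.avro-out.type = avro
webagent.sinks.avro-out.hostname = ${collector_host}
webagent.sinks.avro-out.port = ${collector_port}
webagent.sinks.avro-out.channel = mem
EOF
}

# Example: webserver_agent_conf edge-node.example.com 4141 /var/log/httpd/access_log > webagent.conf
```

The generated file could then be started with `flume-ng agent --conf <conf-dir> --conf-file webagent.conf --name webagent`; the `--name` value has to match the property prefix used in the file.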

If it is not running on a CDH node, you can install Flume from packages and then use the packaged start and stop scripts to run it as a daemon and keep it running.

-pd

Thank you for the solution.

In my case, I am reading logs from a web server and writing them to HDFS.

Currently I run one agent on the web server and another on the edge node to push the data to HDFS. The edge node is not part of the cluster, but all the clients are installed on it, so I can run a Flume agent there manually with the flume-ng command.

What is the difference between running Flume on the edge node (as I do now) and running it on one of the cluster nodes (as you suggested)?

Also, I don't know where to find the start and stop scripts. Do I need to write my own?

We are using CDH 5.3.3 and Flume 1.5.0.

Any help is appreciated.

If your edge node is part of the cluster and you are using parcels, you won't have start and stop scripts. The recommended way to run Flume is to set up a Flume service in CM and run it on the edge node.

The only difference between an edge node and a cluster node is that edge nodes generally don't run Hadoop services.
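When the edge node runs Flume as a CM service, the agent's configuration is entered in the Flume agent role's configuration file setting in Cloudera Manager rather than kept in a local file. The function below prints one possible configuration for the downstream agent described earlier (Avro in, HDFS out). It assumes the agent name `tier1`, which I believe is CM's default agent name (check the agent name setting on your role); the port, HDFS directory and component names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a collector config (Avro source -> memory channel -> HDFS sink).
# Assumed agent name "tier1"; port, HDFS path and component names are hypothetical.
collector_agent_conf() {
  local listen_port="$1" hdfs_dir="$2"
  cat <<EOF
tier1.sources = avro-in
tier1.channels = mem
tier1.sinks = hdfs-out

# Receive events from upstream agents (e.g. on the web servers).
tier1.sources.avro-in.type = avro
tier1.sources.avro-in.bind = 0.0.0.0
tier1.sources.avro-in.port = ${listen_port}
tier1.sources.avro-in.channels = mem

tier1.channels.mem.type = memory
tier1.channels.mem.capacity = 10000
tier1.channels.mem.transactionCapacity = 1000

# Write plain-text files into one HDFS directory per day, rolling every 5 minutes.
tier1.sinks.hdfs-out.type = hdfs
tier1.sinks.hdfs-out.channel = mem
tier1.sinks.hdfs-out.hdfs.path = ${hdfs_dir}/%Y-%m-%d
tier1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
tier1.sinks.hdfs-out.hdfs.fileType = DataStream
tier1.sinks.hdfs-out.hdfs.writeFormat = Text
tier1.sinks.hdfs-out.hdfs.rollInterval = 300
tier1.sinks.hdfs-out.hdfs.rollSize = 0
tier1.sinks.hdfs-out.hdfs.rollCount = 0
EOF
}

# Example: collector_agent_conf 4141 /user/flume/weblogs
```

`hdfs.useLocalTimeStamp` is set because the `%Y-%m-%d` escapes need a timestamp, and events from a tailed log carry no timestamp header unless an interceptor adds one.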

Have you installed the Flume RPMs on this edge node, or are you using parcels? Where does the flume-ng command you run come from? You can check with:

which flume-ng
alternatives --display flume-ng

-pd

I guess Flume was installed using parcels. I am running the flume-ng commands on the edge node.

Below are the details:

[@ ~]$ which flume-ng
/usr/bin/flume-ng
[@ ~]$ alternatives --display flume-ng
flume-ng - status is auto.
 link currently points to /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/bin/flume-ng
/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/bin/flume-ng - priority 10
Current `best' version is /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/bin/flume-ng.
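The `alternatives` output above resolves `flume-ng` into `/opt/cloudera/parcels/...`, which is the parcel layout. A small helper along these lines can make that check explicit; it assumes GNU `readlink -f`, which follows the whole chain `/usr/bin/flume-ng` → `/etc/alternatives/flume-ng` → target. The function name and the optional second argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether a flume-ng launcher resolves into the parcel directory.
# Assumes GNU readlink; the parcel root defaults to /opt/cloudera/parcels.
classify_flume_install() {
  local launcher="$1" parcel_root="${2:-/opt/cloudera/parcels}"
  local target root
  target="$(readlink -f "$launcher")" || return 1
  # Canonicalize the root too; keep it as given if it cannot be resolved here.
  root="$(readlink -f "$parcel_root" 2>/dev/null)" || root="$parcel_root"
  case "$target" in
    "$root"/*) echo "parcel: $target" ;;
    *)         echo "not a parcel (package or other): $target" ;;
  esac
}

# Usage: classify_flume_install "$(command -v flume-ng)"
```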

 

Also, it would be very helpful if you could provide details about setting up a Flume service in CM.

Thank you

I can see Flume running in the CM portal, which means we already have Flume as a service in Cloudera Manager.

Here is information on setting up a Flume service in CM:
http://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_flume_service.html

You can have multiple Flume services within a CM cluster, and each one has its own separate configuration.

-pd

Thank you for the detailed reply.

I have set up Flume as a service on the edge node, and it is working as expected.