Flume in Production - To Ambari or not to Ambari, that is the question!

Expert Contributor

I need to find out the best practice for running a set of Flume agents in production. All the answers I find dance around the issue. I understand that when setting up Flume in Ambari, you can create several config groups, and every agent in a group must be concatenated into that group's flume.conf. Each agent then runs one instance on every host associated with the config group.
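Because all agents in a config group share one flume.conf, Flume tells them apart only by the first component of each property key (the agent name passed to `--name`). If you move to one file per agent, a small script can split the combined file. This is a minimal Python sketch, assuming every property line has the form `<agent>.<...> = <value>`, comments start with `#`, and there are no backslash line continuations; `split_flume_conf` is a hypothetical helper, not part of Flume or Ambari:

```python
from typing import Dict, List


def split_flume_conf(text: str) -> Dict[str, str]:
    """Split a combined flume.conf into one properties text per agent.

    Hypothetical helper. Assumes each property key starts with the agent
    name (e.g. 'app1.sources = r1'); blank lines and '#' comments are
    dropped, and backslash line continuations are not handled.
    """
    per_agent: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key = line.split("=", 1)[0].strip()
        agent = key.split(".", 1)[0]  # agent name is the first key component
        per_agent.setdefault(agent, []).append(line)
    # dicts keep insertion order, so agents appear in file order
    return {agent: "\n".join(lines) + "\n" for agent, lines in per_agent.items()}


if __name__ == "__main__":
    combined = """
# Ambari config group: two agents share one flume.conf
app1.sources = r1
app1.channels = c1
app2.sources = r1
app1.sinks = k1
"""
    for agent, body in split_flume_conf(combined).items():
        # In practice you would write each body to its own file, e.g. f"{agent}.conf"
        print(f"--- {agent}.conf ---")
        print(body, end="")
```

Each resulting file can then be started, changed, and restarted independently of the other agents.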

At this point, you can see and restart individual agents through Ambari. However (and here’s the problem), if you change any agent's configuration or add a new agent, you must restart ALL of the agents in that group for the change to take effect! That is not acceptable in my case, where I have 4 apps running 2 or 3 agents each. Having to restart every application's Flume agents whenever a single change is made is simply not workable!

So, am I missing something, or are large enterprises simply using shell scripts to start the agents on each host?

If they are using scripts, what do they use for monitoring and auto-restart?

@jbarnett

I say not Flume 🙂

Have you tried NiFi? You can have several processors for your app and configure each of them with a few clicks in the GUI! Want to reconfigure a particular processor? No problem: stop it, right-click it, configure it, and start it again.

If you really want to use Flume, I recommend using one config file per agent, as stated in the docs:

Hortonworks recommends that administrators use a separate configuration file for each Flume agent. .... While it is possible to use one large configuration file that specifies all the Flume components needed by all the agents, this is not typical of most production deployments. 
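With one file per agent, each agent becomes its own `flume-ng agent` process. This Python sketch builds those command lines using the `--name`, `--conf` and `--conf-file` options and, optionally, Flume's HTTP JSON monitoring (`-Dflume.monitoring.type=http`, `-Dflume.monitoring.port=...`). The function name, agent names, paths, and ports are hypothetical; the assumption is that each agent on a host gets its own metrics port:

```python
import shlex
from typing import List, Optional


def flume_agent_command(agent: str, conf_dir: str, conf_file: str,
                        http_port: Optional[int] = None) -> List[str]:
    """Return the argv that starts one standalone Flume agent.

    Hypothetical helper. With http_port set, the agent exposes its
    metrics as JSON over HTTP.
    """
    cmd = ["flume-ng", "agent",
           "--name", agent,            # must match the property prefix in conf_file
           "--conf", conf_dir,         # directory holding flume-env.sh / log4j.properties
           "--conf-file", conf_file]   # this agent's own configuration file
    if http_port is not None:
        cmd += ["-Dflume.monitoring.type=http",
                f"-Dflume.monitoring.port={http_port}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical agents and paths: one config file and one metrics port per agent.
    agents = {"app1": 34545, "app2": 34546}
    for name, port in agents.items():
        argv = flume_agent_command(name, "/etc/flume/conf",
                                   f"/etc/flume/conf/{name}.conf", port)
        print(shlex.join(argv))
```

Because each agent is a separate process reading a separate file, changing app1's configuration only requires restarting app1's agent.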

Since you have several agents on the same host, Ambari is not an option.

Use NiFi!!

Expert Contributor

@Abdelkrim Hadjidj

So, I AM trying to get the powers that be to switch over to NiFi, but in the meantime we have a short time frame to port what they have with as few changes as possible.

Under "Starting Flume", the document also shows how to start Flume from the command line. In that scenario, you could put each agent in a separate config file. I am just wondering whether this is how most large enterprises run Flume in production and, if so, how they monitor the agents.
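For monitoring and auto-restart of command-line agents, one approach is to let a process supervisor (for example systemd or supervisord) restart each agent and to poll each agent's JSON metrics, which Flume serves at `http://<host>:<port>/metrics` when HTTP monitoring is enabled. The Python sketch below only illustrates that idea. `supervise`, `fetch_metrics`, `channel_fill`, the 90% threshold, and the polling interval are hypothetical choices, not an established tool:

```python
import json
import subprocess
import time
import urllib.request
from typing import Dict, List, Optional


def channel_fill(metrics: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """Extract ChannelFillPercentage per channel from Flume's /metrics JSON."""
    return {name: float(values["ChannelFillPercentage"])
            for name, values in metrics.items()
            if name.startswith("CHANNEL.") and "ChannelFillPercentage" in values}


def fetch_metrics(port: int, host: str = "localhost",
                  timeout: float = 5.0) -> Optional[dict]:
    """GET http://host:port/metrics; return None if unreachable or not JSON."""
    url = f"http://{host}:{port}/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # URLError/timeouts are OSError; bad JSON is ValueError
        return None


def supervise(argv: List[str], port: int, interval: float = 30.0,
              fill_alert: float = 90.0) -> None:
    """Hypothetical watchdog: restart the agent if it exits, warn on full channels."""
    proc = subprocess.Popen(argv)
    while True:
        time.sleep(interval)
        if proc.poll() is not None:
            print(f"agent exited with code {proc.returncode}; restarting")
            proc = subprocess.Popen(argv)
            continue
        metrics = fetch_metrics(port)
        if metrics is None:
            print(f"metrics endpoint on port {port} not reachable")
            continue
        for channel, pct in channel_fill(metrics).items():
            if pct >= fill_alert:
                print(f"{channel} is {pct:.1f}% full")
```

A filling channel usually means a sink is falling behind, so channel fill is a reasonable first signal to alert on; a dedicated supervisor is still the more robust way to handle restarts.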

BTW, I had accidentally posted this as an answer, so I'm not sure everyone saw it.

Expert Contributor

Hi Jim! 🙂

Our project is still around and getting bigger. We are using both Cloudera and Hortonworks and building more dataflows. With the increased complexity, we are finding Ambari more and more inadequate compared with Cloudera's full-featured commercial counterpart, Cloudera Manager. For Flume, Ambari offers only six metrics, four basic config attributes, and one big text box for pasting in the config file. I have to hand-edit flume-env.sh to change the agent heap allocation.

(With apologies to our hosts) While Hortonworks offers a goodie bag of the latest Apache applications, the primitive state of its management console is a deal-breaker. If Ambari cannot be improved soon, I strongly recommend you consider Cloudera (we are using the free version).

Master Mentor

The limitations in Flume management are absolutely there, but we make up for them with our NiFi support.

Expert Contributor

This was back in 2016; nowadays I would go for NiFi (open source) or StreamSets (free to use, pay for support).

Flume is now deprecated in Hortonworks and will be removed in future 3.* releases: deprecations_HDP.