Flume with oozie

Contributor

We are using Flume to get data into HDFS. After that we run Pig and Hive for data transformation. I am not sure how to trigger Flume from Oozie.

1 ACCEPTED SOLUTION

Re: Flume with oozie

Expert Contributor

@vamsi valiveti you can trigger Flume from an Oozie shell action. However, note that the action will be executed on a random cluster node, so all your nodes should have Flume installed. You will also need to somehow control the agents after that, and if you have more than 10 nodes that becomes a problem. That's why this is not a common scenario for Flume usage.

I'd say the better approach is to keep Flume running all the time and schedule Oozie jobs to process the data whenever you need.
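
As a rough sketch of the first point, a script called from an Oozie shell action could check that Flume is present on whichever node it lands on before starting the agent. The function names and the default for FLUME_CONF_DIR are assumptions made for illustration; the agent name and flume.conf come from the command quoted in this thread.

```shell
#!/bin/sh
# Sketch of a script an Oozie shell action might call.
# Assumption: these defaults are illustrative and not taken from the thread.
FLUME_BIN="${FLUME_BIN:-flume-ng}"
FLUME_CONF_DIR="${FLUME_CONF_DIR:-/etc/flume/conf}"

# Succeeds only if the Flume launcher is on this node's PATH.
flume_available() {
    command -v "$FLUME_BIN" >/dev/null 2>&1
}

# Runs the agent in the foreground, or fails fast on a node without Flume.
run_agent() {
    if ! flume_available; then
        echo "$FLUME_BIN not found on $(uname -n)" >&2
        return 1
    fi
    "$FLUME_BIN" agent --conf "$FLUME_CONF_DIR" \
        --conf-file "$FLUME_CONF_DIR/flume.conf" --name Agent7
}
```

The script would end with a call to run_agent. Because the agent keeps running until it is stopped, the action would presumably not finish on its own, which is one more reason this setup is awkward.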

8 REPLIES

Re: Flume with oozie

Expert Contributor

Hi @vamsi valiveti,

Oozie is a scheduler, whereas Flume does not work on a schedule; instead, Flume processes data as it receives it. So you use the Flume configuration to say, for example, that each time a file appears in a certain directory Flume will put it in HDFS (if you use the spooldir source), and so on.

/Best regards, Mats

Re: Flume with oozie

Contributor

a) I am starting the Flume agent using the command below. Currently I run it manually at the Unix command prompt; how would we trigger this command in production? I also want to create a dependency with Hive.

b) Can I place the command below in a Unix shell script and call it from a shell action in Oozie?

flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent7

Re: Flume with oozie

Contributor

Hi @Mats Johansson

Any input on my clarifications?

Re: Flume with oozie

Contributor

Hi @Michael M

Thanks a lot for your time. One small clarification.

You mentioned that a good approach is to keep Flume running all the time and schedule Oozie jobs to process the data whenever needed.

Clarification 1:

How do I keep Flume running all the time? Currently I am using the command below on my gateway node.

flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent7

Re: Flume with oozie

Expert Contributor

@vamsi valiveti the easiest way is to detach the command from the shell using nohup:

nohup <my_command> &

Another option is to create a Flume init.d service script and run Flume as a service. I've posted an example script here (search for "Setup flume agent auto startup" on the page).

A third option is to use Ambari to control the agents.
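
For the nohup option, a minimal sketch might look like the following. It starts the agent command from this thread in the background, appends its output to a log file, and records the PID so the agent can be found later. The log and PID file locations, the FLUME_CONF_DIR default, and the function name are assumptions for illustration.

```shell
#!/bin/sh
# Assumption: illustrative locations; adjust them to the environment.
FLUME_CONF_DIR="${FLUME_CONF_DIR:-/etc/flume/conf}"
AGENT_CMD="${AGENT_CMD:-flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent7}"
LOG_FILE="${LOG_FILE:-/tmp/flume-Agent7.out}"
PID_FILE="${PID_FILE:-/tmp/flume-Agent7.pid}"

# Start the agent detached from the terminal and remember its PID.
start_detached() {
    # $AGENT_CMD is left unquoted on purpose so it splits into words.
    nohup $AGENT_CMD >> "$LOG_FILE" 2>&1 &
    echo $! > "$PID_FILE"
}
```

Calling start_detached returns immediately, and the agent keeps running after the terminal session ends.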

Re: Flume with oozie

Contributor

Hi @Michael M

For the first option:

In production, can I place the command below in a shell script and schedule that script using crontab so that Flume runs continuously? In our production environment we are not allowed to run any command manually on the gateway node. Please correct me if I am wrong.

nohup <my_command> &
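
If the script is scheduled with crontab, it will run repeatedly, so it helps if it starts the agent only when no agent is already running. One possible sketch, assuming a PID file is used to track the agent (the file locations, the FLUME_CONF_DIR default, and the function names are illustrative, not from this thread):

```shell
#!/bin/sh
# Assumption: illustrative locations; adjust them to the environment.
FLUME_CONF_DIR="${FLUME_CONF_DIR:-/etc/flume/conf}"
AGENT_CMD="${AGENT_CMD:-flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent7}"
LOG_FILE="${LOG_FILE:-/tmp/flume-Agent7.out}"
PID_FILE="${PID_FILE:-/tmp/flume-Agent7.pid}"

# True if the PID recorded in PID_FILE belongs to a live process.
agent_running() {
    [ -s "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

# Start the agent only if it is not already running,
# so repeated cron runs do not pile up duplicate agents.
ensure_running() {
    if agent_running; then
        return 0
    fi
    nohup $AGENT_CMD >> "$LOG_FILE" 2>&1 &
    echo $! > "$PID_FILE"
}
```

A script ending with a call to ensure_running could then be scheduled with a crontab entry such as `*/5 * * * * /path/to/ensure_flume.sh`, where the path is hypothetical. Note that a PID check alone cannot tell whether the live process is really Flume if the PID has been reused.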

Re: Flume with oozie

Expert Contributor

@vamsi valiveti it could be an option, right.

But for production usage I'd additionally think about how to stop the agents and how to monitor them. In my experience, an init.d service script plus Ganglia monitoring is the best option.

It allows you to start and stop agents easily with commands like /etc/init.d/flume "agent" stop/start, and Ganglia provides a nice web interface for monitoring.
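
To give an idea of what such a service script does, here is a simplified sketch of start, stop, and status functions built around a PID file. It is not the example script mentioned earlier in the thread; the file locations, the FLUME_CONF_DIR default, and the function names are assumptions.

```shell
#!/bin/sh
# Assumption: illustrative locations; adjust them to the environment.
FLUME_CONF_DIR="${FLUME_CONF_DIR:-/etc/flume/conf}"
AGENT_CMD="${AGENT_CMD:-flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent7}"
LOG_FILE="${LOG_FILE:-/tmp/flume-Agent7.out}"
PID_FILE="${PID_FILE:-/tmp/flume-Agent7.pid}"

# True if the PID recorded in PID_FILE belongs to a live process.
agent_running() {
    [ -s "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

# Start the agent in the background unless it is already running.
start_agent() {
    if agent_running; then
        return 0
    fi
    nohup $AGENT_CMD >> "$LOG_FILE" 2>&1 &
    echo $! > "$PID_FILE"
}

# Send SIGTERM to the recorded PID and forget it.
stop_agent() {
    if agent_running; then
        kill "$(cat "$PID_FILE")"
    fi
    rm -f "$PID_FILE"
}

# Print "running" or "stopped".
status_agent() {
    if agent_running; then
        echo "running"
    else
        echo "stopped"
    fi
}
```

A real init.d script would typically add a dispatcher such as `case "$1" in start) start_agent ;; stop) stop_agent ;; status) status_agent ;; esac` so that it can be driven with start and stop arguments.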
