How to pass parameters into Flume Exec Source command


Hi -

 

I've checked the docs, read the O'Reilly book, Googled, and searched this forum, but found little that helps with what looks like a common Flume use case:

 

I want to ingest the log files of a distributed application that runs on multiple hosts.  They behave like typical Unix or web server logs: they live in fixed directories and roll infrequently.  I cannot modify the application or the log files themselves, so the ingestion has to be completely non-invasive.  So far, so good:

 

1.  The current Flume documentation recommends the Spooling Directory Source over the Exec Source for tailing logs, yet does not explain how to do that in a streaming fashion without modifying the source file.  The Spooling Directory Source requires that a file be complete and closed before it is picked up, so it is batch-oriented rather than stream-oriented, and we can't use it for typical actively updated log files.

 

2.  Now, using Exec Source should work, except that these log files don't have constant file names - e.g. app-<role>-<rack>.log.  The log directory is NFS-mounted and can be shared by multiple hosts, so it can contain:

 

     app-role1-rack1.log

     app-role2-rack1.log

     app-role1-rack2.log

     app-role2-rack2.log

     ....

 

Hadoop itself has such examples:

 

$ ls -l /var/log/hbase/
total 40836
-rw-r--r-- 1 hbase hbase   139032 Nov 20 17:40 hbase-cmf-hbase-HBASERESTSERVER-hou76072.log.out
-rw-r--r-- 1 hbase hbase 27859661 Feb 11 15:20 hbase-cmf-hbase-REGIONSERVER-hou76072.log.out

 

I would like to set these values in flume-env.sh and pass them into the agent config file, or have the command itself call some script to derive them dynamically.  Exec Source has a shell option that seems to support this:

 

The ‘shell’ config is used to invoke the ‘command’ through a command shell (such as Bash or Powershell). The ‘command’ is passed as an argument to ‘shell’ for execution. This allows the ‘command’ to use features from the shell such as wildcards, back ticks, pipes, loops, conditionals etc. In the absence of the ‘shell’ config, the ‘command’ will be invoked directly. Common values for ‘shell’ : ‘/bin/sh -c’, ‘/bin/ksh -c’, ‘cmd /c’, ‘powershell -Command’, etc.

a1.sources.tailsource-1.type = exec
a1.sources.tailsource-1.shell = /bin/bash -c
a1.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done

 

However, I could not get it to work:

 

(flume-env.sh)

ROLE=`<some script or command>`
RACK=`<some script or command>`
...
JAVA_OPTS="-Xms60m -Xmx360m -Drole=${ROLE} -Drack=${RACK} -Dhostname=${HOSTNAME} "

(flume_spool_avro_agent.conf)

....
spool_avro_agent.sources.s2.channels = c1
spool_avro_agent.sources.s2.type = exec
spool_avro_agent.sources.s2.shell = /bin/bash -c
spool_avro_agent.sources.s2.command = tail -F /var/log/app-${role}-${rack}.log
....

I verified that $JAVA_OPTS is correct, but the values don't seem to be passed to the command line:

 

2016-02-11 15:46:14,175 |INFO  |org.apache.flume.source.ExecSource$StderrReader | |- StderrLogger[1] = '+ tail -F /var/log/app--.log'
2016-02-11 15:46:14,176 |INFO  |org.apache.flume.source.ExecSource$StderrReader | |- StderrLogger[2] = 'tail: cannot open `/var/log/app--.log' for reading: No such file or directory'

 

So, what exactly is the runtime environment for this Exec Source shell?  What kind of constraints does it have (compared to, say, ssh)?

 

 

Any insights from the trenches would be appreciated.  Thanks!

 

 

1 ACCEPTED SOLUTION

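The Exec source launches its command with Java's ProcessBuilder:
https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html

The command inherits the environment of the running Flume process, so the shell sees only variables that are exported in that environment. JVM system properties set with -D are not part of it.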
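
The inheritance rule can be reproduced without Flume. The following is a minimal sketch, not Flume itself: `sh -c` stands in for the shell configured on the Exec source (`/bin/bash -c` in the question), and the values `role1` and `rack2` are made up for illustration.

```shell
#!/bin/sh
# Stand-in for the Exec source: a child shell started with `sh -c`
# sees only the variables that its parent has exported.

unset ROLE RACK       # start from a clean slate for the demonstration

ROLE=role1            # set but NOT exported: invisible to child processes
export RACK=rack2     # exported: inherited by child processes

# Single quotes make the CHILD shell expand the variables, not the parent.
child_view=$(sh -c 'echo "app-${ROLE}-${RACK}.log"')

echo "$child_view"    # prints: app--rack2.log
```

The empty role in the output matches the `app--.log` path in the log excerpt from the question, where neither variable had been exported.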


2 REPLIES 2


Exporting the variables in flume-env.sh seems to make them visible, and solves my immediate problem. 

Also, passing the variables as Java system properties (JAVA_OPTS="... -Dvar=x") doesn't seem to make a difference in this case (though it is required if you want to use them in log4j.properties):

 

export ROLE=`<some script or command>`
export RACK=`<some script or command>`
... 
JAVA_OPTS="-Xms60m -Xmx360m -Dhostname=${HOSTNAME} "
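
One way to catch a missing export before the agent starts tailing a non-existent file is a small guard in flume-env.sh. This is only a sketch: `require_exported` is a hypothetical helper name, not part of Flume, and it assumes the variable names passed to it are valid shell identifiers.

```shell
#!/bin/sh
# Hypothetical guard (not part of Flume): succeed only if every named
# variable is non-empty AND visible to a child process, i.e. exported.
require_exported() {
  for name in "$@"; do
    # The child shell started by `sh -c` sees exported variables only.
    if ! sh -c "test -n \"\${${name}:-}\""; then
      echo "flume-env.sh: ${name} is empty or not exported" >&2
      return 1
    fi
  done
}
```

Calling `require_exported ROLE RACK || exit 1` after the export lines would stop the startup script with a clear message instead of letting tail fail later on a path such as `/var/log/app--.log`.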

My larger questions still stand, though, and I'd welcome any comments.

 
