<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to pass parameters into Flume Exec Source command in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pass-parameters-into-Flume-Exec-Source-command/m-p/37347#M19175</link>
    <description>How to pass parameters into Flume Exec Source command</description>
    <pubDate>Fri, 16 Sep 2022 10:03:42 GMT</pubDate>
    <dc:creator>MilesYao</dc:creator>
    <dc:date>2022-09-16T10:03:42Z</dc:date>
    <item>
      <title>How to pass parameters into Flume Exec Source command</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pass-parameters-into-Flume-Exec-Source-command/m-p/37347#M19175</link>
      <description>&lt;P&gt;Hi -&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've checked the docs, read the O'Reilly book, Googled, and searched this forum, but found little that helps with what looks like it should be a common Flume use case:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to ingest log files of some distributed application that runs on multiple hosts.&amp;nbsp; They behave like typical Unix or web server logs - they sit in fixed directories and roll infrequently.&amp;nbsp; I can modify neither the application nor the log files themselves - the ingestion has to be totally non-invasive.&amp;nbsp; So far so good:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1.&amp;nbsp; &lt;A href="http://flume.apache.org/FlumeUserGuide.html#exec-source" target="_self"&gt;Current Flume documentation&lt;/A&gt; recommends &lt;STRONG&gt;Spooling Dir Source&lt;/STRONG&gt; over Exec Source for tailing logs, yet does not explain how to do that in a streaming fashion without modifying the source file.&amp;nbsp; Spooling Dir Source requires that the source file be complete and closed - it's batch- rather than stream-oriented.&amp;nbsp; So we can't use it for typical actively-updated log files.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2.&amp;nbsp; Now, using &lt;STRONG&gt;Exec Source&lt;/STRONG&gt; should work, except that these log files don't have constant file names - e.g. 
app-&lt;EM&gt;&amp;lt;role&amp;gt;&lt;/EM&gt;-&lt;EM&gt;&amp;lt;rack&amp;gt;&lt;/EM&gt;.log.&amp;nbsp; The log directory is NFS-mounted and can be shared by multiple hosts, so it can contain:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; app-role1-rack1.log&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; app-role2-rack1.log&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; app-role1-rack2.log&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; app-role2-rack2.log&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ....&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hadoop itself has such examples:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;$ ls -l /var/log/hbase/&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;total 40836&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;-rw-r--r-- 1 hbase hbase&amp;nbsp;&amp;nbsp; 139032 Nov 20 17:40 hbase-cmf-hbase-HBASERESTSERVER-hou76072.log.out&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;-rw-r--r-- 1 hbase hbase 27859661 Feb 11 15:20 hbase-cmf-hbase-REGIONSERVER-hou76072.log.out&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to configure these values in flume-env.sh and pass them into the agent config file, or have the command itself call some script to derive them dynamically.&amp;nbsp; Exec Source has a shell option that seems to support this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="times new roman,times"&gt;The ‘shell’ config is used to invoke the ‘command’ through a command shell (such as Bash or Powershell). The ‘command’ is passed as an argument to ‘shell’ for execution. This allows the ‘command’ to use features from the shell such as wildcards, back ticks, pipes, loops, conditionals etc. In the absence of the ‘shell’ config, the ‘command’ will be invoked directly. 
Common values for ‘shell’ : ‘/bin/sh -c’, ‘/bin/ksh -c’, ‘cmd /c’, ‘powershell -Command’, etc.&lt;/FONT&gt;&lt;/P&gt;&lt;DIV class="highlight-properties"&gt;&lt;DIV class="highlight"&gt;&lt;PRE&gt;&lt;SPAN class="na"&gt;a1.sources.tailsource-1.type&lt;/SPAN&gt; &lt;SPAN class="o"&gt;=&lt;/SPAN&gt; &lt;SPAN class="s"&gt;exec&lt;/SPAN&gt;
&lt;SPAN class="na"&gt;a1.sources.tailsource-1.shell&lt;/SPAN&gt; &lt;SPAN class="o"&gt;=&lt;/SPAN&gt; &lt;SPAN class="s"&gt;/bin/bash -c&lt;/SPAN&gt;
&lt;SPAN class="na"&gt;a1.sources.tailsource-1.command&lt;/SPAN&gt; &lt;SPAN class="o"&gt;=&lt;/SPAN&gt; &lt;SPAN class="s"&gt;for i in /path/*.txt; do cat $i; done&lt;/SPAN&gt;
&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I could not get it to work:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;(flume-env.sh)&lt;/P&gt;&lt;PRE&gt;ROLE=`&amp;lt;some script or command&amp;gt;`&lt;BR /&gt;RACK=`&amp;lt;some script or command&amp;gt;`&lt;BR /&gt;... &lt;BR /&gt;JAVA_OPTS="-Xms60m -Xmx360m -Drole=${ROLE} -Drack=${RACK} -Dhostname=${HOSTNAME} "&lt;/PRE&gt;&lt;P&gt;(flume_spool_avro_agent.conf)&lt;/P&gt;&lt;PRE&gt;....
spool_avro_agent.sources.s2.channels = c1
spool_avro_agent.sources.s2.type = exec
spool_avro_agent.sources.s2.shell = /bin/bash -c
spool_avro_agent.sources.s2.command = tail -F /var/log/app-${role}-${rack}.log
....&lt;/PRE&gt;&lt;P&gt;I verified that $JAVA_OPTS is correct, but the values don't seem to be passed to the command line:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="3"&gt;2016-02-11 15:46:14,175 |INFO&amp;nbsp; |org.apache.flume.source.ExecSource$StderrReader | |- StderrLogger[1] = '+ tail -F &lt;STRONG&gt;/var/log/app--.log&lt;/STRONG&gt;'&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier" size="3"&gt;2016-02-11 15:46:14,176 |INFO&amp;nbsp; |org.apache.flume.source.ExecSource$StderrReader | |- StderrLogger[2] = 'tail: cannot open `&lt;STRONG&gt;/var/log/app--.log&lt;/STRONG&gt;' for reading: No such file or directory'&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, what exactly is the runtime environment for this Exec Source shell?&amp;nbsp; What kind of constraints does it have (compared to, say, ssh)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any insights from the trenches would be appreciated.&amp;nbsp; Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:03:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pass-parameters-into-Flume-Exec-Source-command/m-p/37347#M19175</guid>
      <dc:creator>MilesYao</dc:creator>
      <dc:date>2022-09-16T10:03:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to pass parameters into Flume Exec Source command</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pass-parameters-into-Flume-Exec-Source-command/m-p/37351#M19176</link>
      <description>&lt;P&gt;Exporting the variables in flume-env.sh seems to make them visible to the Exec Source command, which solves my immediate problem.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, passing the variables to the JVM as system properties (&lt;FONT face="courier new,courier"&gt;JAVA_OPTS="... -D&lt;EM&gt;var=x&lt;/EM&gt;"&lt;/FONT&gt;) doesn't seem to make a difference in this case (though it is required if you want to use them in log4j.properties):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;export ROLE=`&amp;lt;some script or command&amp;gt;`
export RACK=`&amp;lt;some script or command&amp;gt;`
... 
JAVA_OPTS="-Xms60m -Xmx360m -Dhostname=${HOSTNAME} "&lt;/PRE&gt;&lt;P&gt;My larger questions still stand, though, and I'd welcome any comments.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 22:31:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pass-parameters-into-Flume-Exec-Source-command/m-p/37351#M19176</guid>
      <dc:creator>MilesYao</dc:creator>
      <dc:date>2016-02-11T22:31:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to pass parameters into Flume Exec Source command</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pass-parameters-into-Flume-Exec-Source-command/m-p/37353#M19177</link>
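      <description>The Exec source launches its command with Java's ProcessBuilder:&lt;BR /&gt;&lt;A href="https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html" target="_blank"&gt;https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;The child process inherits the environment of the currently running Flume process, so exported shell variables are visible to the command, whereas JVM system properties set with -D are not environment variables and are not expanded by the shell.</description>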
      <pubDate>Fri, 12 Feb 2016 00:02:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-pass-parameters-into-Flume-Exec-Source-command/m-p/37353#M19177</guid>
      <dc:creator>pdvorak</dc:creator>
      <dc:date>2016-02-12T00:02:40Z</dc:date>
    </item>
  </channel>
</rss>

