<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How do i pass variables to spark job using Envelope in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293351#M216651</link>
    <description>&lt;P&gt;Since you are running in client mode, the driver runs locally on the machine where you call spark2-submit, so you just need to set local environment variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;export tableA=dbA.tableA
export tableB=dbB.tableB

spark2-submit \
--master yarn \
--deploy-mode client \
envelope-0.7.2.jar comparison.conf&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you run the job with sudo, you would need to add -E so that the exported variables are passed through. However, it is not good practice to run jobs as the HDFS superuser rather than as your own user.&lt;/P&gt;</description>
    <pubDate>Mon, 06 Apr 2020 13:05:00 GMT</pubDate>
    <dc:creator>Jeremy Beard</dc:creator>
    <dc:date>2020-04-06T13:05:00Z</dc:date>
    <item>
      <title>How do i pass variables to spark job using Envelope</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293333#M216638</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In my Envelope pipeline, I need to compare two Hive tables. Instead of hardcoding the tables in the .conf file, I would like to pass in the names of the tables to compare. I tried using spark.yarn.appMaster.Env.varName, but it doesn't seem to work. I'm running CDH 5.13.3 with Java 1.8 on a CentOS VM.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is what the script that runs the spark job looks like:&lt;/P&gt;
&lt;P&gt;#!/bin/bash&lt;/P&gt;
&lt;P&gt;sudo -u hdfs spark2-submit \&lt;/P&gt;
&lt;P&gt;--master yarn \&lt;/P&gt;
&lt;P&gt;--deploy-mode client \&lt;/P&gt;
&lt;P&gt;--conf spark.yarn.appMaster.Env.tableA=dbA.tableA \&lt;/P&gt;
&lt;P&gt;--conf spark.yarn.appMaster.Env.tableB=dbB.tableB \&lt;/P&gt;
&lt;P&gt;envelope-0.7.2.jar comparison.conf&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Part of my .conf file:&lt;/P&gt;
&lt;P&gt;application{name = comparison}&lt;/P&gt;
&lt;P&gt;steps{&lt;/P&gt;
&lt;P&gt;tableA{&lt;/P&gt;
&lt;P&gt;type = hive&lt;/P&gt;
&lt;P&gt;table = ${tableA}&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;P&gt;tableB{&lt;/P&gt;
&lt;P&gt;type = hive&lt;/P&gt;
&lt;P&gt;table = ${tableB}&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
</description>
      <pubDate>Mon, 06 Apr 2020 09:28:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293333#M216638</guid>
      <dc:creator>Cayo</dc:creator>
      <dc:date>2020-04-06T09:28:34Z</dc:date>
    </item>
    <item>
      <title>Re: How do i pass variables to spark job using Envelope</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293351#M216651</link>
      <description>&lt;P&gt;Since you are running in client mode, the driver runs locally on the machine where you call spark2-submit, so you just need to set local environment variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;export tableA=dbA.tableA
export tableB=dbB.tableB
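
# Optional guard (an illustrative addition, not part of the original answer):
# when run as a script, stop with an error message if either variable is
# unset or empty, so the job is not submitted without the table names.
: "${tableA:?tableA is not set}"
: "${tableB:?tableB is not set}"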

spark2-submit \
--master yarn \
--deploy-mode client \
envelope-0.7.2.jar comparison.conf&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you run the job with sudo, you would need to add -E so that the exported variables are passed through. However, it is not good practice to run jobs as the HDFS superuser rather than as your own user.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2020 13:05:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293351#M216651</guid>
      <dc:creator>Jeremy Beard</dc:creator>
      <dc:date>2020-04-06T13:05:00Z</dc:date>
    </item>
    <item>
      <title>Re: How do i pass variables to spark job using Envelope</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293368#M216660</link>
      <description>&lt;P&gt;Thank you for your answer!&lt;/P&gt;&lt;P&gt;I need to use sudo -u hdfs because the result of comparing those two tables is stored in a third table in HDFS, and I need write permission for that. Also, if I pass those variables using export, do I need to declare them inside the .conf file as well as in run.sh? And does this work inside the SQL? For example, one of my variables is a primaryKey field. I'm comparing A.${primaryKey} = B.${primaryKey}, but the comparison doesn't return any results. It just reports an error in the SQL: "A. = B."&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2020 17:39:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293368#M216660</guid>
      <dc:creator>Cayo</dc:creator>
      <dc:date>2020-04-06T17:39:21Z</dc:date>
    </item>
    <item>
      <title>Re: How do i pass variables to spark job using Envelope</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293369#M216661</link>
      <description>&lt;P&gt;You don't need to declare them in the conf file. However, environment variables can't be referenced inside a quoted SQL string, because the configuration file format (HOCON) does not perform variable substitution inside quotes. Concatenating quoted strings with unquoted variables is reasonably easy though, for example:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;"SELECT * FROM tableA A INNER JOIN tableB B ON A."${primaryKey}" = B."${primaryKey}&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 06 Apr 2020 17:47:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293369#M216661</guid>
      <dc:creator>Jeremy Beard</dc:creator>
      <dc:date>2020-04-06T17:47:36Z</dc:date>
    </item>
    <item>
      <title>Re: How do i pass variables to spark job using Envelope</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293371#M216663</link>
      <description>&lt;P&gt;Thank you so much for helping! You helped a lot.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2020 18:39:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-do-i-pass-variables-to-spark-job-using-Envelope/m-p/293371#M216663</guid>
      <dc:creator>Cayo</dc:creator>
      <dc:date>2020-04-06T18:39:02Z</dc:date>
    </item>
  </channel>
</rss>

