Member since 04-06-2020 | 3 Posts | 0 Kudos Received | 0 Solutions
04-06-2020 11:39 AM
Thank you so much for helping! You helped a lot.
04-06-2020 10:39 AM
Thank you for your answer! I need to use sudo -u hdfs because the result of comparing those two tables is stored in a third table in HDFS, and for that I need write permission. Also, if I pass those variables using export, do I need to declare the variables inside the .conf file as well as in run.sh? And does this work inside the SQL? For example, one of my variables is a primaryKey field. I'm comparing A.${primaryKey} = B.${primaryKey}, but the comparison doesn't return any results. It just reports an error in the SQL: "A. = B."
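One possible explanation for the empty "A. = B." is that sudo resets the environment by default, so variables exported in run.sh may never reach the process started by sudo -u hdfs. In client mode the driver runs inside the spark2-submit process itself, and HOCON substitutions such as ${primaryKey} fall back to environment variables when the key is not defined in the .conf file. The sketch below assumes Envelope resolves the config on the driver and sets the variables with env after the switch to the hdfs user. The function name submit_cmd and the value id for primaryKey are hypothetical; the function only prints the command so it can be reviewed before it is run.

```shell
#!/bin/bash
# Sketch only: prints a spark2-submit command in which the variables are set
# by `env` AFTER sudo has switched to the hdfs user, so sudo's environment
# reset cannot drop them.
# submit_cmd is a hypothetical helper name, not part of Envelope or Spark.
submit_cmd() {
  # $1 = first table, $2 = second table, $3 = primary key column
  echo sudo -u hdfs \
    env "tableA=$1" "tableB=$2" "primaryKey=$3" \
    spark2-submit --master yarn --deploy-mode client \
    envelope-0.7.2.jar comparison.conf
}

# "id" is an assumed column name used purely for illustration.
submit_cmd dbA.tableA dbB.tableB id
```

With this approach the variables do not need to be declared in the .conf file; referencing them as ${tableA}, ${tableB}, and ${primaryKey} should be enough, provided they are present in the driver's environment.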
04-06-2020 01:33 AM
Hello!
In my Envelope pipeline, I need to compare two Hive tables. Instead of hardcoding the tables in the .conf file, I would like to pass in the tables I'm going to compare. I tried using spark.yarn.appMaster.Env.varName, but it doesn't seem to work. I'm running CDH 5.13.3 with Java 1.8 on a CentOS VM.
This is what the script that runs the spark job looks like:
#!/bin/bash
sudo -u hdfs spark2-submit \
--master yarn \
--deploy-mode client \
--conf spark.yarn.appMaster.Env.tableA=dbA.tableA \
--conf spark.yarn.appMaster.Env.tableB=dbB.tableB \
envelope-0.7.2.jar comparison.conf
Part of my .conf file:
application {
  name = comparison
}
steps {
  tableA {
    type = hive
    table = ${tableA}
  }
  tableB {
    type = hive
    table = ${tableB}
  }
}
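Because ${tableA} and ${tableB} are not defined in the file itself, they have to come from outside it, and in client mode the spark.yarn.appMaster.Env.* settings apply to the YARN Application Master rather than to the local driver process. A small guard in the launch script can catch a missing variable before the job is submitted. This is a sketch, not part of Envelope: require_env is a hypothetical helper name, and the check runs in the calling shell, so it only confirms that the variables are exported there.

```shell
#!/bin/bash
# Sketch only: fail early if any variable the .conf file refers to is missing
# from the environment. require_env is a hypothetical helper name.
require_env() {
  local name missing=0
  for name in "$@"; do
    # printenv prints nothing for variables that are unset or not exported
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or empty environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: export the values, then verify them before calling spark2-submit.
# "id" is an assumed column name used purely for illustration.
export tableA=dbA.tableA tableB=dbB.tableB primaryKey=id
require_env tableA tableB primaryKey && echo "all variables set"
```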
Labels: Apache Hive, Apache Spark