Dear Folks,
Note: This is an urgent requirement; your suggestions will be appreciated.
Here is my requirement: it's a batch job.
For every run, I have to load five Hive tables.
I created separate DataFrame objects for all five tables and call them inside the main function (see the sketch after the example below).
Here I am using flags to get the user input and start running the job.
The code below works fine for two tables.
If I pass the argument "all", all five tables get loaded without issue.
Sometimes, on an ad-hoc basis, I may need to load only two or three tables, depending on the requirement.
How can I achieve this in my code?
Eg:
For today's run, I need to load only three tables.
I pass the table names as arguments while submitting the job:
spark-submit --class .. --master yarn eimreporting tableA tableB tableC
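For context, each of those table objects just exposes a Transform method that builds the table's DataFrame and writes it to Hive, roughly like the simplified sketch below (the real query, source, and target table names differ; this is only to show the shape):

object eiminsight_claim_agg {
  // Build the aggregated claim DataFrame and write it to the target Hive table
  def Transform(sqlContext: org.apache.spark.sql.hive.HiveContext): Unit = {
    val df = sqlContext.sql("SELECT * FROM source_db.claim") // placeholder query and source table
    df.write.mode("overwrite").saveAsTable("eim_db.eiminsight_claim_agg") // placeholder target table
  }
}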
Main code:
----
import org.apache.spark.{SparkConf, SparkContext}

object Medinsight_Main {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Eim_Reporting")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    try {
      // arguments are lower-cased, so compare against lower-case literals
      if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "all") {
        eiminsight_claim_agg.Transform(sqlContext) // calling TableA object
        eiminsight_member.Transform(sqlContext)    // calling TableB object
      }
      else if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "tablea") {
        eiminsight_claim_agg.Transform(sqlContext) // calling TableA object
      }
      else if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "tableb") {
        eiminsight_member.Transform(sqlContext)    // calling TableB object
      }
      else {
        println("No arguments")
      }
    }
    catch {
      case e: Exception => e.printStackTrace()
    }
    finally {
      sc.stop()
    }
  }
}
----
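Rather than adding one else-if branch per table, what I had in mind is something like the rough sketch below inside main: map each table name to the object that loads it, and run only the names passed after "eimreporting" (or everything for "all"). This is only an idea I'm considering, not working code; eiminsight_claim_agg and eiminsight_member are the same objects as above, and the remaining three tables would be added the same way.

// Map each table name (lower-cased) to the Transform method that loads it
val loaders: Map[String, org.apache.spark.sql.hive.HiveContext => Unit] = Map(
  "tablea" -> (eiminsight_claim_agg.Transform _),
  "tableb" -> (eiminsight_member.Transform _)
  // ... add the remaining three tables here
)

// "all" runs every table; otherwise run only the table names passed on the command line
val requested =
  if (args(1).toLowerCase == "all") loaders.keys.toSeq
  else args.drop(1).map(_.toLowerCase).toSeq

requested.foreach { name =>
  loaders.get(name) match {
    case Some(load) => load(sqlContext)
    case None       => println("Unknown table name: " + name)
  }
}

Is this a reasonable way to handle an arbitrary subset of tables, or is there a better approach?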