Hi,
I am getting the following exception while trying to execute a Spark action through Oozie. The same code runs successfully from the Spark shell but fails when executed via Oozie. I have also included my code below for your reference.
Traceback (most recent call last):
File "/data/0/yarn/nm/usercache/mvalakum/appcache/application_1534259325465_63027/container_e547_153425932...", line 26, in <module>
.mode("overwrite") \
File "/data/0/yarn/nm/usercache/mvalakum/appcache/application_1534259325465_63027/container_e547_153425932...", line 395, in save
File "/data/0/yarn/nm/usercache/mvalakum/appcache/application_1534259325465_63027/container_e547_153425932...", line 813, in __call__
File "/data/0/yarn/nm/usercache/mvalakum/appcache/application_1534259325465_63027/container_e547_153425932...", line 45, in deco
File "/data/0/yarn/nm/usercache/mvalakum/appcache/application_1534259325465_63027/container_e547_153425932...", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o43.save.
: java.lang.SecurityException: class "com.amazonaws.auth.InstanceProfileCredentialsProvider"'s signer information does not match signer information of other classes in the same package
at java.lang.ClassLoader.checkCerts(ClassLoader.java:898)
at java.lang.ClassLoader.preDefineClass(ClassLoader.java:668)
at java.lang.ClassLoader.defineClass(ClassLoader.java:761)
hiveDL = sqlContext.sql("select * from monthly_sales")
hiveDL.count()
hiveDL.write \
.format("com.databricks.spark.redshift") \
.option("url", "jdbc:redshift://<url>:5439/<db>?user=<username>;password=<pwd>") \
.option("dbtable", "monthly_sales") \
.option("tempdir", "s3a://<bucket>/temp") \
.mode("overwrite") \
.save()
The code reads data from a Hive table and writes it to Redshift, but it seems to fail because a class is being referenced from a different jar. I am not sure whether a jar is missing or whether something points to the wrong one.
I am using CDH 2.6.0-cdh5.10.0
Spark 1.6
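In case it helps narrow this down: the SecurityException above is raised when classes in the same package are loaded from jars with different signers, so one thing I could check is which jars actually contain the class named in the error. Below is a minimal sketch of such a check, written for Python 3. The helper name find_jars_with_class and the directory passed to it are my own placeholders, not anything from Oozie or Spark, and I do not know yet which directories Oozie really puts on the classpath.

```python
import os
import zipfile


def find_jars_with_class(root, class_name):
    """Return the sorted paths of all .jar files under `root` that contain `class_name`.

    `root` is a hypothetical directory to scan (for example a local copy of
    the Oozie sharelib or the workflow's lib/ folder).
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        matches.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip files that are not readable zip archives
                continue
    return sorted(matches)
```

Calling find_jars_with_class("<jar directory>", "com.amazonaws.auth.InstanceProfileCredentialsProvider") returns every jar that ships the class. If more than one jar shows up, that might explain the signer mismatch, although this is only a guess on my part.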
Any help is appreciated.
Thanks & Regards
Mukund