
When Spark calls Hive from Oozie, an exception is raised: “java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException”

New Contributor

I have a Spark job that saves data to HDFS and then saves the same data to a Hive table. When I run it in Jupyter, it runs successfully. But when I run it through Oozie, it raises the following exception when it reaches the step that writes the data to Hive. Here is my code, followed by the exception:

# coding: utf-8

# In[10]:


import os

JARS_HOME = "hdfs:///dataengineering/jars"
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars '+JARS_HOME+'/ojdbc6.jar,'+JARS_HOME+'/anonymize_udfs.jar pyspark-shell'

os.environ["HADOOP_CONF_DIR"] = '/etc/hadoop/conf'


# In[11]:


try:
    from pyspark import SparkContext, SQLContext
    from pyspark.sql import SparkSession
except ImportError:
    # pyspark is not on sys.path yet: locate the CDH Spark install first
    import findspark
    findspark.init('/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/spark')
    from pyspark import SparkContext, SQLContext
    from pyspark.sql import SparkSession

import sys
import pyspark.sql.functions as functions
from datetime import date, datetime, timedelta
from dateutil.relativedelta import relativedelta
from pyspark.sql.functions import *
from pyspark.sql import functions as sf
from pyspark.sql.types import StringType


# In[12]:


spark = (
    SparkSession.builder
    .master("yarn")
    .appName("oozie_sample_spark")
    .config('spark.executor.cores', '3')
    .config('spark.executor.memory', '15g')
    .config('spark.driver.memory', '5g')
    .config('spark.driver.maxResultSize', '12g')
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.executor.instances", "4")
    .config("spark.yarn.queue", "root.STREAMING")
    .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "300s")
    .config("hive.metastore.uris", "thrift://dchqmaster01.internal.eg.vodafone.com:9083")
    .getOrCreate()
)


# In[13]:


spark.sql("select current_timestamp() column_a").write.csv("/user/akhamis11/oozie-samples/spark-sample/current_column.csv", mode='append')


# In[ ]:


spark.sql("select current_timestamp() column_a").write.saveAsTable("bde.oozie_test", mode='append')


# In[6]:


spark.stop()

2020-04-13 13:58:31,786 [Thread-10] INFO com.cloudera.spark.lineage.NavigatorQueryListener - Failed to generate lineage for successful query execution.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at
    .
    .
    .
    org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:189)
    ... 39 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:73)
    ... 44 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 45 more
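
Because the same job succeeds in Jupyter and fails only under Oozie, one way to narrow this down is to compare the environment that each launcher hands to the driver. The following is a minimal sketch and not part of the original post: the helper name `snapshot_env` and the list of variables are assumptions chosen for illustration. Run it in both environments and diff the output.

```python
import json
import os

# Environment variables that commonly influence how PySpark is launched and
# what ends up on its classpath. This list is an assumption for illustration;
# extend it with whatever your cluster setup uses.
DEFAULT_KEYS = (
    "SPARK_HOME",
    "HADOOP_CONF_DIR",
    "PYSPARK_SUBMIT_ARGS",
    "SPARK_DIST_CLASSPATH",
    "CLASSPATH",
)


def snapshot_env(keys=DEFAULT_KEYS, environ=None):
    """Return {name: value or None} for the given environment variables."""
    env = os.environ if environ is None else environ
    return {key: env.get(key) for key in keys}


if __name__ == "__main__":
    # Print as sorted JSON so two runs can be compared with a plain diff.
    print(json.dumps(snapshot_env(), indent=2, sort_keys=True))
```

A variable that is set in Jupyter but missing (printed as `null`) under Oozie is a good first place to look.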


Re: When Spark calls Hive from Oozie, an exception is raised: “java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException”

Cloudera Employee

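Hi,

java.lang.NoClassDefFoundError means that a class could not be loaded at runtime, which usually indicates that a JAR file is missing from the classpath. Could you please make sure that the JAR containing org.apache.hadoop.hive.ql.metadata.HiveException is included in the classpath, in the same place where your other JAR files are included?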

Thanks

Arun
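
To check which JAR, if any, provides the missing class, a small script can scan local JAR files for the class entry. This is a hedged sketch rather than an official tool: `class_to_entry`, `find_class_in_jars` and `find_class_in_dir` are hypothetical helpers written for this example, and they only read local files, not `hdfs://` paths.

```python
import glob
import os
import zipfile


def class_to_entry(class_name):
    """Convert 'a.b.C' to the entry name used inside a JAR: 'a/b/C.class'."""
    return class_name.replace(".", "/") + ".class"


def find_class_in_jars(class_name, jar_paths):
    """Return the local JAR paths that contain the given class."""
    entry = class_to_entry(class_name)
    matches = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                # namelist() reads only the archive index, not the contents.
                if entry in jar.namelist():
                    matches.append(path)
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or non-JAR files instead of aborting the scan.
            continue
    return matches


def find_class_in_dir(class_name, directory):
    """Scan every *.jar directly inside `directory` (not recursive)."""
    jars = sorted(glob.glob(os.path.join(directory, "*.jar")))
    return find_class_in_jars(class_name, jars)
```

For example, `find_class_in_dir("org.apache.hadoop.hive.ql.metadata.HiveException", "/path/to/hive/lib")` returns the JARs in that directory that contain the class; the path is a placeholder for wherever the Hive JARs live on your nodes. If the class is found locally but the Oozie run still fails, the JAR is probably not on the classpath of the Oozie-launched job.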
