Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Unable to run Spark Job in clutser mode

Solved Go to solution

Unable to run Spark Job in clutser mode

Hi,

I'm able to run the job in client mode but unable to run the same job in cluster mode. Can someone please help me? Below is the error message.

16/06/22 21:57:10 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
        at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:117)
        at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:165)
        at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:163)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:170)
        at DisplayAnalysisForecast$.main(DisplayAnalysisForecast.scala:35)
        at DisplayAnalysisForecast.main(DisplayAnalysisForecast.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:486)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
        ... 11 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
        ... 16 more
Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:497)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:475)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
        ... 21 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
        at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
        ... 40 more
16/06/22 21:57:10 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient)
16/06/22 21:57:18 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
16/06/22 21:57:18 INFO spark.SparkContext: Invoking stop() from shutdown hook


Using Spark 1.4 and running on Hadoop cluster only. Any help is highly appreciated and thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Unable to run Spark Job in clutser mode

Suggest solution:

1. As discovered, we are hitting a bug with HDP 2.3.2 with Ambari 2.2.1 :https://hortonworks.jira.com/browse/BUG-56393 where starting from Ambari 2.2.1 , it does not manage the spark version if HDP stack is < HDP 2.3.4

spark.driver.extraJavaOptions =-Dhdp.version={{hdp_full_version}}
spark.yarn.am.extraJavaOptions=-Dhdp.version={{hdp_full_version}}

2. Right now we have a workaround where we have modified the property value and hard coded the right HDP version.

spark.driver.extraJavaOptions =-Dhdp.version=2.3.2.0-2950
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.2.0-2950

Currently, Spark pi jobs are running fine

8 REPLIES 8

Re: Unable to run Spark Job in clutser mode

Rising Star
  1. Check the hive-site.xml contents. Should be like as below for spark.
  2. Add hive-site.xml to the driver-classpath so that spark can read hive configuration. Make sure —files must come before you .jar file
  3. Add the datanucleus jars using --jars option when you submit
  4. Check the contents of hive-site.xml
  5.  <configuration>
        <property>
          <name>hive.metastore.uris</name>
          <value>thrift://sandbox.hortonworks.com:9083</value>
        </property>
      </configuration>
  6. The Seq. of command
  1. spark-submit \
  2. --class <Your.class.name> \
  3. --master yarn-cluster \
  4. --num-executors 1 \
  5. --driver-memory 1g \
  6. --executor-memory 1g \
  7. --executor-cores 1 \
  8. --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  9. --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
  10. target/YOUR_JAR-1.0.0-SNAPSHOT.jar "show tables""select * from your_table"

Re: Unable to run Spark Job in clutser mode

@Gangadhar Kadam

Thanks for the quick response, HA is enabled for HiveServer2 and we are pointing to two thrift servers at hive.metastore.uris

I have already followed all the steps, but that doesn't help my scenario. I'm able to run the same job in client mode.

Re: Unable to run Spark Job in clutser mode

Rising Star

can you share your code.

Re: Unable to run Spark Job in clutser mode

@Gangadhar Kadam

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 1G --num-executors 1 -driver-memory 2G --executor-memory 2G --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar  --files /usr/hdp/current/spark-client/conf/hive-site.xml /usr/hdp/current/spark-client/lib/spark-examples-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar 10
Highlighted

Re: Unable to run Spark Job in clutser mode

Rising Star

did you tried --file before the --jar?

Re: Unable to run Spark Job in clutser mode

@Gangadhar Kadam

Yes, this time I got different error,

ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.

Re: Unable to run Spark Job in clutser mode

Suggest solution:

1. As discovered, we are hitting a bug with HDP 2.3.2 with Ambari 2.2.1 :https://hortonworks.jira.com/browse/BUG-56393 where starting from Ambari 2.2.1 , it does not manage the spark version if HDP stack is < HDP 2.3.4

spark.driver.extraJavaOptions =-Dhdp.version={{hdp_full_version}}
spark.yarn.am.extraJavaOptions=-Dhdp.version={{hdp_full_version}}

2. Right now we have a workaround where we have modified the property value and hard coded the right HDP version.

spark.driver.extraJavaOptions =-Dhdp.version=2.3.2.0-2950
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.2.0-2950

Currently, Spark pi jobs are running fine

Re: Unable to run Spark Job in clutser mode

New Contributor

@Gangadhar Kadam Thank you.. Steps you mentioned help me to resolve the problem

,

Thank you @Gangadhar Kadam ..Steps you provided help us to resolve the issue