Oozie Spark action can't interact with Hive

New Contributor

Hi All,

We have many Oozie workflows in Hue with Spark actions that interact with Hive. We added hive-site.xml to the workflows and everything worked fine with Cloudera 5.7.1. We have just upgraded to Cloudera 5.10 with the newest parcels, and the Oozie Spark actions can no longer reach the Hive warehouse. We tried adding hive-site.xml to the workflows, setting --files hdfs://<path to hive-site.xml> in the "Options list", and setting hive.metastore.uris in the properties, but nothing worked. If we start these Spark apps with spark-submit or from the Spark shell, they work fine. We also tried to reach the Hive warehouse from an Oozie Spark action on another, completely different cluster (also CDH 5.10), and the same problem occurs there.

We are using a Postgres database for Hive metastore.

Has anybody managed to create a working Oozie Spark action that reaches Hive on a CDH version newer than 5.7?

This issue has come up many times in the last few months here in Cloudera's forum, but there is no solution yet, so any help would be much appreciated! Thanks

[main] WARN  org.apache.hadoop.hive.metastore.HiveMetaStore  - Retrying creating default database after error: Error creating transactional connection factory
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
	at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:781)
....
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:237)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
	... 101 more
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.datasource.AbstractDataSourceFactory.loadDriver(AbstractDataSourceFactory.java:58)
	at org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:61)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217)
	... 103 more

 

1 ACCEPTED SOLUTION

Rising Star

Hi,

The --files option of the Oozie Spark action is broken in CDH 5.10.0 because of OOZIE-2547. It was fixed by OOZIE-2806 and OOZIE-2802, which will be available in 5.10.1. Until then, a workaround is to put a copy of hive-site.xml into the Oozie Spark sharelib and add the Hive sharelib to the Spark action by defining oozie.action.sharelib.for.spark=spark,hive in job.properties.

The Spark action works fine with the --files option in 5.9.x versions.

The actual sharelib location can be found with the oozie admin -shareliblist spark command.

After putting hive-site.xml into that folder, run the oozie admin -sharelibupdate command.
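The workaround steps could be scripted roughly as follows. This is only a sketch under assumptions, not a tested procedure. The Derby driver in the stack trace suggests the action fell back to Hive's default embedded metastore settings, which is what happens when hive-site.xml is not picked up. The HIVE_SITE default path, the SHARELIB_DIR and DRY_RUN variables, and the run helper are hypothetical names introduced for illustration. The oozie commands also need to know the server address, for example through the OOZIE_URL environment variable.

```shell
#!/usr/bin/env bash
# Sketch: copy hive-site.xml into the Oozie Spark sharelib (workaround for CDH 5.10.0).
# Paths and variable names are assumptions; adjust them to your cluster.
set -eu

HIVE_SITE="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"  # assumed client config location
SHARELIB_DIR="${SHARELIB_DIR:-}"  # Spark sharelib directory, taken from step 1 output
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Find the directory of the active Spark sharelib.
run oozie admin -shareliblist spark

# 2. Copy hive-site.xml into that directory (skipped until SHARELIB_DIR is set).
if [ -n "$SHARELIB_DIR" ]; then
  run hdfs dfs -put -f "$HIVE_SITE" "$SHARELIB_DIR/"
else
  echo "SHARELIB_DIR is not set; skipping the copy step"
fi

# 3. Tell the Oozie server to reload the sharelib.
run oozie admin -sharelibupdate

# 4. In job.properties of the workflow (not a shell command):
#    oozie.action.sharelib.for.spark=spark,hive
```

By default the script only prints the commands. After checking the output of step 1, set SHARELIB_DIR to the reported Spark sharelib directory and run it again with DRY_RUN=0, as a user that is allowed to write to the sharelib in HDFS.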

I hope this helps

gp

 

REPLIES

New Contributor

Hi Team,

 

I am facing exactly the same problem when connecting to Hive from a Spark action in Oozie. The Spark program works perfectly when it is executed from the edge node in client mode.

Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:237)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
	... 74 more
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
	at org.datanucleus.store.rdbms.datasource.AbstractDataSourceFactory.loadDriver(AbstractDataSourceFactory.java:58)
	at org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:61)
	at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217)
	... 76 more

New Contributor

Thanks Gezapeti for the solution and the explanation, it's working 🙂

Rising Star

Just don't forget that the copy of hive-site.xml in the sharelib has to be updated every time the Hive configuration changes.

After upgrading to a newer version, the workflow has to be updated to use --files again.