Created on 03-13-2017 06:52 AM - edited 09-16-2022 04:14 AM
Hi All,
We have many Oozie workflows in Hue that contain Spark actions which interact with Hive. We added hive-site.xml to the workflows and everything worked fine with Cloudera 5.7.1. We have just upgraded to Cloudera 5.10 with the newest parcels, and Oozie Spark actions can no longer reach the Hive warehouse. We tried adding hive-site.xml to the workflows, setting --files hdfs://<path to hive-site.xml> in the "Options list", and setting hive.metastore.uris in the properties, but nothing worked. If we start these Spark apps with spark-submit or from the Spark shell, they work fine. We also tried to reach the Hive warehouse from an Oozie Spark action on another, completely separate cluster (also CDH 5.10), and the bug exists there too.
We are using a Postgres database for Hive metastore.
Has anybody managed to create a working Oozie Spark action that reaches Hive on a CDH version newer than 5.7?
This issue has come up many times in the last few months here in Cloudera’s forum, but there is no solution yet, so any help would be much appreciated! Thanks
[main] WARN org.apache.hadoop.hive.metastore.HiveMetaStore - Retrying creating default database after error: Error creating transactional connection factory
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:781)
....
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:237)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
    ... 101 more
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.datanucleus.store.rdbms.datasource.AbstractDataSourceFactory.loadDriver(AbstractDataSourceFactory.java:58)
    at org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:61)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217)
    ... 103 more
Created on 03-21-2017 02:20 AM - edited 03-21-2017 02:22 AM
Hi,
The --files option is broken for Oozie in 5.10.0 because of OOZIE-2547. It was fixed by OOZIE-2806 and OOZIE-2802, which will be available in 5.10.1. Until then, a workaround is to put a copy of hive-site.xml into the Oozie Spark sharelib and add the hive sharelib to the Spark action by defining oozie.action.sharelib.for.spark=spark,hive in job.properties.
The Spark action works fine with the --files option in 5.9.x versions.
The current sharelib directory can be located with the oozie admin -shareliblist spark command.
After putting hive-site.xml into that folder, the oozie admin -sharelibupdate command should be executed.
I hope this helps
gp
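The workaround described above can be sketched as a small shell script. This is only an illustration under assumptions: the local hive-site.xml path, the DRY_RUN switch, and the helper names run and copy_hive_site_to_sharelib are made up for this example, and the Oozie server URL is assumed to be available to the oozie client (for example through the OOZIE_URL environment variable). The Spark sharelib directory is not guessed; take it from the output of oozie admin -shareliblist spark and pass it as the argument.

```shell
#!/usr/bin/env bash
# Sketch of the sharelib workaround; paths and helper names are assumptions.
set -eu

# Hypothetical default location of the client hive-site.xml; adjust as needed.
HIVE_SITE="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: HDFS directory of the Spark sharelib, as reported by
#     `oozie admin -shareliblist spark`.
copy_hive_site_to_sharelib() {
  local sharelib_dir="$1"
  # Copy hive-site.xml next to the Spark sharelib jars (overwrite if present).
  run hdfs dfs -put -f "$HIVE_SITE" "$sharelib_dir/"
  # Tell the Oozie server to pick up the changed sharelib.
  run oozie admin -sharelibupdate
}

# Run only when a sharelib directory is given on the command line.
if [ "$#" -ge 1 ]; then
  copy_hive_site_to_sharelib "$1"
fi

# In addition, job.properties needs this line:
#   oozie.action.sharelib.for.spark=spark,hive
```

Running the script with the sharelib directory as its only argument prints the two commands first; setting DRY_RUN=0 executes them. The same steps would need to be repeated whenever hive-site.xml changes.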
Created 03-14-2017 06:55 AM
Hi Team,
I am facing the exact same problem while connecting to Hive using a Spark action with Oozie. The Spark program works perfectly when it is executed from an edge node in client mode.
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:237)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
    ... 74 more
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.datanucleus.store.rdbms.datasource.AbstractDataSourceFactory.loadDriver(AbstractDataSourceFactory.java:58)
    at org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:61)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217)
    ... 76 more
Created 03-21-2017 07:30 AM
Thanks Gezapeti for the solution and the explanation, it's working 🙂
Created 03-21-2017 08:25 AM
Just don't forget that the copy of hive-site.xml in the sharelib has to be updated every time a Hive config is changed.
After upgrading to a newer version, the workflow has to be updated to use --files again.