Created 03-30-2017 01:28 PM
I have a functioning HDP cluster that seems to be working perfectly except for Hive and Oozie, which will not start. The problems with these two cropped up at install time, but I chose to ignore them then, as I did not need or want to use them immediately (all I needed at first were the basic Hadoop pieces: HDFS, Sqoop and Flume, MapReduce, and so forth), so I installed them anyway and left the problems for later.
I am now revisiting these ‘components’ as I need to find (and fix) whatever is going wrong. I have started with Hive and worked my way up the problem chain, starting with the Hive Metastore failing to start and working backward. To cut a long story very short, the step in the chain that is failing is the schema-tool initialization of the Hive metastore database schema, and here the underlying (root) cause seems to be a JDBC problem – it fails to load the drivers for some reason (the relevant parts of the error message are):
/usr/hdp/current/hive-client/bin/schematool -initSchema -dbType mysql
Metastore connection URL: jdbc:mysql://hdata2.edi.local/hive
Metastore Connection Driver: com.mysql.jdbc.Driver
Metastore connection User: hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
Underlying cause: java.lang.ClassNotFoundException : com.mysql.jdbc.Driver
*** schemaTool failed ***
Other programs - like Sqoop for example - can connect to this MySQL database over JDBC, so I cannot figure out why the Hive schema tool cannot. The error messages suggest that it does not find the JDBC jar file, but the right symlink is in /usr/share/java and everybody else seems to find it without problem. I have v5.1.28 of the Oracle mysql-connector-java jar file, which is the current version, so it’s not a problem with back-level files either.
I have MySQL installed on the ‘master’ node of the cluster; Ambari is happily using it, I can drive it from the CLI, can read and write data to it with Flume, etc. So I know that it is installed correctly and working, that I can connect to it from all the nodes in my cluster, and that it also has an (empty) database for Hive to use, has users created and permission grants in place, etc.
So, I cannot figure out what the underlying problem is, and I could use suggestions as to possible causes/solutions. Does Hive (well, the Hive schema tool) need to have jdbc somewhere special?
Regards, and thanks in advance to all,
Rick
Created 03-30-2017 10:29 PM
Hive requires the MySQL JDBC jar under HIVE_HOME/lib (/usr/hdp/current/hive-client/lib/). This should generally be done on every host that has Hive installed (at the very least on the Hive host and on the host from which you run the schema tool).
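To make that concrete, here is a sketch of the fix. So that it runs anywhere, the block below rehearses the symlink step in a scratch directory; the real HDP paths from this thread are noted in the comments (verify them on your own cluster before running anything for real):

```shell
# Sandbox stand-ins for the real paths mentioned in this thread:
#   connector jar:  /usr/share/java/mysql-connector-java.jar
#   Hive lib dir:   /usr/hdp/current/hive-client/lib
WORK=$(mktemp -d)
JDBC_JAR="$WORK/mysql-connector-java.jar"
HIVE_LIB="$WORK/hive-lib"

touch "$JDBC_JAR"      # stands in for the real connector jar
mkdir -p "$HIVE_LIB"   # stands in for Hive's lib directory

# The actual fix: symlink (rather than copy) the jar into Hive's lib
# directory, so there is only one physical copy to keep up to date.
ln -s "$JDBC_JAR" "$HIVE_LIB/mysql-connector-java.jar"

ls -l "$HIVE_LIB"
```

On a real cluster you would run the `ln -s` against the paths in the comments (with sudo), on each host where Hive is installed.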
Created 03-31-2017 10:29 AM
Ta Deepesh - that fixed it. At least I can now get one step further... at least schematool now loads the jdbc drivers before abending [on a 'connection refused' exception even though I can log into the MySQL db from the command line... sigh]. Thanks a million... By the way, I put a symlink into HIVE_HOME/lib for the file (as opposed to physically copying it) - having multiple physical copies of something like a jdbc driver JAR creates too many opportunities for versions to get out of synch for my liking.
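For the follow-on ‘connection refused’, one quick thing to check (an assumption on my part, not something diagnosed in this thread) is whether mysqld is actually listening on a TCP socket reachable from the node running schematool: a local CLI login can succeed via the Unix socket even when `bind-address=127.0.0.1` or `skip-networking` in my.cnf blocks remote TCP connections. A minimal bash probe, using the hostname from the thread:

```shell
# Prints "reachable" or "not reachable" for HOST PORT, using bash's
# built-in /dev/tcp pseudo-device (no extra tools needed).
probe() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "reachable"
  else
    echo "not reachable"
  fi
}

# Host is the one from this thread; 3306 is the MySQL default port.
# Adjust both for your cluster. "not reachable" from the Hive node but
# "reachable" from the MySQL node itself points at bind-address /
# skip-networking (or a firewall) rather than at Hive.
probe hdata2.edi.local 3306
```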