I've just migrated an existing Hive cluster over to HDP 2.5. Although it reports that it's using Hive 1.2.1000, it has also installed Hive 2.1 (tech preview) and seems to be trying to load the 2.1 binaries, which expect a 2.1 schema. I do NOT have the LLAP interactive query feature enabled. Because my schema is 1.2, the Hive service startup is failing. On my Hive master node, it's trying to run a command like:
/var/lib/ambari-agent/ambari-sudo.sh su hive -l -s /bin/bash -c export PATH=/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent > /dev/null ; export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-server2-hive2/bin/schematool -info -dbType mysql -userName hive -passWord [redacted]
With relevant output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/126.96.36.199-1245/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/188.8.131.52-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://[redacted]/metastore
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Hive distribution version: 2.1.0
Metastore schema version: 1.2.1000
org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 2.1.0, Database Schema Version: 1.2.1000
org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 2.1.0, Database Schema Version: 1.2.1000
    at org.apache.hive.beeline.HiveSchemaTool.assertCompatibleVersion(HiveSchemaTool.java:215)
It also tries to run
/usr/hdp/current/hive-server2-hive2/bin/schematool -initSchema -dbType mysql
If I upgrade the schema to 2.1, it all works - but this isn't an option for us, given our use of Spark's HiveContext/Spark on Hive. Does HDP 2.5 require a 2.1.x schema, even though it says Hive 1.2.x on the tin? Is there a way to bypass this? Am I just being completely dense somewhere?
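For anyone wanting to script a check before letting Ambari start the service, here is a minimal sketch that pulls the two version lines out of `schematool -info` output like the one above. The function names are hypothetical, and comparing only major.minor is my own simplifying assumption, not necessarily the rule that `HiveSchemaTool.assertCompatibleVersion` applies.

```python
import re


def parse_schematool_info(output):
    """Return (hive_version, schema_version) found in schematool -info output."""
    hive = re.search(r"Hive distribution version:\s*(\S+)", output)
    schema = re.search(r"Metastore schema version:\s*(\S+)", output)
    if not hive or not schema:
        raise ValueError("could not find version lines in schematool output")
    return hive.group(1), schema.group(1)


def major_minor(version):
    """Reduce a version such as '1.2.1000' to the tuple (1, 2)."""
    return tuple(int(part) for part in version.split(".")[:2])


def versions_match(hive_version, schema_version):
    # Assumption: treat versions as compatible when major.minor agree.
    return major_minor(hive_version) == major_minor(schema_version)


sample = """Hive distribution version: 2.1.0
Metastore schema version: 1.2.1000"""

hive_version, schema_version = parse_schematool_info(sample)
print(hive_version, schema_version, versions_match(hive_version, schema_version))
```

With the output posted above, this reports the 2.1.0 binaries against the 1.2.1000 schema as a mismatch, which is the same condition the startup script trips over.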
> but this isn't an option for us, given our use of Spark's HiveContext/Spark on Hive.
Can you explain that further?
Spark's support for the Hive metastore caps out at v1.2.1, and it doesn't seem that HDP has worked around this. The code linked above checks the live Hive metastore schema and complains about a mismatch. HDP 2.5 added 'tech preview' support for Hive 2.x, but it seems that they sought to avoid maintaining two separate metastores by forcing Hive 1.2 to use a shared, upgraded 2.1.0 metastore schema. Unfortunately, this seems to break "Spark on Hive" (attempting to use a HiveContext in Spark 1.6).
/usr/hdp/current/spark-client/bin/pyspark --master yarn --deploy-mode client
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("use default")
Results in the error:
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: scala.MatchError: 2.1.0 (of class java.lang.String)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader$.hiveVersion(IsolatedClientLoader.scala:86)
    at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258)
    at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255)
    at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:459)
    at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
    at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)
Although we're trying to consolidate (as much as possible) to a pure SQL-like data warehouse, a lot of our machine learning work uses Spark DataFrames, and this is a breaking change for us. Looking at the actual schema differences between 1.2.1 and 2.1.0, however, they seem fairly innocuous and unlikely to affect our use of Spark DataFrames with Hive 1.2.1. I'm currently trying to rebuild HDP's Spark package from source, to see if changing the magic numbers there will work. At best this would be a bit of a hack, and far from perfect.
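To make the failure mode concrete, here is a rough Python sketch of what an exhaustive version-string lookup does when it meets a version it has never heard of, which is what the `scala.MatchError: 2.1.0` in the trace suggests. The table of accepted strings and every name below are illustrative assumptions, not a copy of Spark's `IsolatedClientLoader`, and the "patched" variant only mimics the kind of magic-number change described above.

```python
# Illustrative table only: an assumption about which version strings a
# Spark 1.6-era Hive client accepts, not a copy of the Spark source.
SUPPORTED_METASTORE_VERSIONS = {
    "1.0": "v1_0", "1.0.0": "v1_0",
    "1.1": "v1_1", "1.1.0": "v1_1",
    "1.2": "v1_2", "1.2.0": "v1_2", "1.2.1": "v1_2",
}


class UnsupportedHiveVersion(ValueError):
    """Raised when a version string has no entry in the lookup table."""


def hive_version(version):
    """Map a metastore version string to a client shim, or fail loudly."""
    try:
        return SUPPORTED_METASTORE_VERSIONS[version]
    except KeyError:
        raise UnsupportedHiveVersion(version)


def hive_version_patched(version, aliases):
    """Hypothetical hack: treat selected newer versions as an older one."""
    return hive_version(aliases.get(version, version))


print(hive_version("1.2.1"))
try:
    hive_version("2.1.0")
except UnsupportedHiveVersion as err:
    print("rejected:", err)
print(hive_version_patched("2.1.0", {"2.1.0": "1.2.1"}))
```

The aliasing only gets past the version lookup; whether a 1.2.1 client then behaves correctly against a 2.1.0 schema is exactly the open question.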
The thrift API should be safe from these problems.
LLAP is disabled and the interactive HiveServer2 service is not present. This error existed from the first startup of a fresh Ambari/HDP install (which doesn't enable LLAP by default), where I reused an existing MySQL Hive 1.2.1 metastore. Ambari's Hive service would not start until I upgraded the schema. This thread seems to imply this was a conscious choice?