
HDP 2.5 is trying to force the metastore schema to 2.1.0, which is breaking SparkSQL/Spark on Hive

New Contributor

Hello,

I've just migrated an existing Hive cluster over to HDP 2.5. Although it reports that it's using Hive 1.2.1000, it has also installed Hive 2.1 (tech preview) and seems to be trying to load the 2.1 binaries and schema expectations. I do NOT have the LLAP interactive query feature enabled. Because my schema is 1.2, the Hive service startup is failing. On my Hive master node, it's trying to run a command like:

/var/lib/ambari-agent/ambari-sudo.sh su hive -l -s /bin/bash -c 'export PATH=/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent > /dev/null ; export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-server2-hive2/bin/schematool -info -dbType mysql -userName hive -passWord [redacted]'

With relevant output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hive2/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://[redacted]/metastore
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 hive
Hive distribution version:	 2.1.0
Metastore schema version:	 1.2.1000
org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 2.1.0, Database Schema Version: 1.2.1000
org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 2.1.0, Database Schema Version: 1.2.1000
	at org.apache.hive.beeline.HiveSchemaTool.assertCompatibleVersion(HiveSchemaTool.java:215)

It also tries to run

/usr/hdp/current/hive-server2-hive2/bin/schematool -initSchema -dbType mysql 

If I upgrade the schema to 2.1, it all works - but this isn't an option for us, given our use of Spark's HiveContext/Spark on Hive. Does HDP 2.5 require a 2.1.x schema, even though it says Hive 1.2.x on the tin? Is there a way to bypass this? Am I just being completely dense somewhere?
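
For anyone hitting the same wall: by "upgrade the schema" I mean running schematool with its upgrade option against the same metastore, along the lines of the -info invocation above (the exact flags on a given install may differ):

/usr/hdp/current/hive-server2-hive2/bin/schematool -upgradeSchema -dbType mysql -userName hive -passWord [redacted]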

Thanks,

Aaron

5 REPLIES

Re: HDP 2.5 is trying to force the metastore schema to 2.1.0, which is breaking SparkSQL/Spark on Hive

Rising Star

> but this isn't an option for us, given our use of Spark's HiveContext/Spark on Hive.

Can you explain that further?

Re: HDP 2.5 is trying to force the metastore schema to 2.1.0, which is breaking SparkSQL/Spark on Hive

New Contributor

Spark's support for the Hive metastore caps out at v1.2.1, and it doesn't seem that HDP has worked around this. The code linked above checks the live Hive metastore schema and complains about a mismatch. HDP 2.5 added 'tech preview' support for Hive 2.x, but it seems they sought to avoid maintaining two separate metastores by forcing Hive 1.2 to use a shared, upgraded 2.1.0 metastore schema. Unfortunately, this seems to break "Spark on Hive" (attempting to use a HiveContext in Spark 1.6). For example:

/usr/hdp/current/spark-client/bin/pyspark --master yarn --deploy-mode client
# sc is already provided by the pyspark shell
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("use default")

Results in the error:

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: scala.MatchError: 2.1.0 (of class java.lang.String)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader$.hiveVersion(IsolatedClientLoader.scala:86)
	at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:258)
	at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255)
	at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:459)
	at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272)
	at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271)

Although we're trying to consolidate to (as much as possible) a pure SQL-like data warehouse, a lot of our machine learning work uses Spark DataFrames, and this is a breaking change for us. Looking at the actual schema differences between 1.2.1 and 2.1.0, however, they seem fairly innocuous and unlikely to affect our use of Spark DataFrames with Hive 1.2.1. I'm currently trying to rebuild HDP's Spark package from source, to see if changing the magic numbers there will work. This would be a hack at best, and far from perfect.
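
To make the "magic numbers" concrete, here is a self-contained sketch of the kind of version match I mean. The real code lives in org.apache.spark.sql.hive.client.IsolatedClientLoader.hiveVersion (per the stack trace above) and returns Spark's internal HiveVersion values, so the type and case labels below are illustrative rather than the exact source:

// Sketch only: mimics the shape of the version match in Spark 1.6's
// IsolatedClientLoader.hiveVersion; the stand-in HiveVersion type is for illustration.
object HiveVersionMatchSketch {
  sealed trait HiveVersion
  case object V1_2 extends HiveVersion

  // With no case covering "2.1.0", Scala throws scala.MatchError - the failure shown above.
  def hiveVersion(version: String): HiveVersion = version match {
    case "1.2" | "1.2.0" | "1.2.1" => V1_2
    // The change being attempted: accept 2.1.0 but keep the 1.2 client behavior.
    case "2.1" | "2.1.0" => V1_2
  }

  def main(args: Array[String]): Unit = {
    // Prints V1_2; would throw scala.MatchError without the extra case above.
    println(hiveVersion("2.1.0"))
  }
}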

Re: HDP 2.5 is trying to force the metastore schema to 2.1.0, which is breaking SparkSQL/Spark on Hive

Rising Star

The thrift API should be safe from these problems. Try something like:

hiveContext.setConf("hive.metastore.uris", "thrift://<metastore>:9083");
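
A sketch of trying that from a Spark 1.6 spark-shell session follows; the metastore host is a placeholder, and whether this actually sidesteps the version check that fails during HiveContext construction above would need to be confirmed:

// Spark 1.6 spark-shell: sc is already provided by the shell.
// <metastore-host> is a placeholder for the real metastore host.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://<metastore-host>:9083")
hiveContext.sql("use default")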

Re: HDP 2.5 is trying to force the metastore schema to 2.1.0, which is breaking SparkSQL/Spark on Hive

Super Guru

Re: HDP 2.5 is trying to force the metastore schema to 2.1.0, which is breaking SparkSQL/Spark on Hive

New Contributor

Hi Timothy,

LLAP is disabled and the interactive HiveServer2 service is not present. This error has existed since the first startup of a fresh Ambari/HDP install (which doesn't enable LLAP by default), where I re-used an existing MySQL Hive 1.2.1 metastore. Ambari's Hive service would not start until I upgraded the schema. This thread seems to imply this was a conscious choice?

~Aaron