HDP3.1.4 - Error submitting spark jobs using Hive Warehouse Connector - KeeperErrorCode = ConnectionLoss

New Contributor

Hi everyone,

I have installed and configured an HDP 3.1.4 cluster (unsecured, without Kerberos). All services are up and running, tested, and working, except for Spark with the Hive Warehouse Connector (HWC).

I have followed the configuration instructions to use HWC so that Spark can interact with Hive. I have successfully tested it in Zeppelin with the pyspark interpreter and it works fine: I can read Hive tables and show results.
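
For reference, the HWC configuration I am referring to is the usual set of spark2-defaults properties from the HDP documentation, along these lines (the hostnames and values below are placeholders, not my actual ones):

spark.sql.hive.hiveserver2.jdbc.url                 jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.datasource.hive.warehouse.metastoreUri        thrift://metastore-host:9083
spark.datasource.hive.warehouse.load.staging.dir    /tmp
spark.hadoop.hive.llap.daemon.service.hosts         @llap0
spark.hadoop.hive.zookeeper.quorum                  zk1:2181,zk2:2181,zk3:2181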

However, I have stumbled into an error that I can't seem to figure out. Whenever I submit an application using spark-submit I get the following error:

20/01/02 11:37:24 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, host1.01.com, executor 1): java.lang.RuntimeException: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:66)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:619)
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:422)
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:63)
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:181)
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:177)
at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstanceForHost(LlapBaseInputFormat.java:415)
at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:397)
at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:160)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.getRecordReader(HiveWarehouseDataReader.java:72)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.<init>(HiveWarehouseDataReader.java:50)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.getDataReader(HiveWarehouseDataReaderFactory.java:72)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:64)
... 18 more
Caused by: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at shadecurator.org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
at shadecurator.org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
at shadecurator.org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:489)
at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
at shadecurator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
at shadecurator.org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
at shadecurator.org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:326)
at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:303)
at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:597)
... 29 more

The command I use is the following (for cluster mode):

/usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode cluster --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.4.0-315.zip spark_compare_test.py
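
Since the ConnectionLoss is raised during the LLAP registry lookup on the executors, an equivalent submit that passes the ZooKeeper and LLAP registry settings explicitly on the command line would look roughly like this (hostnames are placeholders; normally I rely on these values coming from spark2-defaults):

/usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode cluster \
  --conf spark.hadoop.hive.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181 \
  --conf spark.hadoop.hive.llap.daemon.service.hosts=@llap0 \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar \
  --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.4.0-315.zip \
  spark_compare_test.py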

The code in spark_compare_test.py is:

from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

# Create the Spark session (not needed in Zeppelin, where it is already available in the interpreter)
spark = SparkSession.builder.appName("HWC Test - CLI").getOrCreate()

# Create HWC session
hive = HiveWarehouseSession.session(spark).userPassword('hive','hive').build()

# Execute a query to read a Hive table through HWC
hive.executeQuery("select * from db_one.table_v4 where day_partition='2019-12-02'").show(20)

# END

This happens no matter what type of query against a Hive table I run; even a simple SELECT statement will not execute and always ends with this error. I have tried YARN client and cluster deploy modes and the same thing happens.

Here is a screenshot of the job in the Spark History Server UI:

spark_history_20200102_2.png

Already tried the following:

  • Increased the ZooKeeper tickTime to 10s, 120s, and 600s; the behavior was the same
  • Tried submitting the job from a client installed on a ZooKeeper machine
  • Checked ZooKeeper node health; all nodes are always up, with no errors in the logs (a minimal version of this check is sketched below)
  • Found no evidence of rate limiting in the ZooKeeper logs
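
For reference, the kind of health check I mean can be done from the command line roughly like this (zk1 is a placeholder host, and /llap-unsecure is only my guess at the LLAP registry znode name on an unsecured cluster; the actual name may differ):

# Four-letter-word liveness check; a healthy server answers "imok"
echo ruok | nc zk1 2181

# Browse the znodes with the ZooKeeper CLI and look for the LLAP registry entries
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1:2181
ls /
ls /llap-unsecure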

I am not able to explain this behavior. Has anyone experienced something similar and found a solution to this problem? Can someone guide me to the next step in the troubleshooting process?

Thank you and regards.

2 REPLIES

Explorer

Hi, @fvcamelo 

I am facing the same issue. Did you find a clue or a solution?

Thanks in advance.

Best Regards

Explorer

For me, I solved my issue by checking that I was connecting to the right server (which was not the case) and by using:

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

instead of:

val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()

and by adding this jar at the end of my spark-submit:

--jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar
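
Putting it together, a launch along these lines is what I mean (the JDBC URL is only an example of the format; adjust it to your own cluster):

spark-shell --master yarn \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar \
  --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive"

and then build the session inside the shell with HiveWarehouseBuilder as shown above.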

Hope it will help.

Best Regards.