Timed out waiting for the completion of SASL negotiation between HiveServer2 and the Remote Spark Driver.

I'm studying CDH 6.3.0 with Hive and Spark, and I'm facing a problem that has held me up for a week.
I have already reinstalled everything from scratch and nothing solved it.


The timeout occurs when I try to select from a table. Consider the following statements:
DROP TABLE dashboard.top10;
CREATE TABLE dashboard.top10 (id VARCHAR(100), floatVal DOUBLE)
STORED AS ORC tblproperties("compress.mode"="SNAPPY");
INSERT into dashboard.top10 SELECT * from analysis.total_raw order by floatVal DESC limit 10;

This fails with:

Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session faf8afcb-0e43-4097-8dcb-44f3f1445005_0: java.util.concurrent.TimeoutException: Client 'faf8afcb-0e43-4097-8dcb-44f3f1445005_0' timed out waiting for connection from the Remote Spark Driver

 


The container exits, and here is the full log:

exception: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting to connect to HiveServer2.
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41)
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:155)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:559)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: java.util.concurrent.TimeoutException: Timed out waiting to connect to HiveServer2.
at org.apache.hive.spark.client.rpc.Rpc$2.run(Rpc.java:120)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
)
19/08/26 17:15:11 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:447)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:275)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:805)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:804)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:804)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting to connect to HiveServer2.
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:41)
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:155)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:559)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: java.util.concurrent.TimeoutException: Timed out waiting to connect to HiveServer2.
at org.apache.hive.spark.client.rpc.Rpc$2.run(Rpc.java:120)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
19/08/26 17:15:11 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://masternode.vm:8020/user/root/.sparkStaging/application_1566847834444_0003
19/08/26 17:15:16 INFO util.ShutdownHookManager: Shutdown hook called

4 Replies


My guess is that the timeout settings are not being taken into account, and since this is my test environment, latency can be greater than 1 second.

I found some warnings that support my guess:
2019-08-27T10:52:10,045  INFO [spark-submit-stderr-redir-05681b44-ae8a-42d9-a80d-20dad05faa98 main] client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
2019-08-27T10:52:10,046  INFO [spark-submit-stderr-redir-05681b44-ae8a-42d9-a80d-20dad05faa98 main] client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
2019-08-27T10:52:10,046  INFO [spark-submit-stderr-redir-05681b44-ae8a-42d9-a80d-20dad05faa98 main] client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.future.timeout=60000
2019-08-27T10:52:10,046  INFO [spark-submit-stderr-redir-05681b44-ae8a-42d9-a80d-20dad05faa98 main] client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
2019-08-27T10:52:10,046  INFO [spark-submit-stderr-redir-05681b44-ae8a-42d9-a80d-20dad05faa98 main] client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
2019-08-27T10:52:10,053  INFO [spark-submit-stderr-redir-05681b44-ae8a-42d9-a80d-20dad05faa98 main] client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
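
For reference, these "Ignoring non-spark config property" warnings are printed by spark-submit itself, which skips any property that does not start with spark.; they do not by themselves prove the Hive side dropped the timeouts, but the first line does show the default hive.spark.client.server.connect.timeout of 90000 ms. A minimal sketch of raising the two relevant timeouts for a Beeline session, assuming they only need to exceed the observed driver start-up latency (the 300000ms value is purely illustrative, and on some CDH releases these properties may have to be set in hive-site.xml or the HiveServer2 safety valve rather than per session):

-- raise the Hive-on-Spark client timeouts (values are illustrative)
set hive.spark.client.connect.timeout=300000ms;        -- remote driver connecting back to HS2, default 1000ms
set hive.spark.client.server.connect.timeout=300000ms; -- handshake between HS2 and the remote driver, default 90000ms
-- then re-run the failing statement in the same session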

@LeonanCarvalho,

The error looks like the AM was not able to reach back to HS2 for some reason. Have you tried to ping HS2 from the AM host for this particular job?

If you re-run the job so that the AM runs on another NM, do you get the same error every time?

Cheers
Eric


Thanks for your reply, @EricL.

 

The connection between the nodes is fine. I edited hive-site.xml with these parameters and it's working now, but I'm not sure why the timeout was happening:

 

set hive.spark.client.connect.timeout=360000ms;
set hive.spark.client.server.connect.timeout=360000ms;
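
For reference, 360000ms is a six-minute window. When the values live in hive-site.xml rather than a per-session set, one way to confirm HiveServer2 actually picked them up is to print them from Beeline, since set with no value echoes the effective setting:

set hive.spark.client.connect.timeout;
set hive.spark.client.server.connect.timeout;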

 

 

BR

 

@LeonanCarvalho

To find out why it takes so long, you need to look at the AM log to see at what stage it hung, and maybe capture jstacks of both the AM and HS2 to see which thread might be blocking and causing the hang. A 6-minute timeout is a bit long. Is your data quite big?

Cheers
Eric