How to diagnose and fix Spark Thrift Server failure on HDP 2.6.3.0-235

Hi, we are running Spark Thrift Server on HDP 2.6.3.0-235. Sometimes it goes down without an obvious reason, and I would like to find out why.

The log shows that the YARN application was killed by someone (which is fair, someone could have killed it), but does the whole service go down if the YARN application goes down? Is that by design?

18/04/16 12:37:20 INFO SessionState: Created HDFS directory: /tmp/hive/hive/8fa3df5f-73ef-4c31-9a27-d9e22334579f/_tmp_space.db

18/04/16 12:37:20 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/hive/spark-warehouse

18/04/16 12:37:49 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state KILLED!

18/04/16 12:37:49 INFO HiveServer2: Shutting down HiveServer2

18/04/16 12:37:49 INFO ThriftCLIService: Thrift server has stopped

18/04/16 12:37:49 INFO AbstractService: Service:ThriftBinaryCLIService is stopped.

18/04/16 12:37:49 INFO AbstractService: Service:OperationManager is stopped.

18/04/16 12:37:49 INFO AbstractService: Service:SessionManager is stopped.

18/04/16 12:37:49 INFO AbstractService: Service:CLIService is stopped.

18/04/16 12:37:49 INFO AbstractService: Service:HiveServer2 is stopped.

18/04/16 12:37:49 INFO AbstractConnector: Stopped Spark@47457a81{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}

18/04/16 12:37:49 INFO SparkUI: Stopped Spark web UI at http://185.204.3.180:4040

18/04/16 12:37:49 ERROR TransportClient: Failed to send RPC 5517988194331422796 to /185.204.3.100:50030: java.nio.channels.ClosedChannelException

java.nio.channels.ClosedChannelException

	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

18/04/16 12:37:49 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful

java.io.IOException: Failed to send RPC 5517988194331422796 to /185.204.3.100:50030: java.nio.channels.ClosedChannelException

	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)

	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)

	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)

	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)

	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)

	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)

	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)

	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

	at java.lang.Thread.run(Thread.java:748)

Caused by: java.nio.channels.ClosedChannelException

	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

18/04/16 12:37:49 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices

(serviceOption=None,

 services=List(),

 started=false)

18/04/16 12:37:49 ERROR Utils: Uncaught exception in thread Yarn application state monitor

org.apache.spark.SparkException: Exception thrown in awaitResult: 

	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)

	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)

	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:551)

	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:94)

	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:151)

	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:517)

	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1670)

	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1928)

	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1317)

	at org.apache.spark.SparkContext.stop(SparkContext.scala:1927)

	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:108)

Caused by: java.io.IOException: Failed to send RPC 5517988194331422796 to /185.204.3.100:50030: java.nio.channels.ClosedChannelException

	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)

	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)

	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)

	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)

	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)

	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)

	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)

	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

	at java.lang.Thread.run(Thread.java:748)

Caused by: java.nio.channels.ClosedChannelException

	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

18/04/16 12:37:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

18/04/16 12:37:49 INFO MemoryStore: MemoryStore cleared

18/04/16 12:37:49 INFO BlockManager: BlockManager stopped

18/04/16 12:37:49 INFO BlockManagerMaster: BlockManagerMaster stopped

18/04/16 12:37:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

18/04/16 12:37:49 INFO SparkContext: Successfully stopped SparkContext
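
For anyone triaging a similar log, here is a minimal sketch in Python that lists the ERROR entries in order, so the first one stands out from the shutdown noise that follows. It assumes the log layout shown above (`yy/MM/dd HH:mm:ss LEVEL Logger: message`). The function name `first_errors` and the `sample` list are illustrative only and are not part of any Spark or HDP tooling.

```python
import re

# Matches the layout seen in the log above: "yy/MM/dd HH:mm:ss LEVEL Logger: message".
# Stack-trace lines ("at ...", "Caused by: ...") do not start with a timestamp and are skipped.
LOG_LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+) ([^:\s]+): (.*)$")

def first_errors(lines, level="ERROR"):
    """Return (timestamp, logger, message) for each entry at the given level, in log order."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group(2) == level:
            hits.append((m.group(1), m.group(3), m.group(4)))
    return hits

# A few lines copied from the log above, used here as hypothetical input.
sample = [
    "18/04/16 12:37:20 INFO SessionState: Created HDFS directory: /tmp/hive/hive/8fa3df5f-73ef-4c31-9a27-d9e22334579f/_tmp_space.db",
    "18/04/16 12:37:49 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state KILLED!",
    "18/04/16 12:37:49 INFO HiveServer2: Shutting down HiveServer2",
    "18/04/16 12:37:49 ERROR TransportClient: Failed to send RPC 5517988194331422796 to /185.204.3.100:50030: java.nio.channels.ClosedChannelException",
]

errors = first_errors(sample)
for ts, logger, msg in errors:
    print(ts, logger, msg)
```

In this log the earliest ERROR is the `YarnClientSchedulerBackend` line reporting state KILLED. Judging by the stack trace, the later `ClosedChannelException` entries are raised while `SparkContext.stop` is trying to reach the application master, so they look like a consequence of the kill rather than its cause.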

Re: How to diagnose and fix Spark Thrift Server failure on HDP 2.6.3.0-235

@Sergey Sheypak Did you find a solution to the problem above? I am facing the same issue.

Re: How to diagnose and fix Spark Thrift Server failure on HDP 2.6.3.0-235

Do you have any news about this problem? I have the same issue.