Created 04-19-2018 08:49 PM
Hi, we are running Spark Thrift Server on HDP 2.6.3.0-235. Sometimes it goes down without any obvious reason, and I would like to find out why.
I can see that the YARN application was killed by someone (which is fair, anyone with access could kill it), but should the whole service go down just because the YARN app goes down? Is that by design?
18/04/16 12:37:20 INFO SessionState: Created HDFS directory: /tmp/hive/hive/8fa3df5f-73ef-4c31-9a27-d9e22334579f/_tmp_space.db
18/04/16 12:37:20 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/hive/spark-warehouse
18/04/16 12:37:49 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state KILLED!
18/04/16 12:37:49 INFO HiveServer2: Shutting down HiveServer2
18/04/16 12:37:49 INFO ThriftCLIService: Thrift server has stopped
18/04/16 12:37:49 INFO AbstractService: Service:ThriftBinaryCLIService is stopped.
18/04/16 12:37:49 INFO AbstractService: Service:OperationManager is stopped.
18/04/16 12:37:49 INFO AbstractService: Service:SessionManager is stopped.
18/04/16 12:37:49 INFO AbstractService: Service:CLIService is stopped.
18/04/16 12:37:49 INFO AbstractService: Service:HiveServer2 is stopped.
18/04/16 12:37:49 INFO AbstractConnector: Stopped Spark@47457a81{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/04/16 12:37:49 INFO SparkUI: Stopped Spark web UI at http://185.204.3.180:4040
18/04/16 12:37:49 ERROR TransportClient: Failed to send RPC 5517988194331422796 to /185.204.3.100:50030: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/04/16 12:37:49 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 5517988194331422796 to /185.204.3.100:50030: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/04/16 12:37:49 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
18/04/16 12:37:49 ERROR Utils: Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:551)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:94)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:151)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:517)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1670)
    at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1928)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1317)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1927)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:108)
Caused by: java.io.IOException: Failed to send RPC 5517988194331422796 to /185.204.3.100:50030: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/04/16 12:37:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/16 12:37:49 INFO MemoryStore: MemoryStore cleared
18/04/16 12:37:49 INFO BlockManager: BlockManager stopped
18/04/16 12:37:49 INFO BlockManagerMaster: BlockManagerMaster stopped
18/04/16 12:37:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/16 12:37:49 INFO SparkContext: Successfully stopped SparkContext
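For context: the "Yarn application state monitor" thread in the stack trace above (YarnClientSchedulerBackend$MonitorThread) is what calls SparkContext.stop() once the YARN application reaches a final state, so in yarn-client mode the whole Thrift Server process exiting after a YARN-level kill appears to be expected behavior. To find out who or why, the YARN application report's diagnostics usually record the kill reason. Below is a minimal sketch, not a definitive tool, that fetches the report via the Hadoop YarnClient API; the class name WhyWasAppKilled and the example application ID are made up for illustration.

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.ConverterUtils

// Sketch: print the final state and kill diagnostics of a YARN application,
// e.g. the Thrift Server app that shows up as KILLED in the log above.
object WhyWasAppKilled {
  def main(args: Array[String]): Unit = {
    // Application ID passed on the command line, e.g.
    // "application_1523871234567_0042" (a made-up example ID).
    val appId = ConverterUtils.toApplicationId(args(0))

    val yarn = YarnClient.createYarnClient()
    yarn.init(new YarnConfiguration()) // reads yarn-site.xml from the classpath
    yarn.start()
    try {
      val report = yarn.getApplicationReport(appId)
      println(s"State:       ${report.getYarnApplicationState}")
      println(s"Final state: ${report.getFinalApplicationStatus}")
      // For a manual kill this typically contains something like
      // "Application killed by user." plus the user who issued it.
      println(s"Diagnostics: ${report.getDiagnostics}")
    } finally {
      yarn.stop()
    }
  }
}

The same information is available without code via yarn application -status <appId> or the ResourceManager UI. And since the monitor thread stops the SparkContext whenever the app exits, something external (for example Ambari's service auto-start) would be needed to bring the Thrift Server back up automatically.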
Created 06-21-2019 07:25 AM
@Sergey Sheypak Did you ever find a solution to the above? I am facing the same issue.
Created 10-29-2019 04:48 AM
Do you have any news about this problem? I have the same issue.