
Spark streaming job with akka error on yarn-cluster mode


Hi all:

 

My Cloudera version is CDH 5.5.1. I wrote a Spark Streaming job that reads a Kafka topic and writes the records into HBase, but I run into a problem when running it.

 

I run the job with the following command:

spark-submit --class KafkaToHBase --master yarn-cluster --driver-class-path /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar KafkaMaven-0.0.2.jar

 

Once the job is accepted and running as a YARN application, it sometimes reports Akka association errors.

Below is my application log:

 

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/avro-tools-1.7.6-cdh5.5.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/01/19 10:51:11 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/01/19 10:51:12 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1450234627340_0050_000001
16/01/19 10:51:13 INFO spark.SecurityManager: Changing view acls to: yarn,scepter
16/01/19 10:51:13 INFO spark.SecurityManager: Changing modify acls to: yarn,scepter
16/01/19 10:51:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, scepter); users with modify permissions: Set(yarn, scepter)
16/01/19 10:51:13 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
16/01/19 10:51:13 INFO yarn.ApplicationMaster: Waiting for spark context initialization

[Stage 0:>                                                         (0 + 0) / 50]16/01/19 10:51:21 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] <- [akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:46612]: Error [Shut down address: akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:46612] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:46612
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
akka.event.Logging$Error$NoCause$
16/01/19 10:51:21 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] <- [akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:41451]: Error [Shut down address: akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:41451] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:41451
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
akka.event.Logging$Error$NoCause$
16/01/19 10:51:21 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] <- [akka.tcp://driverPropsFetcher@cloudera-73.cloudera.com:47636]: Error [Shut down address: akka.tcp://driverPropsFetcher@cloudera-73.cloudera.com:47636] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://driverPropsFetcher@cloudera-73.cloudera.com:47636
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
akka.event.Logging$Error$NoCause$

[Stage 0:>                                                         (0 + 3) / 50]
[Stage 0:===>                                                      (3 + 4) / 50]
[Stage 0:===========>                                             (10 + 4) / 50]
[Stage 0:======================>                                  (20 + 4) / 50]
[Stage 0:====================================>                    (32 + 3) / 50]
[Stage 0:====================================================>    (46 + 3) / 50]
                                                                                
16/01/19 10:52:25 ERROR YarnClusterScheduler: Lost executor 2 on cloudera-73.cloudera.com: remote Rpc client disassociated
16/01/19 10:52:25 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] -> [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]: Error [Association failed with [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: cloudera-73.cloudera.com/140.92.61.73:43521
]
akka.event.Logging$Error$NoCause$
16/01/19 10:52:25 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] -> [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]: Error [Association failed with [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: cloudera-73.cloudera.com/140.92.61.73:43521
]
akka.event.Logging$Error$NoCause$
16/01/19 10:52:25 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] -> [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]: Error [Association failed with [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@cloudera-73.cloudera.com:43521]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: cloudera-73.cloudera.com/140.92.61.73:43521
]
akka.event.Logging$Error$NoCause$
16/01/19 10:52:28 ERROR YarnClusterScheduler: Lost executor 3 on cloudera-72.cloudera.com: remote Rpc client disassociated
16/01/19 10:52:28 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] -> [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]: Error [Association failed with [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: cloudera-72.cloudera.com/140.92.61.72:44451
]
akka.event.Logging$Error$NoCause$
16/01/19 10:52:28 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] -> [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]: Error [Association failed with [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: cloudera-72.cloudera.com/140.92.61.72:44451
]
akka.event.Logging$Error$NoCause$
16/01/19 10:52:28 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] -> [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]: Error [Association failed with [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@cloudera-72.cloudera.com:44451]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: cloudera-72.cloudera.com/140.92.61.72:44451
]
akka.event.Logging$Error$NoCause$

[Stage 3:>                                                          (0 + 0) / 5]16/01/19 10:53:46 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] <- [akka.tcp://driverPropsFetcher@cloudera-73.cloudera.com:59671]: Error [Shut down address: akka.tcp://driverPropsFetcher@cloudera-73.cloudera.com:59671] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://driverPropsFetcher@cloudera-73.cloudera.com:59671
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
akka.event.Logging$Error$NoCause$
16/01/19 10:53:47 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@140.92.61.73:51008] <- [akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:49574]: Error [Shut down address: akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:49574] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://driverPropsFetcher@cloudera-72.cloudera.com:49574
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
akka.event.Logging$Error$NoCause$

[Stage 3:>                                                          (0 + 1) / 5]
[Stage 3:>                                                          (0 + 2) / 5]
[Stage 3:===========>                                               (1 + 1) / 5]
[Stage 3:===================================>                       (3 + 2) / 5]
                                                                                

[Stage 4:>                                                          (0 + 1) / 5]
[Stage 4:===========>                                               (1 + 1) / 5]
[Stage 4:=======================>                                   (2 + 1) / 5]
[Stage 4:===================================>                       (3 + 1) / 5]
[Stage 4:===============================================>           (4 + 1) / 5]
                                                                                

 

The streaming job keeps running, but YARN loses executors, which causes some records to be lost in HBase.
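To make it easier to see which executors were lost, on which host, and when, the driver log above can be scanned for the `Lost executor` lines. The following is only a minimal sketch: the function names `lost_executors` and `losses_per_host` are made up for illustration, and the regular expression assumes the exact line layout shown in the log above (timestamp, `ERROR YarnClusterScheduler: Lost executor <id> on <host>: <reason>`).

```python
import re
from collections import Counter

# Assumed line layout, taken from the log above, e.g.:
# 16/01/19 10:52:25 ERROR YarnClusterScheduler: Lost executor 2 on cloudera-73.cloudera.com: remote Rpc client disassociated
# The pattern is not anchored at line start because console progress bars
# can be printed in front of a log line.
LOST_EXECUTOR = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ERROR YarnClusterScheduler: "
    r"Lost executor (?P<id>\S+) on (?P<host>[^:\s]+): (?P<reason>.*)$"
)


def lost_executors(lines):
    """Return one dict (ts, id, host, reason) per 'Lost executor' line."""
    events = []
    for line in lines:
        match = LOST_EXECUTOR.search(line.rstrip())
        if match:
            events.append(match.groupdict())
    return events


def losses_per_host(events):
    """Count how many executors were lost on each host."""
    return Counter(event["host"] for event in events)


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = [
        "16/01/19 10:52:25 ERROR YarnClusterScheduler: Lost executor 2 on "
        "cloudera-73.cloudera.com: remote Rpc client disassociated",
        "16/01/19 10:52:28 ERROR YarnClusterScheduler: Lost executor 3 on "
        "cloudera-72.cloudera.com: remote Rpc client disassociated",
    ]
    for event in lost_executors(sample):
        print(event["ts"], event["id"], event["host"], event["reason"])
```

The timestamps found this way can then be matched against the logs of the affected executors' containers, which is where the actual reason for the executor exit would normally show up; the driver log only reports that the connection was lost.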

 

Can someone help me with this problem? Should I change some settings in the Spark or YARN configuration?


Re: Spark streaming job with akka error on yarn-cluster mode


Maybe it has something to do with the configuration approach explained at the link below? Just an idea...

 

http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_yarn_long_jobs.html