Created on 08-11-2015 02:12 PM - edited 09-16-2022 02:37 AM
When I execute spark-shell, I see a bunch of log statements, including INFO messages saying that sc and sqlContext are available.
But shortly afterwards (about 30+ seconds), I get an error:
scala> 15/08/11 20:40:59 ERROR YarnScheduler: Lost executor 2 on nrdapp3: remote Akka client disassociated
and sc is stopped afterwards.
Below is the entire output of a run in verbose mode. All the machines in the cluster are on the same network with no firewall.
[hdfs@rst-dev-bigdata02 exp_daily]$ spark-shell --verbose Using properties file: /app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/conf/spark-defaults.conf Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer Adding default property: spark.eventLog.enabled=true Adding default property: spark.shuffle.service.enabled=true Adding default property: spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native Adding default property: spark.yarn.historyServer.address=http://rst-dev-bigdata02:18088 Adding default property: spark.shuffle.service.port=7337 Adding default property: spark.master=yarn-client Adding default property: spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native Adding default property: spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory Adding default property: spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar:/app/cloudera/parcels/CLABS_PHOENIX/jars/phoenix-core-4.3.0-clabs-phoenix-1.0.0.jar Adding default property: spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.4-hadoop2.6.0-cdh5.4.4.jar Parsed arguments: master yarn-client deployMode null executorMemory null executorCores null totalExecutorCores null propertiesFile /app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/conf/spark-defaults.conf driverMemory null driverCores null driverExtraClassPath null driverExtraLibraryPath /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native driverExtraJavaOptions null supervise false queue null numExecutors null files null pyFiles null archives null mainClass org.apache.spark.repl.Main primaryResource spark-shell name org.apache.spark.repl.Main childArgs [] jars null packages null repositories null verbose true Spark properties used, including those specified through --conf and 
those from the properties file /app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/conf/spark-defaults.conf: spark.executor.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native spark.driver.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native spark.yarn.jar -> local:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.4-hadoop2.6.0-cdh5.4.4.jar spark.eventLog.enabled -> true spark.yarn.historyServer.address -> http://rst-dev-bigdata02:18088 spark.serializer -> org.apache.spark.serializer.KryoSerializer spark.shuffle.service.enabled -> true spark.shuffle.service.port -> 7337 spark.eventLog.dir -> hdfs://nameservice1/user/spark/applicationHistory spark.executor.extraClassPath -> /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar:/app/cloudera/parcels/CLABS_PHOENIX/jars/phoenix-core-4.3.0-clabs-phoenix-1.0.0.jar spark.master -> yarn-client Main class: org.apache.spark.repl.Main Arguments: System properties: spark.executor.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native spark.yarn.jar -> local:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.4-hadoop2.6.0-cdh5.4.4.jar spark.driver.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native spark.yarn.historyServer.address -> http://rst-dev-bigdata02:18088 spark.eventLog.enabled -> true SPARK_SUBMIT -> true spark.serializer -> org.apache.spark.serializer.KryoSerializer spark.app.name -> org.apache.spark.repl.Main spark.shuffle.service.enabled -> true spark.jars -> spark.shuffle.service.port -> 7337 spark.executor.extraClassPath -> /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar:/app/cloudera/parcels/CLABS_PHOENIX/jars/phoenix-core-4.3.0-clabs-phoenix-1.0.0.jar spark.eventLog.dir -> hdfs://nameservice1/user/spark/applicationHistory 
spark.master -> yarn-client Classpath elements: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/avro-tools-1.7.6-cdh5.4.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/08/11 20:39:40 INFO SecurityManager: Changing view acls to: hdfs 15/08/11 20:39:40 INFO SecurityManager: Changing modify acls to: hdfs 15/08/11 20:39:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs) 15/08/11 20:39:40 INFO HttpServer: Starting HTTP Server 15/08/11 20:39:40 INFO Server: jetty-8.y.z-SNAPSHOT 15/08/11 20:39:41 INFO AbstractConnector: Started SocketConnector@0.0.0.0:41624 15/08/11 20:39:41 INFO Utils: Successfully started service 'HTTP class server' on port 41624. Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.3.0 /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71) Type in expressions to have them evaluated. Type :help for more information. 
15/08/11 20:39:48 INFO SparkContext: Running Spark version 1.3.0 15/08/11 20:39:48 INFO SecurityManager: Changing view acls to: hdfs 15/08/11 20:39:48 INFO SecurityManager: Changing modify acls to: hdfs 15/08/11 20:39:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs) 15/08/11 20:39:48 INFO Slf4jLogger: Slf4jLogger started 15/08/11 20:39:48 INFO Remoting: Starting remoting 15/08/11 20:39:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@rst-dev-bigdata02:37628] 15/08/11 20:39:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@rst-dev-bigdata02:37628] 15/08/11 20:39:48 INFO Utils: Successfully started service 'sparkDriver' on port 37628. 15/08/11 20:39:48 INFO SparkEnv: Registering MapOutputTracker 15/08/11 20:39:50 INFO SparkEnv: Registering BlockManagerMaster 15/08/11 20:39:50 INFO DiskBlockManager: Created local directory at /tmp/spark-3bc46fb6-668e-4f79-9e12-06df1359e017/blockmgr-d30a3be5-2790-4bd8-bc7a-ea9a41b717ee 15/08/11 20:39:50 INFO MemoryStore: MemoryStore started with capacity 246.0 MB 15/08/11 20:39:50 INFO HttpFileServer: HTTP File server directory is /tmp/spark-73b9ac43-31fc-4e62-ac90-3d930a3b636b/httpd-d77d01d7-66bd-4384-8406-2bfceb49094c 15/08/11 20:39:50 INFO HttpServer: Starting HTTP Server 15/08/11 20:39:50 INFO Server: jetty-8.y.z-SNAPSHOT 15/08/11 20:39:50 INFO AbstractConnector: Started SocketConnector@0.0.0.0:42431 15/08/11 20:39:50 INFO Utils: Successfully started service 'HTTP file server' on port 42431. 15/08/11 20:39:50 INFO SparkEnv: Registering OutputCommitCoordinator 15/08/11 20:39:50 INFO Server: jetty-8.y.z-SNAPSHOT 15/08/11 20:39:50 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 15/08/11 20:39:50 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/08/11 20:39:50 INFO SparkUI: Started SparkUI at http://rst-dev-bigdata02:4040 15/08/11 20:39:51 INFO Client: Requesting a new application from cluster with 6 NodeManagers 15/08/11 20:39:51 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6506 MB per container) 15/08/11 20:39:51 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 15/08/11 20:39:51 INFO Client: Setting up container launch context for our AM 15/08/11 20:39:51 INFO Client: Preparing resources for our AM container 15/08/11 20:39:53 INFO Client: Setting up the launch environment for our AM container 15/08/11 20:39:53 INFO SecurityManager: Changing view acls to: hdfs 15/08/11 20:39:53 INFO SecurityManager: Changing modify acls to: hdfs 15/08/11 20:39:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs) 15/08/11 20:39:53 INFO Client: Submitting application 64 to ResourceManager 15/08/11 20:39:54 INFO YarnClientImpl: Submitted application application_1439215700542_0064 15/08/11 20:39:55 INFO Client: Application report for application_1439215700542_0064 (state: ACCEPTED) 15/08/11 20:39:55 INFO Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: root.hdfs start time: 1439325593917 final status: UNDEFINED tracking URL: http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064/ user: hdfs 15/08/11 20:39:56 INFO Client: Application report for application_1439215700542_0064 (state: ACCEPTED) 15/08/11 20:39:57 INFO Client: Application report for application_1439215700542_0064 (state: ACCEPTED) 15/08/11 20:39:57 INFO YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@rst-dev-bigdata01:41025/user/YarnAM#-652030164] 15/08/11 20:39:57 INFO YarnClientSchedulerBackend: Add WebUI Filter. 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rst-dev-bigdata01,rst-dev-bigdata02, PROXY_URI_BASES -> http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064,http://rst-dev-bigdata02:8088/proxy/application_1439215700542_0064), /proxy/application_1439215700542_0064 15/08/11 20:39:57 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 15/08/11 20:39:58 INFO Client: Application report for application_1439215700542_0064 (state: RUNNING) 15/08/11 20:39:58 INFO Client: client token: N/A diagnostics: N/A ApplicationMaster host: rst-dev-bigdata01 ApplicationMaster RPC port: 0 queue: root.hdfs start time: 1439325593917 final status: UNDEFINED tracking URL: http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064/ user: hdfs 15/08/11 20:39:58 INFO YarnClientSchedulerBackend: Application application_1439215700542_0064 has started running. 15/08/11 20:39:58 INFO NettyBlockTransferService: Server created on 39119 15/08/11 20:39:58 INFO BlockManagerMaster: Trying to register BlockManager 15/08/11 20:39:58 INFO BlockManagerMasterActor: Registering block manager rst-dev-bigdata02:39119 with 246.0 MB RAM, BlockManagerId(<driver>, rst-dev-bigdata02, 39119) 15/08/11 20:39:58 INFO BlockManagerMaster: Registered BlockManager 15/08/11 20:39:59 INFO EventLoggingListener: Logging events to hdfs://nameservice1/user/spark/applicationHistory/application_1439215700542_0064 15/08/11 20:40:03 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@nrdapp3:56554/user/Executor#346078266] with ID 2 15/08/11 20:40:03 INFO BlockManagerMasterActor: Registering block manager nrdapp3:52619 with 530.3 MB RAM, BlockManagerId(2, nrdapp3, 52619) 15/08/11 20:40:20 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms) 15/08/11 20:40:20 INFO SparkILoop: Created spark context.. 
Spark context available as sc.
15/08/11 20:40:21 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 15/08/11 20:40:59 ERROR YarnScheduler: Lost executor 2 on nrdapp3: remote Akka client disassociated
15/08/11 20:40:59 INFO DAGScheduler: Executor lost: 2 (epoch 0)
15/08/11 20:40:59 INFO BlockManagerMasterActor: Trying to remove executor 2 from BlockManagerMaster.
15/08/11 20:40:59 INFO BlockManagerMasterActor: Removing block manager BlockManagerId(2, nrdapp3, 52619)
15/08/11 20:40:59 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
15/08/11 20:41:02 INFO YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@rst-dev-bigdata01:58701/user/YarnAM#-1423911535]
15/08/11 20:41:02 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rst-dev-bigdata01,rst-dev-bigdata02, PROXY_URI_BASES -> http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064,http://rst-dev-bigdata02:8088/proxy/application_1439215700542_0064), /proxy/application_1439215700542_0064
15/08/11 20:41:02 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/08/11 20:41:05 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@nrdapp3:35986/user/Executor#-738360211] with ID 2
15/08/11 20:41:05 INFO DAGScheduler: Host added was in lost list earlier: nrdapp3
15/08/11 20:41:05 INFO BlockManagerMasterActor: Registering block manager nrdapp3:56783 with 530.3 MB RAM, BlockManagerId(2, nrdapp3, 56783)
15/08/11 20:42:03 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 15/08/11 20:42:03 INFO ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/job/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 15/08/11 20:42:03 INFO SparkUI: Stopped Spark web UI at http://rst-dev-bigdata02:4040 15/08/11 20:42:03 INFO DAGScheduler: Stopping DAGScheduler 15/08/11 20:42:03 INFO YarnClientSchedulerBackend: Shutting down all executors 15/08/11 20:42:03 INFO YarnClientSchedulerBackend: Asking each executor to shut down 15/08/11 20:42:03 INFO YarnClientSchedulerBackend: Stopped 15/08/11 20:42:03 INFO OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! 15/08/11 20:42:03 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped! 15/08/11 20:42:03 INFO MemoryStore: MemoryStore cleared 15/08/11 20:42:03 INFO BlockManager: BlockManager stopped 15/08/11 20:42:03 INFO BlockManagerMaster: BlockManagerMaster stopped 15/08/11 20:42:03 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 15/08/11 20:42:03 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 15/08/11 20:42:03 INFO Remoting: Remoting shut down 15/08/11 20:42:03 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down. 15/08/11 20:42:03 INFO SparkContext: Successfully stopped SparkContext SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/avro-tools-1.7.6-cdh5.4.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Any ideas on what is going wrong here?
Thanks,
Created 08-11-2015 11:24 PM
Check the memory and container settings.
When you pass nothing in, you rely on the defaults, and that could have caused the containers to be killed.
The YARN logs should tell you more. The application ID is printed in the logs you posted, so retrieve them by running:
yarn logs -applicationId application_1439215700542_0064
Analyse them and check what happened.
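A concrete way to follow this up — a sketch only: the spark-shell flags are standard YARN-mode options, but the memory and core values below are placeholders you would tune to your own cluster, and the grep patterns are just common phrases YARN emits when it kills a container:

```shell
# Pull the aggregated YARN container logs for the failed application
# and save them locally for inspection.
yarn logs -applicationId application_1439215700542_0064 > app_0064.log

# Look for the usual signs of a container being killed for exceeding
# its memory allocation (patterns are illustrative, not exhaustive).
grep -iE "killing container|beyond physical memory|beyond virtual memory|exit code" app_0064.log

# If the logs confirm a memory kill, relaunch spark-shell with explicit
# resource settings instead of relying on the defaults, e.g.:
spark-shell \
  --master yarn-client \
  --num-executors 3 \
  --executor-cores 2 \
  --executor-memory 2g \
  --driver-memory 2g
```

If the executor JVM heap itself is within bounds but the container total is not, raising the overhead allowance (e.g. --conf spark.yarn.executor.memoryOverhead=512) can also help on Spark 1.3.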
Wilfred
Created 08-12-2015 08:50 AM