Member since 05-29-2015 · 6 Posts · 0 Kudos Received · 0 Solutions
05-06-2016 02:05 PM
In our CDH 5.5 cluster, we have about 12 region server nodes, of which 4 have 8 cores and the remaining have 16 cores. We have noticed that those 4 nodes are always busy during our load. Is it possible to assign weights to these 4 lower-powered nodes so that they host, say, 50% fewer regions than the other nodes?
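For context, the only workaround we have come up with so far is to disable the automatic balancer and script region moves by hand from the HBase shell. A rough sketch of what that looks like (the encoded region name and the target 'host,port,startcode' server name are placeholders, not our actual values):

balance_switch false
move 'ENCODED_REGION_NAME', 'sixteen-core-host,60020,STARTCODE'
balance_switch true

The obvious drawback is that re-enabling the balancer may redistribute the regions again, so we would much rather have a supported weighting mechanism if one exists.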
08-12-2015 08:50 AM
Thanks, this helped. By looking at the logs, I was able to fix the classpath issue on one of the nodes, and allocating enough executor memory kept the container from being killed.
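For anyone who hits the same error, the invocation that worked for us looked roughly like this (the memory values are just what fit our nodes, not a general recommendation):

[hdfs@rst-dev-bigdata02 exp_daily]$ spark-shell --executor-memory 4g --driver-memory 2g --conf spark.yarn.executor.memoryOverhead=512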
08-11-2015 02:12 PM
When I execute spark-shell, I see a bunch of log statements and INFO messages saying that sc and sqlContext are available. But shortly after (about 30+ seconds), I get an error at the scala> prompt: 15/08/11 20:40:59 ERROR YarnScheduler: Lost executor 2 on nrdapp3: remote Akka client disassociated, and the SparkContext is stopped afterwards. Below is the entire output, run in verbose mode. All the machines in the cluster are on the same network with no firewall.

[hdfs@rst-dev-bigdata02 exp_daily]$ spark-shell --verbose
Using properties file: /app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/conf/spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native
Adding default property: spark.yarn.historyServer.address=http://rst-dev-bigdata02:18088
Adding default property: spark.shuffle.service.port=7337
Adding default property: spark.master=yarn-client
Adding default property: spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native
Adding default property: spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
Adding default property: spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar:/app/cloudera/parcels/CLABS_PHOENIX/jars/phoenix-core-4.3.0-clabs-phoenix-1.0.0.jar
Adding default property: spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.4-hadoop2.6.0-cdh5.4.4.jar
Parsed arguments:
master yarn-client
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile /app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/conf/spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass org.apache.spark.repl.Main
primaryResource spark-shell
name org.apache.spark.repl.Main
childArgs []
jars null
packages null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file /app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/conf/spark-defaults.conf:
spark.executor.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native
spark.driver.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native
spark.yarn.jar -> local:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.4-hadoop2.6.0-cdh5.4.4.jar
spark.eventLog.enabled -> true
spark.yarn.historyServer.address -> http://rst-dev-bigdata02:18088
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled -> true
spark.shuffle.service.port -> 7337
spark.eventLog.dir -> hdfs://nameservice1/user/spark/applicationHistory
spark.executor.extraClassPath -> /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar:/app/cloudera/parcels/CLABS_PHOENIX/jars/phoenix-core-4.3.0-clabs-phoenix-1.0.0.jar
spark.master -> yarn-client
Main class:
org.apache.spark.repl.Main
Arguments:
System properties:
spark.executor.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native
spark.yarn.jar -> local:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.4-hadoop2.6.0-cdh5.4.4.jar
spark.driver.extraLibraryPath -> /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/native
spark.yarn.historyServer.address -> http://rst-dev-bigdata02:18088
spark.eventLog.enabled -> true
SPARK_SUBMIT -> true
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.app.name -> org.apache.spark.repl.Main
spark.shuffle.service.enabled -> true
spark.jars ->
spark.shuffle.service.port -> 7337
spark.executor.extraClassPath -> /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar:/app/cloudera/parcels/CLABS_PHOENIX/jars/phoenix-core-4.3.0-clabs-phoenix-1.0.0.jar
spark.eventLog.dir -> hdfs://nameservice1/user/spark/applicationHistory
spark.master -> yarn-client
Classpath elements:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/avro-tools-1.7.6-cdh5.4.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/08/11 20:39:40 INFO SecurityManager: Changing view acls to: hdfs
15/08/11 20:39:40 INFO SecurityManager: Changing modify acls to: hdfs
15/08/11 20:39:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
15/08/11 20:39:40 INFO HttpServer: Starting HTTP Server
15/08/11 20:39:40 INFO Server: jetty-8.y.z-SNAPSHOT
15/08/11 20:39:41 INFO AbstractConnector: Started SocketConnector@0.0.0.0:41624
15/08/11 20:39:41 INFO Utils: Successfully started service 'HTTP class server' on port 41624.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.3.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
15/08/11 20:39:48 INFO SparkContext: Running Spark version 1.3.0
15/08/11 20:39:48 INFO SecurityManager: Changing view acls to: hdfs
15/08/11 20:39:48 INFO SecurityManager: Changing modify acls to: hdfs
15/08/11 20:39:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
15/08/11 20:39:48 INFO Slf4jLogger: Slf4jLogger started
15/08/11 20:39:48 INFO Remoting: Starting remoting
15/08/11 20:39:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@rst-dev-bigdata02:37628]
15/08/11 20:39:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@rst-dev-bigdata02:37628]
15/08/11 20:39:48 INFO Utils: Successfully started service 'sparkDriver' on port 37628.
15/08/11 20:39:48 INFO SparkEnv: Registering MapOutputTracker
15/08/11 20:39:50 INFO SparkEnv: Registering BlockManagerMaster
15/08/11 20:39:50 INFO DiskBlockManager: Created local directory at /tmp/spark-3bc46fb6-668e-4f79-9e12-06df1359e017/blockmgr-d30a3be5-2790-4bd8-bc7a-ea9a41b717ee
15/08/11 20:39:50 INFO MemoryStore: MemoryStore started with capacity 246.0 MB
15/08/11 20:39:50 INFO HttpFileServer: HTTP File server directory is /tmp/spark-73b9ac43-31fc-4e62-ac90-3d930a3b636b/httpd-d77d01d7-66bd-4384-8406-2bfceb49094c
15/08/11 20:39:50 INFO HttpServer: Starting HTTP Server
15/08/11 20:39:50 INFO Server: jetty-8.y.z-SNAPSHOT
15/08/11 20:39:50 INFO AbstractConnector: Started SocketConnector@0.0.0.0:42431
15/08/11 20:39:50 INFO Utils: Successfully started service 'HTTP file server' on port 42431.
15/08/11 20:39:50 INFO SparkEnv: Registering OutputCommitCoordinator
15/08/11 20:39:50 INFO Server: jetty-8.y.z-SNAPSHOT
15/08/11 20:39:50 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/08/11 20:39:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/08/11 20:39:50 INFO SparkUI: Started SparkUI at http://rst-dev-bigdata02:4040
15/08/11 20:39:51 INFO Client: Requesting a new application from cluster with 6 NodeManagers
15/08/11 20:39:51 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6506 MB per container)
15/08/11 20:39:51 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/11 20:39:51 INFO Client: Setting up container launch context for our AM
15/08/11 20:39:51 INFO Client: Preparing resources for our AM container
15/08/11 20:39:53 INFO Client: Setting up the launch environment for our AM container
15/08/11 20:39:53 INFO SecurityManager: Changing view acls to: hdfs
15/08/11 20:39:53 INFO SecurityManager: Changing modify acls to: hdfs
15/08/11 20:39:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
15/08/11 20:39:53 INFO Client: Submitting application 64 to ResourceManager
15/08/11 20:39:54 INFO YarnClientImpl: Submitted application application_1439215700542_0064
15/08/11 20:39:55 INFO Client: Application report for application_1439215700542_0064 (state: ACCEPTED)
15/08/11 20:39:55 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdfs
start time: 1439325593917
final status: UNDEFINED
tracking URL: http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064/
user: hdfs
15/08/11 20:39:56 INFO Client: Application report for application_1439215700542_0064 (state: ACCEPTED)
15/08/11 20:39:57 INFO Client: Application report for application_1439215700542_0064 (state: ACCEPTED)
15/08/11 20:39:57 INFO YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@rst-dev-bigdata01:41025/user/YarnAM#-652030164]
15/08/11 20:39:57 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rst-dev-bigdata01,rst-dev-bigdata02, PROXY_URI_BASES -> http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064,http://rst-dev-bigdata02:8088/proxy/application_1439215700542_0064), /proxy/application_1439215700542_0064
15/08/11 20:39:57 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/08/11 20:39:58 INFO Client: Application report for application_1439215700542_0064 (state: RUNNING)
15/08/11 20:39:58 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: rst-dev-bigdata01
ApplicationMaster RPC port: 0
queue: root.hdfs
start time: 1439325593917
final status: UNDEFINED
tracking URL: http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064/
user: hdfs
15/08/11 20:39:58 INFO YarnClientSchedulerBackend: Application application_1439215700542_0064 has started running.
15/08/11 20:39:58 INFO NettyBlockTransferService: Server created on 39119
15/08/11 20:39:58 INFO BlockManagerMaster: Trying to register BlockManager
15/08/11 20:39:58 INFO BlockManagerMasterActor: Registering block manager rst-dev-bigdata02:39119 with 246.0 MB RAM, BlockManagerId(<driver>, rst-dev-bigdata02, 39119)
15/08/11 20:39:58 INFO BlockManagerMaster: Registered BlockManager
15/08/11 20:39:59 INFO EventLoggingListener: Logging events to hdfs://nameservice1/user/spark/applicationHistory/application_1439215700542_0064
15/08/11 20:40:03 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@nrdapp3:56554/user/Executor#346078266] with ID 2
15/08/11 20:40:03 INFO BlockManagerMasterActor: Registering block manager nrdapp3:52619 with 530.3 MB RAM, BlockManagerId(2, nrdapp3, 52619)
15/08/11 20:40:20 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
15/08/11 20:40:20 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/08/11 20:40:21 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> 15/08/11 20:40:59 ERROR YarnScheduler: Lost executor 2 on nrdapp3: remote Akka client disassociated
15/08/11 20:40:59 INFO DAGScheduler: Executor lost: 2 (epoch 0)
15/08/11 20:40:59 INFO BlockManagerMasterActor: Trying to remove executor 2 from BlockManagerMaster.
15/08/11 20:40:59 INFO BlockManagerMasterActor: Removing block manager BlockManagerId(2, nrdapp3, 52619)
15/08/11 20:40:59 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
15/08/11 20:41:02 INFO YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@rst-dev-bigdata01:58701/user/YarnAM#-1423911535]
15/08/11 20:41:02 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rst-dev-bigdata01,rst-dev-bigdata02, PROXY_URI_BASES -> http://rst-dev-bigdata01:8088/proxy/application_1439215700542_0064,http://rst-dev-bigdata02:8088/proxy/application_1439215700542_0064), /proxy/application_1439215700542_0064
15/08/11 20:41:02 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/08/11 20:41:05 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@nrdapp3:35986/user/Executor#-738360211] with ID 2
15/08/11 20:41:05 INFO DAGScheduler: Host added was in lost list earlier: nrdapp3
15/08/11 20:41:05 INFO BlockManagerMasterActor: Registering block manager nrdapp3:56783 with 530.3 MB RAM, BlockManagerId(2, nrdapp3, 56783)
15/08/11 20:42:03 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/08/11 20:42:03 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/08/11 20:42:03 INFO SparkUI: Stopped Spark web UI at http://rst-dev-bigdata02:4040
15/08/11 20:42:03 INFO DAGScheduler: Stopping DAGScheduler
15/08/11 20:42:03 INFO YarnClientSchedulerBackend: Shutting down all executors
15/08/11 20:42:03 INFO YarnClientSchedulerBackend: Asking each executor to shut down
15/08/11 20:42:03 INFO YarnClientSchedulerBackend: Stopped
15/08/11 20:42:03 INFO OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
15/08/11 20:42:03 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/08/11 20:42:03 INFO MemoryStore: MemoryStore cleared
15/08/11 20:42:03 INFO BlockManager: BlockManager stopped
15/08/11 20:42:03 INFO BlockManagerMaster: BlockManagerMaster stopped
15/08/11 20:42:03 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/08/11 20:42:03 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/08/11 20:42:03 INFO Remoting: Remoting shut down
15/08/11 20:42:03 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/08/11 20:42:03 INFO SparkContext: Successfully stopped SparkContext
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/avro-tools-1.7.6-cdh5.4.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Any ideas on what is going wrong here? Thanks.
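If it would help with diagnosis, I can also pull the full container logs for the failed executor from YARN, e.g.:

[hdfs@rst-dev-bigdata02 exp_daily]$ yarn logs -applicationId application_1439215700542_0064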