RandomForest causing Heap Space error

Super Collaborator

HDP-2.5.0.0 using Ambari 2.4.0.1, Spark 2.0.1.

I have Scala code that reads a 108 MB CSV file and trains a RandomForest model.
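
For context, the job is roughly of the following shape. This is only a minimal sketch: the real column names, feature preparation, and RandomForest parameters are not shown in this post, and whether the actual code uses spark.ml or the older RDD-based MLlib API (or a classifier rather than a regressor) is an assumption here.

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.RandomForestRegressor

object FuelModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FuelModel").getOrCreate()

    // Read the ~108 MB CSV file (path and schema options are placeholders).
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///path/to/input.csv")

    // Assemble feature columns into a single vector column (column names are placeholders).
    val assembler = new VectorAssembler()
      .setInputCols(Array("feature1", "feature2"))
      .setOutputCol("features")
    val prepared = assembler.transform(df)

    // Train the RandomForest; the real parameters (numTrees, maxDepth, etc.) drive memory usage.
    val rf = new RandomForestRegressor().setLabelCol("label").setFeaturesCol("features")
    val model = rf.fit(prepared)

    spark.stop()
  }
}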

I run the following command:

/usr/hdp/current/spark2-client/bin/spark-submit --class samples.FuelModel --master yarn --deploy-mode cluster --driver-memory 8g spark-assembly-1.0.jar

The console output:

16/12/22 09:07:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/22 09:07:25 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/12/22 09:07:26 INFO TimelineClientImpl: Timeline service address: http://l4326pp.sss.com:8188/ws/v1/timeline/
16/12/22 09:07:26 INFO AHSProxy: Connecting to Application History server at l4326pp.sss.com/138.106.33.132:10200
16/12/22 09:07:26 INFO Client: Requesting a new application from cluster with 4 NodeManagers
16/12/22 09:07:26 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (204800 MB per container)
16/12/22 09:07:26 INFO Client: Will allocate AM container, with 9011 MB memory including 819 MB overhead
16/12/22 09:07:26 INFO Client: Setting up container launch context for our AM
16/12/22 09:07:26 INFO Client: Setting up the launch environment for our AM container
16/12/22 09:07:26 INFO Client: Preparing resources for our AM container
16/12/22 09:07:27 INFO YarnSparkHadoopUtil: getting token for namenode: hdfs://prodhadoop/user/ojoqcu/.sparkStaging/application_1481607361601_8315
16/12/22 09:07:27 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 79178 for ojoqcu on ha-hdfs:prodhadoop
16/12/22 09:07:28 INFO metastore: Trying to connect to metastore with URI thrift://l4327pp.sss.com:9083
16/12/22 09:07:29 INFO metastore: Connected to metastore.
16/12/22 09:07:29 INFO YarnSparkHadoopUtil: HBase class not found java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
16/12/22 09:07:29 INFO Client: Source and destination file systems are the same. Not copying hdfs:/lib/spark2_2.0.1.tar.gz
16/12/22 09:07:29 INFO Client: Uploading resource file:/localhome/ojoqcu/code/debug/Rikard/spark-assembly-1.0.jar -> hdfs://prodhadoop/user/ojoqcu/.sparkStaging/application_1481607361601_8315/spark-assembly-1.0.jar
16/12/22 09:07:29 INFO Client: Uploading resource file:/tmp/spark-ff9db580-00db-476e-9086-377c60bc7e2a/__spark_conf__1706674327523194508.zip -> hdfs://prodhadoop/user/ojoqcu/.sparkStaging/application_1481607361601_8315/__spark_conf__.zip
16/12/22 09:07:29 WARN Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/12/22 09:07:29 INFO SecurityManager: Changing view acls to: ojoqcu
16/12/22 09:07:29 INFO SecurityManager: Changing modify acls to: ojoqcu
16/12/22 09:07:29 INFO SecurityManager: Changing view acls groups to:
16/12/22 09:07:29 INFO SecurityManager: Changing modify acls groups to:
16/12/22 09:07:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ojoqcu); groups with view permissions: Set(); users  with modify permissions: Set(ojoqcu); groups with modify permissions: Set()
16/12/22 09:07:29 INFO Client: Submitting application application_1481607361601_8315 to ResourceManager
16/12/22 09:07:30 INFO YarnClientImpl: Submitted application application_1481607361601_8315
16/12/22 09:07:31 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:31 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: dataScientist
         start time: 1482394049862
         final status: UNDEFINED
         tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
         user: ojoqcu
16/12/22 09:07:32 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:33 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:34 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:35 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:35 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: N/A
         ApplicationMaster host: 138.106.33.145
         ApplicationMaster RPC port: 0
         queue: dataScientist
         start time: 1482394049862
         final status: UNDEFINED
         tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
         user: ojoqcu
16/12/22 09:07:36 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:37 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:38 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:39 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:40 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:41 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:42 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:43 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:44 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:45 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:46 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:47 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:48 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:49 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:50 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:51 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:52 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:53 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:54 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:55 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:56 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:57 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:58 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:59 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:00 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:01 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:02 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:03 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:04 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:05 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:06 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:07 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:08 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:09 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:10 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:11 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:12 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:13 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:14 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:15 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:16 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:17 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:18 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:19 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:20 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:21 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:22 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:23 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:24 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:25 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:26 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:27 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:28 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:29 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:30 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:31 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:32 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:33 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:34 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:35 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:36 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:37 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:38 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:39 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:40 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:41 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:42 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:43 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:44 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:45 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:46 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:47 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:48 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:49 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:50 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:51 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:52 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:52 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: dataScientist
         start time: 1482394049862
         final status: UNDEFINED
         tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
         user: ojoqcu
16/12/22 09:08:53 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:54 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:55 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:56 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:57 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:57 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: N/A
         ApplicationMaster host: 138.106.33.144
         ApplicationMaster RPC port: 0
         queue: dataScientist
         start time: 1482394049862
         final status: UNDEFINED
         tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
         user: ojoqcu
16/12/22 09:08:58 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:59 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:00 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:01 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:02 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:03 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:04 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:05 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:06 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:07 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:08 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:09 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:10 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:11 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:12 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:13 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:14 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:15 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:16 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:17 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:18 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:19 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:20 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:21 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:22 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:23 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:24 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:25 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:26 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:27 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:28 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:29 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:30 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:31 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:32 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:33 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:34 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:35 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:36 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:37 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:38 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:39 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:40 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:41 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:42 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:43 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:44 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:45 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:46 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:47 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:48 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:49 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:50 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:51 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:52 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:53 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:54 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:55 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:56 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:57 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:58 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:59 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:00 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:01 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:02 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:03 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:04 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:05 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:06 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:07 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:08 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:09 INFO Client: Application report for application_1481607361601_8315 (state: FINISHED)
16/12/22 09:10:09 INFO Client:
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 28.0 failed 4 times, most recent failure: Lost task 1.3 in stage 28.0 (TID 59, l4328pp.sss.com): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container marked as failed: container_e63_1481607361601_8315_02_000005 on host: l4328pp.sss.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
Driver stacktrace:
         ApplicationMaster host: 138.106.33.144
         ApplicationMaster RPC port: 0
         queue: dataScientist
         start time: 1482394049862
         final status: FAILED
         tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
         user: ojoqcu
Exception in thread "main" org.apache.spark.SparkException: Application application_1481607361601_8315 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/22 09:10:09 INFO ShutdownHookManager: Shutdown hook called
16/12/22 09:10:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-ff9db580-00db-476e-9086-377c60bc7e2a

Partial output from the YARN log on one of the nodes:

2016-12-22 09:08:54,205 INFO  container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e63_1481607361601_8315_02_000001 transitioned from LOCALIZED to RUNNING
2016-12-22 09:08:54,206 INFO  runtime.DelegatingLinuxContainerRuntime (DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using container runtime: DefaultLinuxContainerRuntime
2016-12-22 09:08:56,779 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(375)) - Starting resource-monitoring for container_e63_1481607361601_8315_02_000001
2016-12-22 09:08:56,810 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 545.1 MB of 12 GB physical memory used; 10.3 GB of 25.2 GB virtual memory used
2016-12-22 09:08:59,850 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 804.6 MB of 12 GB physical memory used; 10.4 GB of 25.2 GB virtual memory used
2016-12-22 09:09:02,886 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 973.3 MB of 12 GB physical memory used; 10.4 GB of 25.2 GB virtual memory used
2016-12-22 09:09:05,921 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.2 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:08,957 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.2 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:12,000 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.2 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:15,037 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.2 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:18,080 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.2 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:21,116 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.2 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:24,153 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.2 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:27,193 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.4 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:30,225 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.4 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:33,270 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.4 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:36,302 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.4 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:39,339 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.6 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:42,376 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.6 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:45,422 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.6 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:48,459 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.6 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:51,502 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.7 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:54,539 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.7 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:09:57,583 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.7 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:10:00,611 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.7 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:10:02,059 INFO  ipc.Server (Server.java:saslProcess(1538)) - Auth successful for appattempt_1481607361601_8315_000002 (auth:SIMPLE)
2016-12-22 09:10:02,061 INFO  authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for appattempt_1481607361601_8315_000002 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
2016-12-22 09:10:02,062 INFO  containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(810)) - Start request for container_e63_1481607361601_8315_02_000005 by user ojoqcu
2016-12-22 09:10:02,063 INFO  application.ApplicationImpl (ApplicationImpl.java:transition(304)) - Adding container_e63_1481607361601_8315_02_000005 to application application_1481607361601_8315
2016-12-22 09:10:02,063 INFO  container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e63_1481607361601_8315_02_000005 transitioned from NEW to LOCALIZING
2016-12-22 09:10:02,063 INFO  containermanager.AuxServices (AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId application_1481607361601_8315
2016-12-22 09:10:02,063 INFO  yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(184)) - Initializing container container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:02,063 INFO  yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(270)) - Initializing container container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:02,063 INFO  container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e63_1481607361601_8315_02_000005 transitioned from LOCALIZING to LOCALIZED
2016-12-22 09:10:02,063 INFO  nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=ojoqcu  IP=138.106.33.144  OPERATION=Start Container Request  TARGET=ContainerManageImpl  RESULT=SUCCESS  APPID=application_1481607361601_8315  CONTAINERID=container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:02,078 INFO  container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e63_1481607361601_8315_02_000005 transitioned from LOCALIZED to RUNNING
2016-12-22 09:10:02,078 INFO  runtime.DelegatingLinuxContainerRuntime (DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using container runtime: DefaultLinuxContainerRuntime
2016-12-22 09:10:03,612 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(375)) - Starting resource-monitoring for container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:03,647 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 34005 for container-id container_e63_1481607361601_8315_02_000005: 452.4 MB of 4 GB physical memory used; 3.1 GB of 8.4 GB virtual memory used
2016-12-22 09:10:03,673 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.7 GB of 12 GB physical memory used; 10.5 GB of 25.2 GB virtual memory used
2016-12-22 09:10:06,708 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 34005 for container-id container_e63_1481607361601_8315_02_000005: 815.3 MB of 4 GB physical memory used; 3.1 GB of 8.4 GB virtual memory used
2016-12-22 09:10:06,738 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 32969 for container-id container_e63_1481607361601_8315_02_000001: 1.8 GB of 12 GB physical memory used; 10.6 GB of 25.2 GB virtual memory used
2016-12-22 09:10:09,094 WARN  privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(170)) - Shell execution returned exit code: 143. Privileged Execution Operation Output: 
main : command provided 1
main : run as user is ojoqcu
main : requested yarn user is ojoqcu
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /opt/hdfsdisks/sdh/yarn/local/nmPrivate/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005/container_e63_1481607361601_8315_02_000005.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Full command array for failed execution: 
[/usr/hdp/current/hadoop-yarn-nodemanager/bin/container-executor, ojoqcu, ojoqcu, 1, application_1481607361601_8315, container_e63_1481607361601_8315_02_000005, /opt/hdfsdisks/sdg/yarn/local/usercache/ojoqcu/appcache/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005, /opt/hdfsdisks/sdb/yarn/local/nmPrivate/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005/launch_container.sh, /opt/hdfsdisks/sdk/yarn/local/nmPrivate/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005/container_e63_1481607361601_8315_02_000005.tokens, /opt/hdfsdisks/sdh/yarn/local/nmPrivate/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005/container_e63_1481607361601_8315_02_000005.pid, /opt/hdfsdisks/sdb/yarn/local%/opt/hdfsdisks/sdc/yarn/local%/opt/hdfsdisks/sdd/yarn/local%/opt/hdfsdisks/sde/yarn/local%/opt/hdfsdisks/sdf/yarn/local%/opt/hdfsdisks/sdg/yarn/local%/opt/hdfsdisks/sdh/yarn/local%/opt/hdfsdisks/sdi/yarn/local%/opt/hdfsdisks/sdj/yarn/local%/opt/hdfsdisks/sdk/yarn/local%/opt/hdfsdisks/sdl/yarn/local%/opt/hdfsdisks/sdm/yarn/local, /opt/hdfsdisks/sdb/yarn/log%/opt/hdfsdisks/sdc/yarn/log%/opt/hdfsdisks/sdd/yarn/log%/opt/hdfsdisks/sde/yarn/log%/opt/hdfsdisks/sdf/yarn/log%/opt/hdfsdisks/sdg/yarn/log%/opt/hdfsdisks/sdh/yarn/log%/opt/hdfsdisks/sdi/yarn/log%/opt/hdfsdisks/sdj/yarn/log%/opt/hdfsdisks/sdk/yarn/log%/opt/hdfsdisks/sdl/yarn/log%/opt/hdfsdisks/sdm/yarn/log, cgroups=none]
2016-12-22 09:10:09,095 WARN  runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(107)) - Launch container failed. Exception: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143: 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=143: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
2016-12-22 09:10:09,095 WARN  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:launchContainer(400)) - Exit code from container container_e63_1481607361601_8315_02_000005 is : 143
2016-12-22 09:10:09,095 INFO  container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e63_1481607361601_8315_02_000005 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-12-22 09:10:09,095 INFO  launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(425)) - Cleaning up container container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:09,095 INFO  runtime.DelegatingLinuxContainerRuntime (DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using container runtime: DefaultLinuxContainerRuntime
2016-12-22 09:10:09,114 INFO  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /opt/hdfsdisks/sdb/yarn/local/usercache/ojoqcu/appcache/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:09,114 INFO  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /opt/hdfsdisks/sdc/yarn/local/usercache/ojoqcu/appcache/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:09,115 INFO  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /opt/hdfsdisks/sdd/yarn/local/usercache/ojoqcu/appcache/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:09,115 INFO  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /opt/hdfsdisks/sde/yarn/local/usercache/ojoqcu/appcache/application_1481607361601_8315/container_e63_1481607361601_8315_02_000005
2016-12-22 09:10:09,115 WARN  nodemanager.NMAuditLogger (NMAuditLogger.java:logFailure(150)) - USER=ojoqcu  OPERATION=Container Finished - Failed  TARGET=ContainerImpl  RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  APPID=application_1481607361601_8315  CONTAINERID=container_e63_1481607361601_8315_02_000005

At least one container was killed, probably due to an OutOfMemory error as the heap space overflowed.

Attached are screenshots from the Spark UI.

10665-spark-ui-jobs.png

10666-spark-ui-failed-job.png

10667-spark-ui-outofmemory.png

How should I proceed?

  1. Would using higher values for options like --driver-memory and --executor-memory help?
  2. Is there a Spark setting that needs to be changed via Ambari?
1 ACCEPTED SOLUTION

Master Guru

Yes, increase driver and executor memory.

http://spark.apache.org/docs/latest/configuration.html

Also specify more executors.

In spark-submit:

--master yarn --deploy-mode cluster
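
For example, a full invocation might look like this (a sketch only, reusing the class and jar from the original command; the executor count and memory values are illustrative and must fit within the YARN queue's container limits):

/usr/hdp/current/spark2-client/bin/spark-submit \
  --class samples.FuelModel \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 8g \
  --executor-memory 8g \
  --num-executors 8 \
  --executor-cores 2 \
  spark-assembly-1.0.jar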

In my code I like to:

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val sparkConf = new SparkConf().setAppName("Links")

sparkConf.set("spark.cores.max", "32")
// Kryo serialization is usually faster and more compact than the default Java serializer.
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
sparkConf.set("spark.sql.tungsten.enabled", "true")
sparkConf.set("spark.eventLog.enabled", "true")
sparkConf.set("spark.app.id", "MyApp")
sparkConf.set("spark.io.compression.codec", "snappy")
sparkConf.set("spark.rdd.compress", "false")
sparkConf.set("spark.shuffle.compress", "true")

See

http://spark.apache.org/docs/latest/submitting-applications.html

 --executor-memory 32G \
  --num-executors 50 \

Increase these:

spark.driver.cores = 32: Number of cores to use for the driver process, only in cluster mode.
spark.driver.maxResultSize = 1g: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.
spark.driver.memory = 32g: Amount of memory to use for the driver process, i.e. where SparkContext is initialized (e.g. 1g, 2g). Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.
spark.executor.memory = 32g: Amount of memory to use per executor process (e.g. 2g, 8g).
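
One way these could be applied is directly on the spark-submit line via --conf (a sketch; the values simply mirror the suggestions above and have to fit within what the YARN queue allows):

  --driver-memory 32g \
  --conf spark.driver.cores=32 \
  --conf spark.driver.maxResultSize=1g \
  --conf spark.executor.memory=32g \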

See http://spark.apache.org/docs/latest/running-on-yarn.html for running on YARN

 --driver-memory 32g \
    --executor-memory 32g \ 

--executor-cores 1 \

Use as much memory as you can for optimal Spark performance.

spark.yarn.am.memory = 16G: Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. 512m, 2g). In cluster mode, use spark.driver.memory instead.

Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively.

spark.driver.memory = 16g: Amount of memory to use for the driver process, i.e. where SparkContext is initialized (e.g. 1g, 2g). Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.
spark.driver.cores = 50: Number of cores used by the driver in YARN cluster mode. Since the driver is run in the same JVM as the YARN Application Master in cluster mode, this also controls the cores used by the YARN Application Master. In client mode, use spark.yarn.am.cores to control the number of cores used by the YARN Application Master instead.
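
Alternatively, these can go in the default properties file mentioned above (conf/spark-defaults.conf). A sketch with illustrative values only, not recommendations for this cluster:

# spark.yarn.am.memory applies in client mode only; in cluster mode spark.driver.memory is used.
spark.driver.memory      16g
spark.driver.cores       4
spark.executor.memory    16g
spark.yarn.am.memory     16g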

Increase these where you can and run in YARN cluster mode.

Good Reference:

http://www.slideshare.net/pdx_spark/performance-in-spark-20-pdx-spark-meetup-81816

http://www.slideshare.net/JenAman/rearchitecting-spark-for-performance-understandability-63065166

See the Hortonworks Spark Tuning Guide

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_spark-component-guide/content/ch_tuning-s...


2 REPLIES

Super Collaborator

The command that worked:

*Most conservative, takes the least cluster resources

/usr/hdp/current/spark2-client/bin/spark-submit --class samples.FuelModel --master yarn --deploy-mode cluster --executor-memory 4g --driver-memory 8g spark-assembly-1.0.jar

The time taken is too long (40+ minutes) to be acceptable. I tried increasing the number of cores and executors, but it didn't help much. I guess the overhead exceeds the performance benefit when processing a small file (< 200 MB) on 2 nodes. I would be glad if you can provide any pointers.