I am trying to submit a job to Spark via TinkerPop 3.2.0, but I keep running into this exception:
16/06/30 23:26:09 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1467158618360_0050_01_000002 on host: 192.168.2.23. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1467158618360_0050_01_000002 Exit code: 1
From my research, it seems that this is signaling low memory, but I have allocated a lot of memory. Below are all the relevant (I think) configurations.
spark.master=yarn-client
spark.app.id=gremlin
spark.ui.port=4051
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*
spark.executor.extraJavaOptions=-Dhdp.version=126.96.36.199-258
spark.executor.instances=4
spark.executor.memory=1g
spark.driver.memory=1g
spark.executor.userClassPathFirst=true
spark.storage.memoryFraction=0.4
spark.shuffle.memoryFraction=0.4
spark.yarn.executor.memoryOverhead=4096
I read that this could be caused by a Java version issue. Although the cluster came with Java 1.7, I had to install and use Java 1.8 instead (TinkerPop requires 1.8). Is this the cause of the exception, and if so, is there any way around it?
I would appreciate any help. Thanks!
@Zach Kirsch The problem is more likely a mismatch between the memory Spark requests (driver memory + executor memory) and YARN's container sizing configuration. YARN settings determine the minimum and maximum container sizes, and should be based on available physical memory, number of nodes, etc. As a rule of thumb, try making the minimum YARN container size 1.5 times the requested driver/executor memory (in this case, 1.5 GB).
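To make the sizing arithmetic concrete, here is a rough Python sketch, not a definitive calculation. It assumes that each executor container request is spark.executor.memory plus spark.yarn.executor.memoryOverhead, and that YARN rounds the request up to a multiple of its minimum container size (the exact rounding depends on the scheduler in use). The function names and the 8192 MB maximum are hypothetical; substitute the values from your own cluster.

```python
import math


def container_request_mb(heap_mb, overhead_mb, yarn_min_mb):
    """Estimate the container size YARN would allocate.

    Assumption: the request (heap + overhead) is rounded up to the
    next multiple of the minimum container size.
    """
    requested = heap_mb + overhead_mb
    return math.ceil(requested / yarn_min_mb) * yarn_min_mb


def fits(container_mb, yarn_max_mb):
    """True if the container does not exceed YARN's maximum container size."""
    return container_mb <= yarn_max_mb


# Values taken from the configuration in the question.
executor_memory_mb = 1024      # spark.executor.memory=1g
memory_overhead_mb = 4096      # spark.yarn.executor.memoryOverhead=4096
executor_instances = 4         # spark.executor.instances=4

# Rule of thumb from the answer: minimum container = 1.5 x requested memory.
yarn_min_mb = 1536
# Hypothetical maximum container size; check your own YARN settings.
yarn_max_mb = 8192

per_executor = container_request_mb(executor_memory_mb, memory_overhead_mb, yarn_min_mb)
print("per-executor container (MB):", per_executor)
print("fits under maximum:", fits(per_executor, yarn_max_mb))
print("total for all executors (MB):", executor_instances * per_executor)
```

Under these assumptions, the 4096 MB overhead dominates the request, so it is worth comparing the per-executor figure against the maximum container size and the memory actually available on each node.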