<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Post Job to Spark via YARN from VM on a virtual network in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161485#M123864</link>
    <description>&lt;P&gt;You can choose either way by setting the --master and --deploy-mode arguments appropriately. With --master=yarn, the Spark executors run on the cluster; with --master=local[*], the executors run on the local machine. The Spark driver's location is then determined by the deploy mode: --deploy-mode=cluster runs the driver on the cluster, while --deploy-mode=client runs it on the client (the VM where the job is launched). &lt;/P&gt;&lt;P&gt;More info here: &lt;/P&gt;&lt;P&gt;&lt;A href="http://spark.apache.org/docs/latest/submitting-applications.html"&gt;http://spark.apache.org/docs/latest/submitting-applications.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 29 Jun 2016 04:46:09 GMT</pubDate>
    <dc:creator>phargis</dc:creator>
    <dc:date>2016-06-29T04:46:09Z</dc:date>
    <item>
      <title>Post Job to Spark via YARN from VM on a virtual network</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161482#M123861</link>
      <description>&lt;P&gt;I have a functioning Spark cluster that can accept jobs via YARN (i.e. I can start a job by running spark-submit and specifying the master as yarn-client).&lt;/P&gt;&lt;P&gt;Now, I have a virtual machine set up on a virtual network with this Spark cluster. On the VM, I am working with Titan DB, whose configuration lets us set spark.master. If I set it to local[*], everything runs well. However, if I set spark.master to yarn-client, I get the following error:&lt;/P&gt;&lt;PRE&gt;java.lang.IllegalStateException: java.lang.ExceptionInInitializerError
        at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:82)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
        at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:117)
        at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
        at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:205)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:292)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1016)
        at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:441)
        at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
        at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:185)
        at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:119)
        at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:94)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:123)
        at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:58)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
        at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:82)
        at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
        at org.apache.tinkerpop.gremlin.console.Console.&amp;lt;init&amp;gt;(Console.groovy:144)
        at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
        at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:303)
Caused by: java.util.concurrent.ExecutionException: java.lang.ExceptionInInitializerError
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:80)
        ... 44 more
Caused by: java.lang.ExceptionInInitializerError
        at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1873)
        at org.apache.spark.storage.BlockManager.&amp;lt;init&amp;gt;(BlockManager.scala:105)
        at org.apache.spark.storage.BlockManager.&amp;lt;init&amp;gt;(BlockManager.scala:180)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:308)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
        at org.apache.spark.SparkContext.&amp;lt;init&amp;gt;(SparkContext.scala:240)
        at org.apache.spark.api.java.JavaSparkContext.&amp;lt;init&amp;gt;(JavaSparkContext.scala:61)
        at org.apache.tinkerpop.gremlin.hadoop.process.computer.spark.SparkGraphComputer.lambda$submit$31(SparkGraphComputer.java:111)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
        at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:211)
        at org.apache.spark.deploy.SparkHadoopUtil$.&amp;lt;init&amp;gt;(SparkHadoopUtil.scala:206)
        at org.apache.spark.deploy.SparkHadoopUtil$.&amp;lt;clinit&amp;gt;(SparkHadoopUtil.scala)
        ... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:207)
        ... 16 more
&lt;/PRE&gt;&lt;P&gt;Clearly, there is some configuration to do (I believe on the VM end). I suspect we will have to install a YARN client on the VM and have it communicate with the YARN ResourceManager on the Spark cluster, but I am not sure. Can you point me in the right direction or tell me how to configure the VM and/or Spark so I can successfully submit this job? I can provide any additional information that might be helpful.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 01:34:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161482#M123861</guid>
      <dc:creator>zachkirsch</dc:creator>
      <dc:date>2016-06-29T01:34:40Z</dc:date>
    </item>
    <item>
      <title>Re: Post Job to Spark via YARN from VM on a virtual network</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161483#M123862</link>
      <description>&lt;P&gt;
	You probably need to install the spark-client on your VM, which will include all the proper JAR files and binaries needed to connect to YARN. There is also a chance that the version of Spark bundled with Titan DB was built specifically without YARN dependencies (to avoid duplicates). You can always rebuild your local Spark installation with YARN support, using the instructions here:&lt;/P&gt;&lt;P&gt;
	&lt;A href="http://spark.apache.org/docs/latest/building-spark.html"&gt;http://spark.apache.org/docs/latest/building-spark.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;
	For instance, here is a sample build command using Maven:&lt;/P&gt;
&lt;PRE&gt;build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
&lt;/PRE&gt;</description>
      <pubDate>Wed, 29 Jun 2016 02:03:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161483#M123862</guid>
      <dc:creator>phargis</dc:creator>
      <dc:date>2016-06-29T02:03:25Z</dc:date>
    </item>
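Besides having YARN-enabled Spark binaries on the VM, the client also needs to know where the cluster's ResourceManager lives; Spark reads this from the Hadoop configuration directory. A minimal sketch (the path below is hypothetical; point it at a directory holding core-site.xml and yarn-site.xml copied from the cluster, or wherever your distribution places them):

```shell
# Hypothetical path: a directory containing the cluster's
# core-site.xml and yarn-site.xml, copied to the VM.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
```

With these set in the environment that launches the job, the Spark client on the VM can resolve the ResourceManager address when spark.master points at YARN.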
    <item>
      <title>Re: Post Job to Spark via YARN from VM on a virtual network</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161484#M123863</link>
      <description>&lt;P&gt;Thanks for the reply. One issue I'm having is that I'm not sure how to specify that the job should be distributed to the existing Spark cluster (rather than run only on the VM). &lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 02:28:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161484#M123863</guid>
      <dc:creator>zachkirsch</dc:creator>
      <dc:date>2016-06-29T02:28:35Z</dc:date>
    </item>
    <item>
      <title>Re: Post Job to Spark via YARN from VM on a virtual network</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161485#M123864</link>
      <description>&lt;P&gt;You can choose either way by setting the --master and --deploy-mode arguments appropriately. With --master=yarn, the Spark executors run on the cluster; with --master=local[*], the executors run on the local machine. The Spark driver's location is then determined by the deploy mode: --deploy-mode=cluster runs the driver on the cluster, while --deploy-mode=client runs it on the client (the VM where the job is launched). &lt;/P&gt;&lt;P&gt;More info here: &lt;/P&gt;&lt;P&gt;&lt;A href="http://spark.apache.org/docs/latest/submitting-applications.html"&gt;http://spark.apache.org/docs/latest/submitting-applications.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 04:46:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Post-Job-to-Spark-via-YARN-from-VM-on-a-virtual-network/m-p/161485#M123864</guid>
      <dc:creator>phargis</dc:creator>
      <dc:date>2016-06-29T04:46:09Z</dc:date>
    </item>
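For reference, the master/deploy-mode combinations described in the reply above can be sketched as spark-submit invocations (the JAR name and class here are hypothetical placeholders, not from this thread):

```shell
# Executors and driver both run on the YARN cluster
spark-submit --master yarn --deploy-mode cluster --class com.example.MyJob my-job.jar

# Executors run on the YARN cluster; driver runs on the VM that launches the job
spark-submit --master yarn --deploy-mode client --class com.example.MyJob my-job.jar

# Everything runs on the local machine, using all available cores
spark-submit --master 'local[*]' --class com.example.MyJob my-job.jar
```

These are sketches of the standard spark-submit interface; Spark versions contemporary with this thread also accepted the shorthand masters yarn-client and yarn-cluster.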
  </channel>
</rss>

