Created 06-28-2016 06:34 PM
I have a functioning Spark cluster that can accept jobs via YARN (i.e. I can start a job by running spark-submit and specifying the master as yarn-client).
Now, I have a virtual machine set up on a virtual network with this Spark cluster. On the VM, I am working with Titan DB, whose configuration allows me to set spark.master. If I set it to local[*], everything runs well. However, if I set spark.master to yarn-client, I get the following error:
java.lang.IllegalStateException: java.lang.ExceptionInInitializerError
    at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:82)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
    at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:117)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
    at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:205)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:292)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1016)
    at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:441)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
    at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:185)
    at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:119)
    at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:94)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:123)
    at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:58)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
    at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:82)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
    at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:144)
    at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
    at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:303)
Caused by: java.util.concurrent.ExecutionException: java.lang.ExceptionInInitializerError
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:80)
    ... 44 more
Caused by: java.lang.ExceptionInInitializerError
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1873)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:308)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:240)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at org.apache.tinkerpop.gremlin.hadoop.process.computer.spark.SparkGraphComputer.lambda$submit$31(SparkGraphComputer.java:111)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:211)
    at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:206)
    at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
    ... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:207)
    ... 16 more
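For context, the relevant lines of my graph properties file look roughly like this (a sketch only; the gremlin.graph key follows TinkerPop's hadoop-gremlin conventions, and the rest of my storage and input-format configuration is omitted):

# Setting spark.master=local[*] here works fine;
# yarn-client produces the error above.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
spark.master=yarn-client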
Clearly, there is some configuration to do (I believe on the VM end). I suspect I will have to install a YARN client on the VM and have it communicate with YARN on the Spark cluster, but I am not sure. Can you point me in the right direction, or tell me how to configure the VM and/or Spark so that I can successfully submit this job? I can provide any further information that might be helpful.
Created 06-28-2016 07:03 PM
You probably need to install the spark-client on your VM, which will include all the proper jar files and binaries to connect to YARN. There is also a chance that the version of Spark used by Titan DB was built specifically without YARN dependencies (to avoid duplicates). You can always rebuild your local Spark installation with YARN dependencies, using the instructions here:
http://spark.apache.org/docs/latest/building-spark.html
For instance, here is a sample build command using maven:
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
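As a quick sanity check (this assumes a pre-2.0 Spark layout where everything ships in a single assembly jar, e.g. under $SPARK_HOME/lib), you can verify whether the Spark build Titan is using actually contains the YARN class the stack trace reports as missing:

# Should print org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.class if YARN support is present
jar tf $SPARK_HOME/lib/spark-assembly-*.jar | grep YarnSparkHadoopUtil

If that prints nothing, the build was made without the -Pyarn profile and will fail exactly as shown above.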
Created 06-28-2016 07:28 PM
Thanks for the reply. The issue I'm having is that I'm not sure how to specify that the job should be distributed to the existing Spark cluster (rather than just run on the VM).
Created 06-28-2016 09:46 PM
You can designate either way by setting the --master and --deploy-mode arguments correctly. With --master yarn, the Spark executors run on the cluster; with --master local[*], everything runs in a single JVM on the local machine. When running on YARN, the location of the Spark driver is then determined by the deploy mode: --deploy-mode cluster runs the driver on the cluster, while --deploy-mode client runs the driver on the client (the VM from which the job is launched).
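For example (the class and jar names below are placeholders, and this assumes HADOOP_CONF_DIR on the VM points at a copy of the cluster's Hadoop/YARN configuration files):

# Driver runs on the VM, executors run on the cluster
spark-submit --master yarn --deploy-mode client --class com.example.MyJob my-job.jar

# Driver and executors both run on the cluster
spark-submit --master yarn --deploy-mode cluster --class com.example.MyJob my-job.jar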
More info here:
http://spark.apache.org/docs/latest/submitting-applications.html