Created 08-22-2018 02:39 AM
I have an HDP 3.0.0.0 cluster. All machines in the cluster run Ubuntu 16.04.
I want to make a Windows machine able to connect to the cluster and run Spark jobs on it.
So far I've managed to submit jobs to the cluster via `spark-submit --deploy-mode cluster --master yarn`.
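For context, a cluster-mode submission along these lines works fine from PowerShell on the Windows machine (the script name below is just a placeholder for whatever PySpark job I test with):

```
# Cluster mode works: the driver runs inside the YARN application master on the
# cluster, so nothing has to connect back to the Windows machine.
# my_test_job.py is a placeholder script name.
spark-submit --master yarn --deploy-mode cluster .\my_test_job.py
```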
I'm having trouble running the `pyspark` interactive shell with `--deploy-mode client`, which, to my understanding, creates the driver process on the Windows machine. Right now, when I run `pyspark` from a Windows command-line console (specifically PowerShell), it always fails with the following output:
PS > pyspark --name pysparkTest8
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2018-08-21 18:27:10 WARN DomainSocketFactory:117 - The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
2018-08-21 18:40:48 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
2018-08-21 18:40:48 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2018-08-21 18:40:48 WARN MetricsSystem:66 - Stopping a MetricsSystem that is not running
2018-08-21 18:40:48 WARN SparkContext:66 - Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.lang.Thread.run(Thread.java:748)
2018-08-21 18:54:07 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
2018-08-21 18:54:07 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2018-08-21 18:54:07 WARN MetricsSystem:66 - Stopping a MetricsSystem that is not running
Traceback (most recent call last):
  File "C:\\python\pyspark\shell.py", line 54, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "C:\\python\pyspark\sql\session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\\python\pyspark\context.py", line 343, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\\python\pyspark\context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "C:\\python\pyspark\context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\\python\pyspark\context.py", line 282, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1525, in __call__
  File "C:\\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
When I look at the YARN application logs, there's something worth noting in stderr:
Log Type: stderr
Log Upload Time: Tue Aug 21 18:50:14 -0700 2018
Log Length: 3774
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/11/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.0.0-1634/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/08/21 18:36:41 INFO util.SignalUtils: Registered signal handler for TERM
18/08/21 18:36:41 INFO util.SignalUtils: Registered signal handler for HUP
18/08/21 18:36:41 INFO util.SignalUtils: Registered signal handler for INT
18/08/21 18:36:41 INFO spark.SecurityManager: Changing view acls to: yarn,myusername
18/08/21 18:36:41 INFO spark.SecurityManager: Changing modify acls to: yarn,myusername
18/08/21 18:36:41 INFO spark.SecurityManager: Changing view acls groups to:
18/08/21 18:36:41 INFO spark.SecurityManager: Changing modify acls groups to:
18/08/21 18:36:41 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, myusername); groups with view permissions: Set(); users with modify permissions: Set(yarn, myusername); groups with modify permissions: Set()
18/08/21 18:36:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/21 18:36:42 INFO yarn.ApplicationMaster: Preparing Local resources
18/08/21 18:36:43 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/08/21 18:36:43 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1534303777268_0044_000001
18/08/21 18:36:44 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
18/08/21 18:38:51 ERROR yarn.ApplicationMaster: Failed to connect to driver at Windows-client-hostname:50000, retrying ...
18/08/21 18:38:51 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:672)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:532)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:347)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:869)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
18/08/21 18:38:51 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
18/08/21 18:38:51 INFO util.ShutdownHookManager: Shutdown hook called
My suspicion is that the Windows client machine's firewall is blocking port 50000, because if I run telnet from one of the Ubuntu machines, I get "Connection timed out":
telnet windows-client-hostname 50000
Trying 10.100.1.61...
telnet: Unable to connect to remote host: Connection timed out
But I have specifically allowed ports 1025-65535 in the Inbound Rules of Windows Firewall with Advanced Security (the machine runs Windows Server 2012 R2).
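For what it's worth, this is roughly how such a rule can be inspected and created from PowerShell (the rule's display name here is a placeholder; creation requires an elevated shell):

```
# Show the protocol/port filter attached to an existing inbound rule
# ("Allow Spark driver ports" is a placeholder display name).
Get-NetFirewallRule -DisplayName "Allow Spark driver ports" | Get-NetFirewallPortFilter

# Creating an equivalent rule would look roughly like this.
New-NetFirewallRule -DisplayName "Allow Spark driver ports" -Direction Inbound -Protocol TCP -LocalPort "1025-65535" -Action Allow
```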
I have configured `spark.port.maxRetries` as suggested in this post, but it didn't change anything. My `spark-defaults.conf` on the Windows client machine looks like this:
spark.master                 yarn
spark.yarn.am.memory         4g
spark.executor.memory        5g
spark.serializer             org.apache.spark.serializer.KryoSerializer
spark.driver.maxResultSize   10g
spark.driver.memory          5g
spark.yarn.archive           hdfs:///hdp/apps/3.0.0.0-1634/spark2/spark2-hdp-yarn-archive.tar.gz
spark.port.maxRetries        100
spark.driver.port            50000
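As I understand it, with `spark.driver.port 50000` and `spark.port.maxRetries 100` the driver binds port 50000 if it is free and otherwise retries successive ports up to roughly 50100, so that whole range would need to be reachable from the cluster. The same settings can also be passed per invocation rather than via `spark-defaults.conf`, for example:

```
# One-off client-mode test with the driver port settings passed explicitly
# (same values as in spark-defaults.conf above).
pyspark --master yarn --deploy-mode client --conf spark.driver.port=50000 --conf spark.port.maxRetries=100
```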
At this point I am totally confused. Can someone give some hints on how to tackle this?
Thank you very much!
Created 08-22-2018 11:06 AM
@Guozhen Li In YARN client mode, the client machine (the Windows machine) needs network access to all of the cluster worker nodes (the executors and the AM could potentially run on any of them), and vice versa: the executors and AM must be able to connect back to the driver running on the Windows client machine. I think you are right that this may be due to a firewall or network problem.
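As a quick sanity check (the hostname and port below are placeholders; use the ResourceManager address from your cluster's yarn-site.xml), you can test the Windows-to-cluster direction from PowerShell, and then run the equivalent test from a worker node back to the driver port on the Windows machine:

```
# Windows client -> cluster: can we reach the YARN ResourceManager?
# (resourcemanager-hostname and port 8050 are placeholders.)
Test-NetConnection -ComputerName resourcemanager-hostname -Port 8050

# Cluster -> Windows client: the same kind of check must also succeed from the
# worker nodes against the driver port on the Windows machine (50000 in your config).
```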
HTH
Created 08-23-2018 05:16 PM
Hey, thanks Felix. I figured out it was actually neither Spark nor the firewall; it was an extra network adapter created by VirtualBox.
Created 08-22-2018 06:36 PM
I figured out what went wrong... It actually had nothing to do with Spark or Windows Firewall, but with VirtualBox.
My Windows machine has VirtualBox installed and hosts a guest VM. VirtualBox creates a network adapter called something like "VirtualBox Host-Only Network", which has a different IP address from the actual network adapter.
In my case, the actual network adapter is on the LAN with IP address 10.100.1.61, while the VirtualBox Host-Only Network has the IP address 192.168.56.1.
I solved the issue by disabling the VirtualBox Host-Only Network in Control Panel >> Network and Internet >> Network Connections.
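The same thing can presumably be done from PowerShell; the adapter name below is simply what the connection is called on my machine, so adjust as needed:

```
# Disable the VirtualBox host-only adapter (elevated PowerShell;
# "VirtualBox Host-Only Network" is the adapter name on my machine).
Disable-NetAdapter -Name "VirtualBox Host-Only Network" -Confirm:$false
```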
I found this by first running `pyspark` in PowerShell and then running `netstat -an | Select-String 50000`, which showed something listening on 192.168.56.1:50000:
PS > netstat -an | sls 50000
  TCP    192.168.56.1:50000    0.0.0.0:0    LISTENING
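For anyone who cannot simply disable the extra adapter (e.g. the guest VM still needs it), it should also be possible to pin the driver to the correct interface instead. `spark.driver.host` and `spark.driver.bindAddress` are standard Spark properties, though I have not verified this particular combination on my setup:

```
# Untested alternative: explicitly advertise/bind the LAN address (10.100.1.61 above)
# instead of letting Spark pick the VirtualBox host-only interface.
pyspark --master yarn --deploy-mode client --conf spark.driver.host=10.100.1.61 --conf spark.driver.bindAddress=10.100.1.61
```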