Support Questions


Spark executors continuously exiting with code 1


Hi,

 

I'm new to Apache Spark, so I'm not sure whether this is the best setup; my goal is to create an environment where I can test and evaluate before making a decision. I set up the cluster on Windows using the steps from:

https://aamargajbhiye.medium.com/apache-spark-setup-a-multi-node-standalone-cluster-on-windows-63d41...

 

The cluster version I'm using is the latest: 3.3.1 with Hadoop 3.

 

The master node starts without issue, and I'm able to register the workers on each worker node using the following command:

spark-class org.apache.spark.deploy.worker.Worker spark://<Master-IP>:7077 --host <Worker-IP>

When I register a worker, it connects and registers successfully (as the log message indicates), and I can see both workers in the UI with ALIVE status.
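For reference, the master itself is started the same way, roughly following the article (the exact --host and --port values here are my assumption; I use the machine's LAN IP):

```shell
# Start the standalone master on the master node.
# <Master-IP> is a placeholder; 7077 is the default RPC port.
spark-class org.apache.spark.deploy.master.Master --host <Master-IP> --port 7077
```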

 

Then I tried submitting a simple hello_world.py job using:

spark-submit --master spark://<Master-IP>:7077 hello_world.py

 

My hello_world.py application is like this:

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; the master URL comes from spark-submit.
spark = SparkSession.builder.appName("Hello World").getOrCreate()
print("Hello From Spark!")
# Distribute a tiny list as an RDD and pull the elements back to the driver.
rdd = spark.sparkContext.parallelize([1, 2, 3])
print(rdd.collect())
spark.stop()
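As a sanity check (my own idea, not from the article), the same script can be submitted in local mode, which bypasses the cluster entirely:

```shell
# Run driver and executors in-process on one machine with 2 threads;
# if this works, the script itself is fine and the problem is cluster networking.
spark-submit --master "local[2]" hello_world.py
```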

What happens when I submit the job is that Spark continuously tries to create new executors, as if it's retrying, but they all exit with code 1, and I have to kill the job to stop it.

 

When I check the UI and click on a given executor, I see the following in stdout and stderr:

stdout:

22/12/12 08:04:11 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 6544@HOU12-FSRM01
22/12/12 08:04:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/12 08:04:11 INFO SecurityManager: Changing view acls to: vnetadmin
22/12/12 08:04:11 INFO SecurityManager: Changing modify acls to: vnetadmin
22/12/12 08:04:11 INFO SecurityManager: Changing view acls groups to: 
22/12/12 08:04:11 INFO SecurityManager: Changing modify acls groups to: 
22/12/12 08:04:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(vnetadmin); groups with view permissions: Set(); users  with modify permissions: Set(vnetadmin); groups with modify permissions: Set()

stderr:

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	....
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
Caused by: java.io.IOException: Failed to connect to <Master DNS>/<Master IP>:56526
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
	.....
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: <Master DNS>/<Master IP>:56526
Caused by: java.net.ConnectException: Connection refused: no further information
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
	....

 

I'm not sure how to fix the error above. I tried opening the port referenced in "Failed to connect to <Master DNS>/<Master IP>:56526" on the master node, but a different port shows up every time the executors retry.
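One thing I have not tried yet (so this is just a guess on my part) is pinning the normally random ports to fixed values, so that a single firewall rule would cover them. Spark has documented properties for this (the port numbers below are arbitrary examples I picked):

```shell
:: Windows cmd: pin the driver RPC port and the block manager port
:: (both are random by default), then open only these two in the firewall.
spark-submit --master spark://<Master-IP>:7077 ^
  --conf spark.driver.port=51000 ^
  --conf spark.blockManager.port=51100 ^
  hello_world.py
```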

Not sure what else I can do or how to troubleshoot.
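The only other check I can think of is testing, from a worker node, whether the port the executor complained about is reachable on the driver machine at all, using PowerShell's built-in cmdlet:

```powershell
# From a worker, test TCP reachability of the port from the executor's stderr.
# <Master-IP> and the port are placeholders for the values in the error message.
Test-NetConnection <Master-IP> -Port 56526
```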

Any help is appreciated.
