
Spark continuously retrying executors that exit with code 1

I'm new to Apache Spark, so I'm not sure if this is the best setup. My goal is to create an environment where I can test and evaluate before making a decision. I set up a cluster on Windows using the steps from:


The cluster version I'm using is the latest: 3.3.1, pre-built for Hadoop 3.


The master node starts without an issue, and I'm able to register the workers on each worker node using the following command:

spark-class org.apache.spark.deploy.worker.Worker spark://<Master-IP>:7077 --host <Worker-IP>

When I register a worker, it is able to connect and register successfully, as the log message indicates, and I can see both workers in the master UI with the ALIVE status.
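For reference, the master was started following the same guide. My exact invocation may differ slightly, but it is along these lines (the `--host` value is my master's IP):

```shell
# Start the standalone master, bound to the master node's IP.
# It listens for workers on the default port 7077.
spark-class org.apache.spark.deploy.master.Master --host <Master-IP>
```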


Then I tried submitting a simple hello_world.py job using:

spark-submit --master spark://<Master-IP>:7077 hello_world.py


My application is like this:



from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Hello World").getOrCreate()
print("Hello From Spark!")



What happens when I submit the job is that Spark continuously tries to create new executors, as if it's retrying, but they all exit with code 1, and I have to kill the job to stop it.
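One thing I considered trying (untested on my side, and based only on my reading of the standard Spark configuration properties `spark.driver.host`, `spark.driver.port`, and `spark.blockManager.port`) is pinning the ports the executors connect back on, so I can open fixed ports in the Windows firewall instead of chasing a new random one each run:

```shell
:: Sketch only: pin the driver and block-manager ports to fixed values
:: (51000/51001 are arbitrary choices) so they can be opened in the
:: Windows firewall once. The ^ is the cmd.exe line continuation.
spark-submit --master spark://<Master-IP>:7077 ^
  --conf spark.driver.host=<Driver-IP> ^
  --conf spark.driver.port=51000 ^
  --conf spark.blockManager.port=51001 ^
  hello_world.py
```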


When I check the UI and click on a given executor, I see the following in the stdout and stderr:


22/12/12 08:04:11 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 6544@HOU12-FSRM01
22/12/12 08:04:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/12 08:04:11 INFO SecurityManager: Changing view acls to: vnetadmin
22/12/12 08:04:11 INFO SecurityManager: Changing modify acls to: vnetadmin
22/12/12 08:04:11 INFO SecurityManager: Changing view acls groups to: 
22/12/12 08:04:11 INFO SecurityManager: Changing modify acls groups to: 
22/12/12 08:04:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(vnetadmin); groups with view permissions: Set(); users  with modify permissions: Set(vnetadmin); groups with modify permissions: Set()


Using Spark's default log4j profile: org/apache/spark/
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
Caused by: java.io.IOException: Failed to connect to <Master DNS>/<Master IP>:56526
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: <Master DNS>/<Master IP>:56526
Caused by: java.net.ConnectException: Connection refused: no further information
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)


I'm not sure how to fix the error above. I tried opening the port referenced in "Failed to connect to <Master DNS>/<Master IP>:56526" on the master node's firewall, but every run shows a different port.
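To try to narrow it down, I sketched a quick connectivity check in plain Python (no Spark involved) that I can run from a worker machine against the master/driver address. The host and port values below are placeholders:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholders): run from a worker to see whether the firewall
# is blocking the connection back to the master/driver.
# can_connect("<Master-IP>", 7077)
```

If this returns False for a port Spark reported, the problem is reachability (firewall/network) rather than Spark itself.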

Not sure what else I can do or how to troubleshoot.

Any help is appreciated.