Spark workers do not connect to the Spark Master

Explorer

Hi,

We're facing an issue with Spark in our production environment: the Spark workers do not connect to the Spark Master. Please see the logs below and help us resolve this issue.

Master Log:

18/06/25 22:59:12 INFO master.Master: akka.tcp://sparkWorker@spark7:7084 got disassociated, removing it.
18/06/25 22:59:12 INFO master.Master: akka.tcp://sparkWorker@spark7:7084 got disassociated, removing it.
18/06/25 22:59:12 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark7:7084] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
18/06/25 22:59:28 INFO master.Master: akka.tcp://sparkWorker@spark7:7079 got disassociated, removing it.
18/06/25 22:59:28 INFO master.Master: akka.tcp://sparkWorker@spark7:7079 got disassociated, removing it.
18/06/25 22:59:28 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark7:7079] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
18/06/25 22:59:28 INFO master.Master: akka.tcp://sparkWorker@spark7:7082 got disassociated, removing it.
18/06/25 22:59:28 INFO master.Master: akka.tcp://sparkWorker@spark7:7082 got disassociated, removing it.
18/06/25 22:59:28 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark7:7082] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
18/06/25 22:59:35 INFO master.Master: akka.tcp://sparkWorker@spark8:7081 got disassociated, removing it.
18/06/25 22:59:35 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark8:7081] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
18/06/25 22:59:35 INFO master.Master: akka.tcp://sparkWorker@spark8:7081 got disassociated, removing it.
18/06/25 23:00:23 INFO master.Master: akka.tcp://sparkWorker@spark9:7081 got disassociated, removing it.
18/06/25 23:00:23 INFO master.Master: akka.tcp://sparkWorker@spark9:7081 got disassociated, removing it.
18/06/25 23:00:23 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark9:7081] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
18/06/25 23:00:45 INFO master.Master: akka.tcp://sparkWorker@spark8:7085 got disassociated, removing it.
18/06/25 23:00:45 INFO master.Master: akka.tcp://sparkWorker@spark8:7085 got disassociated, removing it.
18/06/25 23:00:45 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark8:7085] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
18/06/25 23:00:48 INFO master.Master: akka.tcp://sparkWorker@spark7:7083 got disassociated, removing it.
18/06/25 23:00:48 INFO master.Master: akka.tcp://sparkWorker@spark7:7083 got disassociated, removing it.
18/06/25 23:00:48 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark7:7083] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
18/06/25 23:01:52 INFO master.Master: akka.tcp://sparkWorker@spark0:7080 got disassociated, removing it.
18/06/25 23:01:52 INFO master.Master: akka.tcp://sparkWorker@spark0:7080 got disassociated, removing it.
18/06/25 23:01:52 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkWorker@spark0:7080] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].


Worker Log:


18/06/25 22:43:56 INFO util.Utils: Successfully started service 'sparkWorker' on port 7081.
18/06/25 22:43:56 INFO worker.Worker: Starting Spark worker HKLPADBID09:7081 with 4 cores, 16.0 GB RAM
18/06/25 22:43:56 INFO worker.Worker: Running Spark version 1.4.1-palantir3
18/06/25 22:43:56 INFO worker.Worker: Spark home: /opt/palantir/spark-1.4.1-palantir3-bin-hadoop2.4
18/06/25 22:43:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
18/06/25 22:43:56 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8084
18/06/25 22:43:56 INFO util.Utils: Successfully started service 'WorkerUI' on port 8084.
18/06/25 22:43:56 INFO ui.WorkerWebUI: Started WorkerWebUI at http://SPARK:8084
18/06/25 22:43:56 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:44:10 INFO worker.Worker: Retrying connection to master (attempt # 1)
18/06/25 22:44:10 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:44:24 INFO worker.Worker: Retrying connection to master (attempt # 2)
18/06/25 22:44:24 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:44:38 INFO worker.Worker: Retrying connection to master (attempt # 3)
18/06/25 22:44:38 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:44:52 INFO worker.Worker: Retrying connection to master (attempt # 4)
18/06/25 22:44:52 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:45:06 INFO worker.Worker: Retrying connection to master (attempt # 5)
18/06/25 22:45:06 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:45:20 INFO worker.Worker: Retrying connection to master (attempt # 6)
18/06/25 22:45:20 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:46:42 INFO worker.Worker: Retrying connection to master (attempt # 7)
18/06/25 22:46:42 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:48:04 INFO worker.Worker: Retrying connection to master (attempt # 8)
18/06/25 22:48:04 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:49:26 INFO worker.Worker: Retrying connection to master (attempt # 9)
18/06/25 22:49:26 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:50:48 INFO worker.Worker: Retrying connection to master (attempt # 10)
18/06/25 22:50:48 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:52:10 INFO worker.Worker: Retrying connection to master (attempt # 11)
18/06/25 22:52:10 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:53:32 INFO worker.Worker: Retrying connection to master (attempt # 12)
18/06/25 22:53:32 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:54:54 INFO worker.Worker: Retrying connection to master (attempt # 13)
18/06/25 22:54:54 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:56:16 INFO worker.Worker: Retrying connection to master (attempt # 14)
18/06/25 22:56:16 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:57:38 INFO worker.Worker: Retrying connection to master (attempt # 15)
18/06/25 22:57:38 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 22:59:00 INFO worker.Worker: Retrying connection to master (attempt # 16)
18/06/25 22:59:00 INFO worker.Worker: Connecting to master akka.tcp://sparkMaster@spark:7077/user/Master...
18/06/25 23:00:22 ERROR worker.Worker: All masters are unresponsive! Giving up.
18/06/25 23:00:22 INFO util.Utils: Shutdown hook called

3 REPLIES

Explorer

@Geoffrey Shelton Okot @adash Seeking your help to fix the issue mentioned above. Thanks.

Master Mentor

@Saravana V

Can you check that the ports in the Java code and the Akka configuration match?
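
Alongside comparing the configured ports, it may help to confirm that the master port is reachable from a worker host at all. Below is a minimal sketch in Python, assuming the master host name `spark` and port `7077` shown in the worker log above; the helper name `can_connect` is hypothetical and not part of Spark.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper for illustration only. It checks name resolution
    and TCP reachability, nothing Spark- or Akka-specific.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call on a worker host (host and port taken from the worker log;
# adjust for your cluster):
#   can_connect("spark", 7077)
```

If this returns False on a worker host, the problem is likely at the network level (firewall, DNS, or the master not listening on that port). If it returns True, the TCP path is open, and a mismatch between the host name the workers use and the one the master is bound to would be the next thing to check.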

Explorer

@Geoffrey Shelton Okot The port in the config looks correct. How do I check the port in the Java code?