
Spark Error Remote


Hello,

I am trying to build a job with Talend's Spark components. For this I set up a Cloudera cluster with Spark.

The job is simple: it only reads data from HDFS, filters some rows, and stores the result.
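(For context, a minimal sketch of that kind of job with the Spark 1.x Java API. The HDFS paths and the filter condition are made up for illustration; the real job is generated by Talend.)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class FilterJobSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("FilterJobSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read rows from HDFS (hypothetical path).
        JavaRDD<String> rows = sc.textFile("hdfs:///user/robert/input");

        // Keep only non-empty rows (a stand-in for the real filter).
        JavaRDD<String> kept = rows.filter(new Function<String, Boolean>() {
            public Boolean call(String row) {
                return !row.isEmpty();
            }
        });

        // Store the result back to HDFS (hypothetical path).
        kept.saveAsTextFile("hdfs:///user/robert/output");
        sc.stop();
    }
}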


Talend and Cloudera are not running on the same host, and there is no firewall between them. The worker is recognized in the Spark master web UI.


But I am running into some errors and hope you have some suggestions for me. Here is a snippet from the Spark master log file.


Thanks

Robert


11:02:21.687 ERROR akka.remote.EndpointWriter
dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@xx.xx.xx.:7077/]] arriving at [akka.tcp://sparkMaster@xx.xx.xx:7077] inbound addresses are [akka.tcp://sparkMaster@hostnameMaster:7077]

ERROR akka.remote.EndpointWriter
AssociationError [akka.tcp://sparkMaster@hostnameMaster:7077] -> [akka.tcp://spark@hostnameMaster:49910]: Error [Association failed with [akka.tcp://spark@hostnamemaster49910]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@hostnameMaster:49910]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection timed out: hstnameMaster/xx.xx.xx:49910
]
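For reference on the first error: the client addressed the master by IP address, while the master only accepts messages sent to the exact host:port it advertises (the "inbound addresses" in the log). With the Akka-based RPC in Spark 1.x standalone mode, the master URL must match that advertised address exactly. A minimal sketch in Java, assuming the master advertises itself as hostnameMaster:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MasterUrlSketch {
    public static void main(String[] args) {
        // Use the hostname the master advertises, not its bare IP;
        // Akka drops messages sent to a mismatched address as a
        // "non-local recipient", exactly as in the log above.
        SparkConf conf = new SparkConf()
                .setAppName("TalendSpark_tSparkConnection_1")
                .setMaster("spark://hostnameMaster:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}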

1 ACCEPTED SOLUTION


For testing, I finally configured the firewall on the remote machine to allow any connection from the Cloudera hosts. This works for me.


7 REPLIES

Master Collaborator

How are you running it? Do any other jobs work? How about the shell?

I see a possible typo in "hostnamemaster49910"; it looks like a colon is missing. The host string is given differently several times.



Hello,

Thanks for your reply. Yes, it's a typo, but only from covering up the real addresses and names 😉

I was able to resolve the first error. Let me explain my case in a bit more depth.


The job runs directly in Talend. I don't think that is the reason for this failure, because I compiled the generated Java code and launched it with spark-submit on the Cloudera host. That ends with the same result.


The application is registered in the master's web UI and an executor is shown in the worker's web UI. But I am getting this association error.


I hope this helps you, and thanks for any suggestions!


Master log

16:33:32.922 INFO  org.apache.spark.deploy.master.Master  Registering app TalendSpark_tSparkConnection_1
16:33:32.923 INFO  org.apache.spark.deploy.master.Master  Registered app TalendSpark_tSparkConnection_1 with ID app-20141022163332-0001
16:34:01.251 INFO  org.apache.spark.deploy.master.Master  akka.tcp://spark@remoteHost:50571 got disassociated, removing it.
16:34:01.251 INFO  org.apache.spark.deploy.master.Master  Removing app app-20141022163332-0001
16:34:01.252 INFO  org.apache.spark.deploy.master.Master  akka.tcp://spark@remoteHost:50571 got disassociated, removing it.
16:34:01.251 INFO  akka.actor.LocalActorRef  Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4085.214.61.169%3A50579-4#693766985] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
16:34:01.254 INFO  akka.actor.LocalActorRef  Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4085.214.61.169%3A50579-4#693766985] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
16:34:02.652 INFO  org.apache.spark.deploy.master.Master  akka.tcp://spark@remoteHost:50546 got disassociated, removing it.
16:34:02.654 ERROR akka.remote.EndpointWriter  AssociationError [akka.tcp://sparkMaster@hostnameMaster:7077] -> [akka.tcp://spark@remoteHost:50546]: Error [Association failed with [akka.tcp://spark@remoteHost:50546]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@remoteHost:50546]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection timed out: remoteHost/85.214.61.169:50546
]
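What the master log suggests: the master registers the app, then tries to connect back to the driver on the remote machine (spark@remoteHost:50546), times out, and removes the app. Spark picks that driver port randomly at startup, which makes it awkward to allow through a firewall. A minimal sketch of pinning the driver address and port instead; spark.driver.host and spark.driver.port are real Spark settings, while the hostname and port number here are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DriverAddressSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("TalendSpark_tSparkConnection_1")
                .setMaster("spark://hostnameMaster:7077")
                // Address of the driver machine as seen from the cluster,
                // so master and executors can route back to it.
                .set("spark.driver.host", "remoteHost")
                // A fixed port instead of a random ephemeral one, so the
                // cluster-to-driver direction can be opened in a firewall.
                .set("spark.driver.port", "51000");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}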


Worker log

Launch command: "/usr/lib/jvm/java-7-oracle-cloudera/bin/java" ......."
16:35:20.254 INFO  org.apache.spark.deploy.worker.Worker  Executor app-20141022163413-0002/0 finished with state EXITED message Command exited with code 1 exitStatus 1
16:35:20.267 INFO  org.apache.spark.deploy.worker.Worker  Asked to launch executor app-20141022163413-0002/1 for TalendSpark_tSparkConnection_1
16:35:20.289 INFO  akka.actor.LocalActorRef  Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4085.214.228.246%3A38745-2#-260186988] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
16:35:20.313 ERROR akka.remote.EndpointWriter  AssociationError [akka.tcp://sparkWorker@workerHost:7078] -> [akka.tcp://sparkExecutor@workerHost:57158]: Error [Association failed with [akka.tcp://sparkExecutor@workerHost:57158]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@workerHost:57158]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: workerHost/xx.xx.xx:57158
]
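Reading the worker log: the executor exits with code 1 before it can register, and the AssociationError that follows is just the worker trying to talk to the already-dead executor. A plausible cause, consistent with the accepted solution above, is that the executor cannot connect back to the driver on the remote machine. The executor's own stderr, under the worker's work/<app-id>/<executor-id>/ directory, should show the underlying connection error.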



Also, a warning message comes up in Talend and on the console when using spark-submit:


org.apache.spark.scheduler.TaskSchedulerImpl - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory


But I guess 2 cores and 512 MB should be enough just for reading and saving.
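For what it's worth, this warning does not always mean a genuine resource shortage: it also appears when executors are granted but keep dying before they register with the driver, which matches the worker log above. The resource side, if it were the issue, is controlled by two settings in standalone mode; a minimal sketch with example values:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ResourceRequestSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("TalendSpark_tSparkConnection_1")
                .setMaster("spark://hostnameMaster:7077")
                // Cap on the total cores this app may claim on the cluster.
                .set("spark.cores.max", "2")
                // Memory per executor; must fit what a single worker offers.
                .set("spark.executor.memory", "512m");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}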

Master Collaborator

That could suggest that the amount of resources available to your Spark jobs is not big enough to accommodate what Talend or your app is requesting. I don't know whether you mean only 2 cores are available or 2 are requested, but the question is whether the request exceeds what's available. I'd check on this aspect. For example, if running via YARN, see how much resource YARN can allocate, and look at your logs to see what Spark thinks it's asking for.


Hello, thanks for your effort.

Spark is running in standalone mode, but I checked the requested resources in the web UI.


I now have two workers and one master. The workers provide 2 cores and 4 GB. This job requests 2 cores and 512 MB. If I add cores to the worker nodes, the number of requested cores increases to match the available ones. But nothing changed; the error message is still the same.



Hi,

I have some new, interesting information.

Talend provides the option to export the whole job. As a result, you can run a bash file which executes the Java code. Nothing new so far, but if I run the job this way directly on the Cloudera cluster, Spark runs without errors and I receive the expected result.


I checked the communication from the remote host to Cloudera and all related hosts via telnet. The connection can be established on port 7077 (Spark master) and port 7078 (Spark worker). I have no idea how to solve this problem. I would be glad if someone has any further hints.

 

Thanks

Robert
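A note on the telnet check: it only proves the remote-to-cluster direction. The AssociationErrors in the logs are the cluster trying to connect back to ports on the remote machine, a direction that telnet from the remote host never exercises. A quick way to test it is a plain socket connect from a cluster host to the remote machine; a minimal sketch, where host and port would be taken from the AssociationError message:

import java.net.InetSocketAddress;
import java.net.Socket;

public class ReverseProbe {
    public static void main(String[] args) throws Exception {
        String host = args[0];                 // the remote Talend machine
        int port = Integer.parseInt(args[1]);  // e.g. the driver port from the log
        Socket socket = new Socket();
        try {
            // The driver only listens while a job is up, so run this
            // during an attempt, not after it has failed.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("reachable: " + host + ":" + port);
        } finally {
            socket.close();
        }
    }
}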



For testing, I finally configured the firewall on the remote machine to allow any connection from the Cloudera hosts. This works for me.
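A less permissive variant of the same fix, in case opening all connections is not acceptable long-term: pin the ports Spark would otherwise pick randomly and allow only those through the firewall. A minimal sketch, assuming Spark 1.1 or later (where these port settings exist); the port numbers are arbitrary examples:

import org.apache.spark.SparkConf;

public class PinnedPortsSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .set("spark.driver.port", "51000")        // driver RPC (Akka)
                .set("spark.fileserver.port", "51001")    // driver HTTP file server
                .set("spark.broadcast.port", "51002")     // HTTP broadcast server
                .set("spark.blockManager.port", "51003"); // block transfers
        System.out.println(conf.toDebugString());
    }
}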