Created on 10-22-2014 02:38 AM - edited 09-16-2022 02:10 AM
Hello,
I am trying to build a job with Talend's Spark components. For this I set up a Cloudera cluster with Spark.
The job is simple: it only reads data from HDFS, filters some rows, and stores the result.
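For illustration, the logic boils down to roughly the following in Spark's Java API; the master URL, paths, class name, and filter condition here are placeholders, not the actual Talend-generated code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class FilterJob {
    public static void main(String[] args) {
        // Placeholders: the real master URL and HDFS paths differ.
        SparkConf conf = new SparkConf()
                .setAppName("FilterJob")
                .setMaster("spark://hostnameMaster:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///user/robert/input");
        // Keep only non-empty rows; the real filter condition differs.
        JavaRDD<String> kept = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) {
                return !line.isEmpty();
            }
        });
        kept.saveAsTextFile("hdfs:///user/robert/output");
        sc.stop();
    }
}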
Talend and Cloudera are not running on the same host, and there is no firewall in between. The worker is recognized in the Spark master web UI.
But I run into some errors, and I hope you have some suggestions for me. Here is a snippet from the Spark master log file.
Thanks
Robert
11:02:21.687 ERROR akka.remote.EndpointWriter dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@xx.xx.xx.:7077/]] arriving at [akka.tcp://sparkMaster@xx.xx.xx:7077] inbound addresses are [akka.tcp://sparkMaster@hostnameMaster:7077]
ERROR akka.remote.EndpointWriter AssociationError [akka.tcp://sparkMaster@hostnameMaster:7077] -> [akka.tcp://spark@hostnameMaster:49910]: Error [Association failed with [akka.tcp://spark@hostnamemaster49910]] [ akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@hostnameMaster:49910] Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection timed out: hstnameMaster/xx.xx.xx:49910 ]
Created 10-22-2014 03:21 AM
How are you running it? Do any other jobs work? How about the shell?
I see a possible typo in "hostnamemaster49910"; it looks like a colon is missing. The host string is given in several different forms.
Created 10-22-2014 07:49 AM
Hello,
thanks for your reply. Yes, it's a typo, but only from masking the real addresses and names 😉
I was able to resolve the first error. Let me explain my case in a bit more depth.
The job is started directly from Talend. I don't think that is the cause of the failure, because I also compiled the generated Java code and launched it with spark-submit on the Cloudera host; that ends with the same result.
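For reference, the submission looked roughly like this; class, jar, and host names are placeholders:

spark-submit --class org.example.FilterJob --master spark://hostnameMaster:7077 talend_job.jar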
The application is registered in the master's web UI and an executor is shown in the worker's web UI. But I am getting this AssociationError.
I hope this helps, and thanks for any suggestions!
Master log:
16:33:32.922 INFO org.apache.spark.deploy.master.Master Registering app TalendSpark_tSparkConnection_1
16:33:32.923 INFO org.apache.spark.deploy.master.Master Registered app TalendSpark_tSparkConnection_1 with ID app-20141022163332-0001
16:34:01.251 INFO org.apache.spark.deploy.master.Master akka.tcp://spark@remoteHost:50571 got disassociated, removing it.
16:34:01.251 INFO org.apache.spark.deploy.master.Master Removing app app-20141022163332-0001
16:34:01.252 INFO org.apache.spark.deploy.master.Master akka.tcp://spark@remoteHost:50571 got disassociated, removing it.
16:34:01.251 INFO akka.actor.LocalActorRef Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4085.214.61.169%3A50579-4#693766985] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
16:34:01.254 INFO akka.actor.LocalActorRef Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4085.214.61.169%3A50579-4#693766985] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
16:34:02.652 INFO org.apache.spark.deploy.master.Master akka.tcp://spark@remoteHost:50546 got disassociated, removing it.
16:34:02.654 ERROR akka.remote.EndpointWriter AssociationError [akka.tcp://sparkMaster@hostnameMaster:7077] -> [akka.tcp://spark@remoteHost:50546]: Error [Association failed with [akka.tcp://spark@remoteHost:50546]] [ akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@remoteHost:50546] Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection timed out: remoteHost/85.214.61.169:50546 ]
Worker log:
Launch command: "/usr/lib/jvm/java-7-oracle-cloudera/bin/java" ......."
16:35:20.254 INFO org.apache.spark.deploy.worker.Worker Executor app-20141022163413-0002/0 finished with state EXITED message Command exited with code 1 exitStatus 1
16:35:20.267 INFO org.apache.spark.deploy.worker.Worker Asked to launch executor app-20141022163413-0002/1 for TalendSpark_tSparkConnection_1
16:35:20.289 INFO akka.actor.LocalActorRef Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4085.214.228.246%3A38745-2#-260186988] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
16:35:20.313 ERROR akka.remote.EndpointWriter AssociationError [akka.tcp://sparkWorker@workerHost:7078] -> [akka.tcp://sparkExecutor@workerHost:57158]: Error [Association failed with [akka.tcp://sparkExecutor@workerHost:57158]] [ akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@workerHost:57158] Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: workerHost/xx.xx.xx:57158 ]
Created 10-22-2014 07:56 AM
A warning message also comes up in Talend and on the console when using spark-submit:
org.apache.spark.scheduler.TaskSchedulerImpl - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
But 2 cores and 512 MB just for reading and saving should be enough, I guess.
Created 10-22-2014 08:03 AM
That could suggest that the amount of resource that is available to your Spark jobs is not big enough to accommodate the resources that Talend or your app are requesting. I don't know whether you mean only 2 cores are available or 2 are requested, but the question is whether the request exceeds what's available. I'd check on this aspect. For example if running via YARN, see how much resource YARN can allocate and look at your logs to see what Spark thinks it's asking for.
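In standalone mode you can also make the request explicit in the driver's SparkConf and cap it below what the cluster offers. A minimal sketch, with illustrative values:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CappedJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("TalendSpark_tSparkConnection_1")
                .setMaster("spark://hostnameMaster:7077")
                .set("spark.executor.memory", "512m") // memory per executor
                .set("spark.cores.max", "2");         // max total cores this app may claim
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job logic as before ...
        sc.stop();
    }
}

If the warning persists even when the request is clearly below the available resources, the cause is usually not memory or cores but executors that fail to start or to reach the driver.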
Created 10-22-2014 09:12 AM
Hello, thanks for your effort.
Spark is running in standalone mode, but I checked the resource request in the web UI anyway.
I now have two workers and one master. The workers provide 2 cores and 4 GB; this job requests 2 cores and 512 MB. If I add cores to the worker nodes, the number of requested cores increases to match the available ones, but nothing changes: the error message stays the same.
Created 10-23-2014 06:23 AM
Hi,
I have some new, interesting information.
Talend provides an option to export the whole job. As a result you get a bash file that executes the Java code; nothing new so far. But if I run the job this way directly on the Cloudera cluster, Spark runs without errors and I receive the expected result.
I checked the communication from the remote host to Cloudera and all related hosts via telnet. Connections can be established on port 7077 (Spark master) and port 7078 (Spark worker). I have no idea how to solve this problem, so I would be glad if someone has any further hints.
Thanks
Robert
Created 10-27-2014 08:08 AM
For testing I finally configured the firewall on the remote machine to allow any connection from the Cloudera hosts. This works for me.
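That matches the AssociationErrors above: the master tries to reach spark@remoteHost on a random ephemeral port (e.g. 50546) and times out, which is typical for a firewall silently dropping inbound connections to the driver. If opening everything is not acceptable, Spark 1.1 or later should allow pinning the driver-side ports so that only a few known ports have to be allowed from the cluster hosts back to the remote machine. A sketch; the port numbers are arbitrary examples:

import org.apache.spark.SparkConf;

public class PinnedPorts {
    // Choose free ports and open them in the firewall from the cluster
    // hosts to the driver machine; the numbers here are only examples.
    public static SparkConf pinDriverPorts(SparkConf conf) {
        return conf
                .set("spark.driver.port", "51000")       // executors/master connect back to the driver here
                .set("spark.blockManager.port", "51001") // block manager endpoint (Spark 1.1+)
                .set("spark.fileserver.port", "51002");  // driver's HTTP file server (Spark 1.1+)
    }
}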