Spark in YARN with Namenode HA


Hi HWX,

I have HDP 2.3.4 with NameNode HA. Spark jobs were submitting fine until one NameNode went down. Since then, no Spark jobs start.

It looks like the HDFS client is not falling back properly to the second NameNode:

$ hdfs dfs -ls /tmp
... working fine...
$ spark-shell --master yarn-master
...snip...
16/01/12 22:57:53 INFO ui.SparkUI: Started SparkUI at http://10.10.10.3:7884
16/01/12 22:57:53 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/01/12 22:57:54 INFO impl.TimelineClientImpl: Timeline service address: http://daplab-wn-12.fri.lan:8188/ws/v1/timeline/
16/01/12 22:53:16 INFO ipc.Client: Retrying connect to server: daplab-rt-11.fri.lan/10.10.10.111:8020. Already tried 0 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[500x2000ms], TryOnceThenFail]
16/01/12 22:53:20 INFO ipc.Client: Retrying connect to server: daplab-rt-11.fri.lan/10.10.10.111:8020. Already tried 1 time(s); retry policy is RetryPolicy[MultipleLinearRandomRetry[500x2000ms], TryOnceThenFail]

... and so on until I lost my patience.

If I change the entry in /etc/hosts so that the failed NameNode's hostname (currently 10.10.10.111) points to the active NameNode's IP address, the job moves forward.

As I said, it's a fresh HDP 2.3.4 install, without anything fancy.

Thanks

Benoit

1 ACCEPTED SOLUTION


java.net.NoRouteToHostException is treated as a recoverable failure, because it can be recovered from in any deployment with floating IP addresses. Floating IPs were essentially the sole form of failover in Hadoop before NameNode HA (HADOOP-6667 added the check). I think we ought to revisit that decision.


10 REPLIES


A temporary workaround is to fake the hostname of the failed NameNode in /etc/hosts (or equivalent) so that it points to the IP of the healthy NameNode, until the failed NameNode is back up.
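
For illustration, here is a minimal sketch of that hostname override as a small shell helper. The hostname is the one from the retry log in the question. The IP address 192.0.2.10 is only a placeholder, since the thread does not give the healthy NameNode's address, and the function name hosts_override_line is made up for this example. The script only prints the line; appending it to /etc/hosts is left as a commented step.

```shell
#!/bin/sh
# Sketch only: build an /etc/hosts override line for a failed NameNode.

FAILED_NN_HOST="daplab-rt-11.fri.lan"   # hostname taken from the retry log
HEALTHY_NN_IP="192.0.2.10"              # placeholder (assumption): use the real IP of the healthy NameNode

# hosts_override_line HOST IP
# Prints "IP<TAB>HOST", the format /etc/hosts expects.
hosts_override_line() {
    printf '%s\t%s\n' "$2" "$1"
}

# Show the line that would be added.
hosts_override_line "$FAILED_NN_HOST" "$HEALTHY_NN_IP"

# To apply it (as root), append the line, then check how the name resolves:
#   hosts_override_line "$FAILED_NN_HOST" "$HEALTHY_NN_IP" >> /etc/hosts
#   getent hosts "$FAILED_NN_HOST"
# Remove the entry again once the failed NameNode is back.
```

This has to be done on every host that runs an HDFS client for the job, and it is easy to forget the entry afterwards, so treat it as a stopgap only.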