
Trying to build a new HA cluster setup

Explorer

Hi Team,

I have been trying to build an HA cluster setup, but could not get it working. Every time, it fails while starting the ZKFC service, and I am not sure where I went wrong.

This is what shows up when I try to start the ZKFC controller after starting the JournalNode daemons.

17/08/25 04:48:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString= master1:2181,master2:2181:slave1:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1417e278
17/08/25 04:48:51 FATAL tools.DFSZKFailoverController: Got a fatal error, exiting now
java.net.UnknownHostException: master1
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
at java.net.InetAddress.getAllByName0(InetAddress.java:1269)
at java.net.InetAddress.getAllByName(InetAddress.java:1185)
at java.net.InetAddress.getAllByName(InetAddress.java:1119)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:628)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:350)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:191)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:181)
root@master1:~#

Thanks

1 ACCEPTED SOLUTION

Master Mentor

@Kasim Shaik

The following error indicates that you might not have configured the FQDN properly in your cluster.

java.net.UnknownHostException: master1

Can you please check whether the "hostname -f" command actually returns the desired FQDN?

Example:

root@master1:~#    hostname -f


https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-installation-ppc/content/set_the_...

Every node in your cluster should be able to resolve every other node by its FQDN.
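
For example, a minimal /etc/hosts layout that would satisfy this (the IP addresses and the example.com domain below are placeholders, not values from your cluster). The same entries should exist on every node, and "hostname -f" on each node should return that node's own FQDN:

# /etc/hosts on every node (IPs and domain are illustrative only)
192.168.1.11   master1.example.com   master1
192.168.1.12   master2.example.com   master2
192.168.1.13   slave1.example.com    slave1

# quick checks from any node
hostname -f            # should print this node's FQDN
getent hosts master1   # should resolve to the address above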


14 REPLIES

Master Mentor

@Kasim Shaik

In cluster_config.json, change Stack_version to match your version ("2.x").

In hostmap.json, change the masterx, datanodex, and ambari-server entries to match the FQDNs of your machines.

Make sure you have internal repos that match the entries in repo.json and dputil-repo.json.

In Cli.txt, change "ambari-server" to match the FQDN of your Ambari server, and launch them in that order.

Remember to rename the *.json.txt files to *.json, as HCC doesn't accept .json file uploads.
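
The exact layout of the attached hostmap.json may differ, but if it follows the standard Ambari cluster-creation template, the FQDN substitution would look roughly like this (the blueprint name, password, group names, and hostnames below are placeholders):

{
  "blueprint" : "ha-cluster",
  "default_password" : "changeme",
  "host_groups" : [
    { "name" : "master1",   "hosts" : [ { "fqdn" : "master1.example.com" } ] },
    { "name" : "master2",   "hosts" : [ { "fqdn" : "master2.example.com" } ] },
    { "name" : "datanode1", "hosts" : [ { "fqdn" : "slave1.example.com" } ] }
  ]
}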

Explorer

Hi @Jay SenSharma,

Finally, I was able to configure the HA cluster successfully. Failover happens when I trigger it with the "hdfs haadmin -failover" command. However, I noticed that the fsimage & edit log files are present on only one server.

[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-01
standby
[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-16
active
[root@odc-c-01 current]#
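
For reference, a minimal sketch of the manual failover mentioned above, assuming odc-c-01 and odc-c-16 are the NameNode service IDs configured under dfs.ha.namenodes (as the getServiceState output suggests):

# fail over from the currently active odc-c-16 to odc-c-01
hdfs haadmin -failover odc-c-16 odc-c-01

# confirm the new states
hdfs haadmin -getServiceState odc-c-01
hdfs haadmin -getServiceState odc-c-16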

<property>
<name>hadoop.tmp.dir</name>
<value>/shared/kasim/journal/tmp</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/shared/kasim/dfs/jn</value>
</property>
</configuration>

[root@odc-c-01 current]# ls fsimage_0000000000000003698
fsimage_0000000000000003698
[root@odc-c-01 current]#
[root@odc-c-01 current]# pwd
/shared/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#

Still, I do not understand why it is writing the fsimage & edit log information to only one server, and to a different directory than the one I specified for "dfs.journalnode.edits.dir". Could you shed some light on that part?

Thanks,

Master Mentor

@Kasim Shaik

Can you check and paste a screenshot of the directories below?

Ambari UI-->HDFS-->Configs-->NameNode directories

If you have ONLY one directory path, then that explains why you have only one copy.
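
As a sketch (the directory paths below are placeholders), this is roughly how dfs.namenode.name.dir would be set in hdfs-site.xml to get redundant copies of the NameNode metadata. Note that when this property is not set at all, it defaults to file://${hadoop.tmp.dir}/dfs/name, which is consistent with the .../journal/tmp/dfs/name/current path shown above; dfs.journalnode.edits.dir only holds the JournalNode edits, not the fsimage.

<property>
  <name>dfs.namenode.name.dir</name>
  <!-- comma-separated list: the NameNode writes its fsimage/edits to every listed directory -->
  <value>/data/1/hdfs/namenode,/data/2/hdfs/namenode</value>
</property>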

Explorer

@Geoffrey Shelton Okot

I used the Apache Hadoop tar file to configure HA, not the Ambari GUI.

Thanks,

Master Mentor

@Kasim Shaik

It doesn't matter whether you used the tarball or the blueprint I sent you. After the installation, how are you managing your cluster? I guess with Ambari, is that not so?

Just check how many directories are in Ambari UI-->HDFS-->Configs-->NameNode directories