Archives of Support Questions (Read Only)

This is an archived, read-only board kept for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Trying to build a new HA cluster setup

New Member

Hi Team,

I have been trying to build an HA cluster setup, but could not get it working. Every time, it fails while starting the ZKFC service, and I am not sure where it went wrong.

This is what shows up when I try to start the ZKFC controller after starting the JournalNode daemons.

17/08/25 04:48:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString= master1:2181,master2:2181:slave1:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1417e278
17/08/25 04:48:51 FATAL tools.DFSZKFailoverController: Got a fatal error, exiting now
java.net.UnknownHostException: master1
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
at java.net.InetAddress.getAllByName0(InetAddress.java:1269)
at java.net.InetAddress.getAllByName(InetAddress.java:1185)
at java.net.InetAddress.getAllByName(InetAddress.java:1119)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:628)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:350)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:191)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:181)
root@master1:~#
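For context, the start sequence is roughly the following. This is only a sketch: $HADOOP_HOME and the hadoop-daemon.sh wrapper are assumed from a standard Hadoop 2.x tarball layout.

root@master1:~# $HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode   # run first on each JournalNode host
root@master1:~# $HADOOP_HOME/sbin/hadoop-daemon.sh start zkfc          # this step fails with the trace above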

Thanks

1 ACCEPTED SOLUTION

Master Mentor

@Kasim Shaik

The following error indicates that you might not have configured the FQDN properly in your cluster.

java.net.UnknownHostException: master1

Can you please check whether the "hostname -f" command actually returns the desired FQDN?

Example:

root@master1:~#    hostname -f


https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-installation-ppc/content/set_the_...

Every node of your cluster should be able to resolve every other node correctly by its FQDN, as illustrated below.
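A minimal sketch of what correct resolution looks like; the FQDNs and IPs below (example.com, 192.168.1.x) are illustrative assumptions, not values from your cluster:

root@master1:~# hostname -f
master1.example.com

root@master1:~# cat /etc/hosts
127.0.0.1      localhost
192.168.1.11   master1.example.com   master1
192.168.1.12   master2.example.com   master2
192.168.1.13   slave1.example.com    slave1

root@master1:~# for h in master1 master2 slave1; do getent hosts "$h"; done   # repeat this check on every node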


14 REPLIES

Master Mentor

@Kasim Shaik

In cluster_config.json, change the following:

Stack_version, to match your version ("2.x").

In hostmap.json, change the masterx, datanodex, and ambari-server entries to match the FQDNs of your machines.

Make sure you have internal repos matching the entries in repo.json and dputil-repo.json.

In Cli.txt, change "ambari-server" to match the FQDN of your Ambari server, and launch them in that order.

Remember to rename the *.json.txt files to *.json (see the one-liner below), as HCC doesn't accept .json file uploads.
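A minimal rename sketch, run in the directory where the attachments were saved:

root@master1:~# for f in *.json.txt; do mv "$f" "${f%.txt}"; done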

New Member

Hi @Jay SenSharma,

Finally, I was able to configure the HA cluster successfully. Failover happens when I trigger it with the "hdfs haadmin -failover" command. However, I noticed the fsimage & edit log files on only one server.

[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-01
standby
[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-16
active
[root@odc-c-01 current]#
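For reference, the failover syntax takes the source and target service IDs from the states above (odc-c-16 is currently active, so the failover target here is odc-c-01):

[root@odc-c-01 current]# hdfs haadmin -failover odc-c-16 odc-c-01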

<property>
  <name>hadoop.tmp.dir</name>
  <value>/shared/kasim/journal/tmp</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/shared/kasim/dfs/jn</value>
</property>
</configuration>

[root@odc-c-01 current]# ls fsimage_0000000000000003698
fsimage_0000000000000003698
[root@odc-c-01 current]#
[root@odc-c-01 current]# pwd
/shared/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#

I still do not understand why it is writing the fsimage & edit log information to only one server, and to a directory different from the one I set for "dfs.journalnode.edits.dir". Could you shed some light on that part?

Thanks,

Master Mentor

@Kasim Shaik

Can you check and paste a screenshot of the below directories?

Ambari UI-->HDFS-->Configs-->NameNode directories

If you have ONLY one directory path, then that explains why you have only one copy.

New Member

@Geoffrey Shelton Okot

I used the Apache Hadoop tarball to configure HA, not the Ambari GUI.

Thanks,

Master Mentor

@Kasim Shaik

It doesn't matter whether you used the tarball or the blueprint which I sent you. After the installation, how are you managing your cluster? I guess with Ambari, is that not so?

Just check how many directories are listed under Ambari UI-->HDFS-->Configs-->NameNode directories. If you are not on Ambari, see the command-line check below.
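If the cluster is not managed by Ambari, the same check can be done from the command line. A sketch, assuming the client reads your hdfs-site.xml; note that when dfs.namenode.name.dir is unset, Hadoop defaults it to file://${hadoop.tmp.dir}/dfs/name, which is exactly the /shared/kasim/journal/tmp/dfs/name path in your listing:

[root@odc-c-01 ~]# hdfs getconf -confKey dfs.namenode.name.dir
# one path printed = one copy of the fsimage, matching what you observed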