
Trying to build a new HA cluster setup

Explorer

Hi Team,

I have been trying to build an HA cluster setup, but could not get it working properly. Every time, it fails while starting the ZKFC service. I am not sure where I went wrong.

This is what shows up when I try to start the ZKFC controller after starting the JournalNode daemons.

17/08/25 04:48:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString= master1:2181,master2:2181:slave1:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1417e278
17/08/25 04:48:51 FATAL tools.DFSZKFailoverController: Got a fatal error, exiting now
java.net.UnknownHostException: master1
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
at java.net.InetAddress.getAllByName0(InetAddress.java:1269)
at java.net.InetAddress.getAllByName(InetAddress.java:1185)
at java.net.InetAddress.getAllByName(InetAddress.java:1119)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:628)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:350)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:191)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:181)
root@master1:~#

Thanks

1 ACCEPTED SOLUTION

Master Mentor

@Kasim Shaik

The following error indicates that you might not have configured the FQDN properly in your cluster.

java.net.UnknownHostException: master1

Can you please check whether the "hostname -f" command actually returns the desired FQDN?

Example:

root@master1:~#    hostname -f


https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-installation-ppc/content/set_the_...

Every node in your cluster should be able to resolve every other node correctly by its FQDN.
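To sanity-check resolution on each node, a small sketch like the following can help. The hostnames are taken from the connect string in this thread, and the hosts-file path is an assumption; substitute your own values:

```shell
#!/bin/sh
# Sketch: check that each cluster hostname has an entry in the hosts file.
# Hostnames below come from the zkfc connect string in this thread.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

has_entry() {
    # Return 0 if the given name appears as a whole word in the hosts file.
    grep -qw "$1" "$HOSTS_FILE" 2>/dev/null
}

for node in master1 master2 slave1; do
    if has_entry "$node"; then
        echo "$node: found in $HOSTS_FILE"
    else
        echo "$node: MISSING from $HOSTS_FILE - ZKFC will fail to resolve it"
    fi
done
```

Run it on every node; the UnknownHostException above means master1 was not resolvable on the node where ZKFC was started. The output of "hostname -f" on each host should also match the name the other nodes use for it.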


14 REPLIES


Explorer

Hi Jay,

Thanks for the reply.

I replaced the hostname with the FQDN and ran the same command. It worked successfully. However, I ran into another problem: after formatting ZKFC, I ran the namenode -format command and hit another error.

```

17/08/25 05:43:09 INFO common.Storage: Storage directory /home/kasim/journal/tmp/dfs/name has been successfully formatted.
17/08/25 05:43:09 WARN namenode.NameNode: Encountered exception during format:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 INFO util.ExitUtil: Exiting with status 1
17/08/25 05:43:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at odc-c-01.prc.eucalyptus-systems.com/10.104.10.1
************************************************************/

```

I checked the folder structure; it has already been created:

```

/home/kasim/dfs/jn/ha-cluster/current

[root@odc-c-01 name]# cd current/
[root@odc-c-01 current]# ls
seen_txid VERSION
[root@odc-c-01 current]# pwd
/home/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#

```

Thanks,

Master Mentor

@Kasim Shaik

The error is:

WARN namenode.NameNode: Encountered exception during format: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current


Please check the permissions on the directory. The user running the NameNode format should be able to write to it:

# ls -ld  /home/kasim/dfs/
# ls -ld  /home/kasim/dfs/jn
# ls -ld  /home/kasim/dfs/jn/ha-cluster
# ls -ld  /home/kasim/dfs/jn/ha-cluster/current
# ls -lart  /home/kasim/dfs/jn/ha-cluster/current
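Beyond ls, a direct write probe run as the same user that performs the format tells you immediately whether the directory is actually writable. The path is taken from this thread; the probe filename is illustrative:

```shell
#!/bin/sh
# Sketch: probe whether the current user can create files in the
# JournalNode edits directory (path taken from this thread).

can_write() {
    # Try to create and then remove a probe file in the given directory.
    probe="$1/.write_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

JN_DIR="${JN_DIR:-/home/kasim/dfs/jn/ha-cluster}"
if can_write "$JN_DIR"; then
    echo "$JN_DIR: writable by $(id -un)"
else
    echo "$JN_DIR: NOT writable by $(id -un)"
fi
```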


Explorer

@Jay SenSharma

The folder is created on a filer. I am running as the root user, and root has all privileges on that folder.

```

[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/
drwxr-xr-x 3 nobody nobody 4096 Aug 23 03:31 /home/kasim/dfs/
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:43 /home/kasim/dfs/jn
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster/current
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster/current
[root@odc-c-01 kasim]# ls -lart /home/kasim/dfs/jn/ha-cluster/current
total 16
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 ..
-rwxr-xr-x 1 nobody nobody 154 Aug 25 05:59 VERSION
drwxr-xr-x 2 nobody nobody 4096 Aug 25 05:59 paxos
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 .
[root@odc-c-01 kasim]#

```
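One detail worth checking here: the directories come out owned by nobody:nobody even though root created them. On an NFS filer, that pattern often indicates root squash on the export, in which case root's writes are mapped to the unprivileged nobody user and the format step can fail even though root "has all privileges" locally. A quick probe (the path is from this thread; the root-squash interpretation is an assumption about the filer's export options):

```shell
#!/bin/sh
# Sketch: detect NFS root squash. Create a file as the current user and
# inspect its owner; if root creates a file that comes out owned by
# 'nobody', the export is squashing root.

owner_of() {
    # Print the owning user of a path.
    ls -ld "$1" | awk '{print $3}'
}

DIR="${DIR:-/home/kasim/dfs/jn}"   # path taken from this thread
probe="$DIR/.squash_probe.$$"
if touch "$probe" 2>/dev/null; then
    echo "probe owned by $(owner_of "$probe") (running as $(id -un))"
    rm -f "$probe"
else
    echo "could not create a probe file in $DIR"
fi
```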

Master Mentor

@Kasim Shaik

Do you know how to use Blueprints? I could help you with that to deploy without any fuss!

Explorer

Yes, please.

Master Mentor

@Kasim Shaik

Can you tell me the number of master nodes, datanodes, and edge nodes you want in your cluster?

Explorer

I have a total of 6 machines in my setup: one for the active NameNode, one for the standby NameNode, one for the ResourceManager, and the remaining 3 machines for DataNodes. My question is whether the dfs.journalnode.edits.dir location should be a remote shared directory, or whether it can be on the local filesystem with a uniform directory structure across all JournalNodes.

Master Mentor

@Kasim Shaik

With 6 machines you could have

2 master nodes for HDFS HA
1 edge node with clients/Ambari server
3 data nodes

What version of HDP? Will you use MySQL for Hive/Ranger/Oozie?

Is that fine for you?
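On your earlier dfs.journalnode.edits.dir question: with the quorum journal manager, each JournalNode keeps its own copy of the edits on its local filesystem (the NameNodes replicate edits to the quorum over RPC), so a remote shared directory is not needed; a uniform local path on every JournalNode is the usual setup. A minimal hdfs-site.xml sketch (the path is illustrative):

```xml
<!-- hdfs-site.xml on each JournalNode; the local path is an example -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journalnode</value>
</property>
```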