Trying to build a new HA cluster setup
Labels: Apache Hadoop
Created ‎08-25-2017 11:59 AM
Hi Team,
I have been trying to build an HA cluster setup, but could not get it working. Every time, it fails while starting the ZKFC service, and I am not sure where I went wrong.
This is what shows up when I try to start the ZKFC controller after starting the JournalNode daemons:
```
17/08/25 04:48:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString= master1:2181,master2:2181:slave1:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1417e278
17/08/25 04:48:51 FATAL tools.DFSZKFailoverController: Got a fatal error, exiting now
java.net.UnknownHostException: master1
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
at java.net.InetAddress.getAllByName0(InetAddress.java:1269)
at java.net.InetAddress.getAllByName(InetAddress.java:1185)
at java.net.InetAddress.getAllByName(InetAddress.java:1119)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:628)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:350)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:191)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:181)
root@master1:~#
```
Thanks
Created ‎08-25-2017 12:27 PM
The following error indicates that you might not have configured the FQDN properly in your cluster.
java.net.UnknownHostException: master1
Can you please check whether the "hostname -f" command actually returns the desired FQDN?
Example:
root@master1:~# hostname -f
Every node of your cluster should be able to resolve the other nodes by their FQDNs correctly.
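A quick way to verify this on each node is to check that every ZooKeeper quorum host from the connect string resolves. A minimal sketch (the hostnames are taken from the log above; substitute your own):

```shell
# Check that each quorum host resolves on this node; run this on every
# node of the cluster. "getent hosts" uses the same NSS lookup path
# (/etc/hosts, then DNS) that Java's InetAddress resolution relies on.
for host in master1 master2 slave1; do
    if getent hosts "$host" > /dev/null; then
        echo "$host: ok"
    else
        echo "$host: UNRESOLVED (fix /etc/hosts or DNS)"
    fi
done
```

Any host printed as UNRESOLVED will trigger exactly the UnknownHostException seen in the ZKFC log.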
Created ‎08-25-2017 12:51 PM
Hi Jay,
Thanks for the reply.
I replaced the hostname with the FQDN and ran the same command; it worked successfully. However, I ran into another problem: after formatting ZKFC, I ran the "namenode -format" command and hit another error.
```
17/08/25 05:43:09 INFO common.Storage: Storage directory /home/kasim/journal/tmp/dfs/name has been successfully formatted.
17/08/25 05:43:09 WARN namenode.NameNode: Encountered exception during format:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 INFO util.ExitUtil: Exiting with status 1
17/08/25 05:43:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at odc-c-01.prc.eucalyptus-systems.com/10.104.10.1
************************************************************/
```
I checked the folder structure; it was already created.
```
/home/kasim/dfs/jn/ha-cluster/current
[root@odc-c-01 name]# cd current/
[root@odc-c-01 current]# ls
seen_txid VERSION
[root@odc-c-01 current]# pwd
/home/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#
```
Thanks,
Created ‎08-25-2017 12:56 PM
The error is:
WARN namenode.NameNode: Encountered exception during format: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown: 10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
- Please check the permissions on the directory. The user running the NameNode format should be able to write to that directory:
# ls -ld /home/kasim/dfs/
# ls -ld /home/kasim/dfs/jn
# ls -ld /home/kasim/dfs/jn/ha-cluster
# ls -ld /home/kasim/dfs/jn/ha-cluster/current
# ls -lart /home/kasim/dfs/jn/ha-cluster/current
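Beyond inspecting ownership, a direct write test shows whether the account in question can actually create files there. A small sketch (the path is taken from the error message; run it as the user that starts the JournalNode on the host that reported the failure):

```shell
# Check whether the current user can create files in the JournalNode
# edits directory (path from the error message above). On an NFS-mounted
# "filer" directory, root may be mapped to "nobody" by root_squash,
# which makes root unable to write despite local privileges.
JN_DIR=${JN_DIR:-/home/kasim/dfs/jn/ha-cluster}
if touch "$JN_DIR/.write_test" 2>/dev/null; then
    rm -f "$JN_DIR/.write_test"
    echo "writable"
else
    echo "NOT writable: check ownership (ls -ld) and NFS export options (root_squash)"
fi
```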
Created ‎08-25-2017 01:03 PM
The folder is created on a filer, and I am running as the root user. User "root" has all privileges on that folder.
{code}
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/
drwxr-xr-x 3 nobody nobody 4096 Aug 23 03:31 /home/kasim/dfs/
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:43 /home/kasim/dfs/jn
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster/current
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster/current
[root@odc-c-01 kasim]# ls -lart /home/kasim/dfs/jn/ha-cluster/current
total 16
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 ..
-rwxr-xr-x 1 nobody nobody 154 Aug 25 05:59 VERSION
drwxr-xr-x 2 nobody nobody 4096 Aug 25 05:59 paxos
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 .
[root@odc-c-01 kasim]#
{code}
Created ‎08-25-2017 01:15 PM
Do you know how to use Blueprints? I could help you with that to deploy without any fuss!
Created ‎08-25-2017 01:17 PM
Yes, please.
Created ‎08-25-2017 02:11 PM
Can you tell me the number of master nodes, data nodes, and edge nodes you want in your cluster?
Created ‎08-25-2017 02:17 PM
I have a total of 6 machines in my setup: one for the active NameNode, one for the standby NameNode, one for the ResourceManager, and the remaining 3 machines for DataNodes. My question is: should the dfs.journalnode.edits.dir location be a remote shared directory, or can it be on the local filesystem with a uniform directory structure across all JournalNodes?
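For what it's worth, with the Quorum Journal Manager dfs.journalnode.edits.dir is a local directory on each JournalNode host; no shared storage is required (each JournalNode keeps its own copy, and the quorum provides redundancy). A minimal hdfs-site.xml sketch, reusing the "ha-cluster" nameservice, hostnames, and path from this thread:

```xml
<!-- Sketch of QJM-based HA settings; values follow this thread. -->
<property>
  <name>dfs.nameservices</name>
  <value>ha-cluster</value>
</property>
<property>
  <!-- Where the NameNodes write shared edits: the JournalNode quorum. -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master1:8485;master2:8485;slave1:8485/ha-cluster</value>
</property>
<property>
  <!-- Local directory on EACH JournalNode host; same path on all three. -->
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/kasim/dfs/jn</value>
</property>
```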
Created ‎08-25-2017 02:30 PM
With 6 machines you could have:
- 2 master nodes for HDFS HA
- 1 edge node with clients/Ambari server
- 3 data nodes
What version of HDP? Will you use MySQL for Hive/Ranger/Oozie?
Is that fine for you?
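For context on the Blueprints suggestion: an Ambari Blueprint is a JSON document that declares host groups and the components each group runs; the cluster is then created by mapping real hosts onto those groups. A heavily trimmed sketch for a layout like the one above (stack version, group names, and the component selection are assumptions to adapt):

```json
{
  "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.6" },
  "host_groups" : [
    { "name" : "master", "cardinality" : "1",
      "components" : [ { "name" : "NAMENODE" }, { "name" : "ZKFC" },
                       { "name" : "JOURNALNODE" }, { "name" : "ZOOKEEPER_SERVER" } ] },
    { "name" : "workers", "cardinality" : "3",
      "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" } ] }
  ]
}
```

A full HA blueprint also needs the HA-related configurations (nameservice, failover proxy provider, etc.) in its "configurations" section; the Ambari documentation covers the complete format.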
