Member since: 08-25-2017
Posts: 7
Kudos Received: 0
Solutions: 0
08-28-2017
08:15 AM
@Geoffrey Shelton Okot I used the Apache Hadoop tarball to configure HA, not the Ambari GUI. Thanks,
08-28-2017
08:03 AM
Hi @Jay SenSharma, Finally, I was able to configure the HA cluster successfully. Failover happens when I run the "hdfs haadmin -failover" command. However, I noticed the fsimage & edit log files are on only one server.

{code}
[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-01
standby
[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-16
active
[root@odc-c-01 current]#
{code}

My configuration:

{code}
<property>
  <name>hadoop.tmp.dir</name>
  <value>/shared/kasim/journal/tmp</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/shared/kasim/dfs/jn</value>
</property>
</configuration>
{code}

{code}
[root@odc-c-01 current]# ls
fsimage_0000000000000003698  fsimage_0000000000000003698
[root@odc-c-01 current]#
[root@odc-c-01 current]# pwd
/shared/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#
{code}

I still do not understand why it writes the fsimage & edit log information to only one server, and to a directory different from the one I set for "dfs.journalnode.edits.dir". Could you shed some light on that part? Thanks,
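For what it's worth, the fsimage location is governed by dfs.namenode.name.dir, not dfs.journalnode.edits.dir; when dfs.namenode.name.dir is unset it defaults to file://${hadoop.tmp.dir}/dfs/name, which would explain why the files above landed under /shared/kasim/journal/tmp/dfs/name/current. A sketch of the two properties involved (the values here are illustrative, not the poster's actual paths):

```xml
<!-- hdfs-site.xml: illustrative values only -->
<property>
  <!-- Where each NameNode keeps its fsimage; defaults to
       file://${hadoop.tmp.dir}/dfs/name when unset. -->
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/hdfs/namenode</value>
</property>
<property>
  <!-- Where each JournalNode stores edit-log segments only;
       JournalNodes never hold fsimage files. -->
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hdfs/journalnode</value>
</property>
```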
08-25-2017
02:17 PM
I have a total of 6 machines in my setup: one for the active NameNode, one for the standby NameNode, one for the ResourceManager, and the remaining 3 machines for DataNodes. My question is whether the dfs.journalnode.edits.dir location should be a remote shared directory, or whether it can be on the local filesystem with a uniform directory structure across all JournalNodes.
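As a note on this question: the quorum journal design expects dfs.journalnode.edits.dir to be a local directory on each JournalNode machine. The JournalNodes replicate edits among themselves over RPC, so a shared remote directory is not required. A sketch (the path is illustrative):

```xml
<!-- hdfs-site.xml on every JournalNode host: the same local path on
     each node, but each node keeps its own independent copy of the
     edits -- no shared storage involved. -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/var/hadoop/hdfs/journalnode</value>
</property>
```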
08-25-2017
01:03 PM
@Jay SenSharma The folder is created on a filer. I am running as the root user, and "root" has all privileges on that folder.

{code}
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/
drwxr-xr-x 3 nobody nobody 4096 Aug 23 03:31 /home/kasim/dfs/
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:43 /home/kasim/dfs/jn
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster/current
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster/current
[root@odc-c-01 kasim]# ls -lart /home/kasim/dfs/jn/ha-cluster/current
total 16
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 ..
-rwxr-xr-x 1 nobody nobody  154 Aug 25 05:59 VERSION
drwxr-xr-x 2 nobody nobody 4096 Aug 25 05:59 paxos
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 .
[root@odc-c-01 kasim]#
{code}
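A directory on an NFS filer that was created by root yet shows nobody:nobody usually points to root squash on the export, in which case writes by root can fail even though the mode bits look permissive. The reliable check is to attempt a real write rather than trust ls -l. A minimal sketch (check_dir_writable is a hypothetical helper, not part of Hadoop):

```python
import os
import pwd

def check_dir_writable(path):
    """Hypothetical helper: report a directory's owner and whether the
    current user can actually create a file in it. On NFS mounts with
    root squash, mode bits can look fine while writes still fail, so we
    probe with a real write instead of trusting os.access()."""
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    probe = os.path.join(path, ".write_probe")
    try:
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        writable = True
    except OSError:
        writable = False
    return owner, writable
```

On a root-squashed export, the probe write made as root would fail even with drwxr-xr-x shown, which could explain the JournalNode's later "Cannot create directory" error.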
08-25-2017
12:51 PM
Hi Jay, Thanks for the reply. I replaced the hostname with the FQDN and ran the same command; it worked successfully. However, I ran into another problem. After formatting ZKFC, I ran the NameNode format command and hit the following:

```
17/08/25 05:43:09 INFO common.Storage: Storage directory /home/kasim/journal/tmp/dfs/name has been successfully formatted.
17/08/25 05:43:09 WARN namenode.NameNode: Encountered exception during format:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
    at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
    at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 INFO util.ExitUtil: Exiting with status 1
17/08/25 05:43:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at odc-c-01.prc.eucalyptus-systems.com/10.104.10.1
************************************************************/
```

I checked the folder structure; it had already been created:

```
/home/kasim/dfs/jn/ha-cluster/current
[root@odc-c-01 name]# cd current/
[root@odc-c-01 current]# ls
seen_txid  VERSION
[root@odc-c-01 current]# pwd
/home/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#
```

Thanks,
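One detail worth noticing in the log above: the QuorumException names 10.104.10.16:8485, i.e. a remote JournalNode, so the directory needs to be checked on that host, not only on the machine where the format command ran. A small hedged sketch of pulling the offending endpoints out of such a log (failing_journalnodes is a hypothetical helper, not a Hadoop tool):

```python
import re

def failing_journalnodes(log_text):
    """Hypothetical helper: extract the host:port endpoints that a
    QuorumException reports as unable to format. The error names the
    remote JournalNode, which tells you which machine to inspect."""
    pattern = re.compile(
        r"(\d{1,3}(?:\.\d{1,3}){3}:\d+): Cannot create directory")
    return pattern.findall(log_text)
```

For example, feeding it the log above would return the single endpoint 10.104.10.16:8485.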
08-25-2017
11:59 AM
Hi Team, I have been trying to build an HA cluster setup, but could not get it working. Every time it fails while starting the ZKFC service, and I am not sure where it went wrong. This is what shows up when I try to start the ZKFC controller after starting the JournalNode daemons:

```
17/08/25 04:48:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master1:2181,master2:2181:slave1:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1417e278
17/08/25 04:48:51 FATAL tools.DFSZKFailoverController: Got a fatal error, exiting now
java.net.UnknownHostException: master1
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1269)
    at java.net.InetAddress.getAllByName(InetAddress.java:1185)
    at java.net.InetAddress.getAllByName(InetAddress.java:1119)
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
    at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:628)
    at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
    at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
    at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:350)
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:191)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
    at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:181)
root@master1:~#
```

Thanks
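The UnknownHostException means the ZKFC host cannot resolve master1 at all (via DNS or /etc/hosts), which lines up with the later fix of switching to FQDNs. Note also that the connectString in the log reads master1:2181,master2:2181:slave1:2181, with a colon where a comma would normally separate the second and third servers. A quick pre-flight resolution check can be sketched in Python (check_connect_string is a hypothetical helper, not a Hadoop or ZooKeeper tool):

```python
import socket

def check_connect_string(connect_string):
    """Hypothetical helper: verify every host in a ZooKeeper connect
    string (comma-separated host:port pairs) resolves, and return the
    list of hosts that do not."""
    unresolved = []
    for entry in connect_string.split(","):
        host = entry.strip().split(":")[0]
        try:
            socket.gethostbyname(host)
        except socket.gaierror:
            unresolved.append(host)
    return unresolved
```

Running this against the connect string from hdfs-site.xml / core-site.xml before starting ZKFC would surface a missing /etc/hosts entry immediately instead of ten seconds into the controller startup.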
Labels:
- Apache Hadoop