Created 08-25-2017 11:59 AM
Hi Team,
I have been trying to build an HA cluster but could not set it up properly. Every time, it fails while starting the ZKFC service, and I am not sure where I went wrong.
This is what shows up when I tried to start the ZKFC controller after starting the JournalNodes:
17/08/25 04:48:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString= master1:2181,master2:2181:slave1:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@1417e278
17/08/25 04:48:51 FATAL tools.DFSZKFailoverController: Got a fatal error, exiting now
java.net.UnknownHostException: master1
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316)
at java.net.InetAddress.getAllByName0(InetAddress.java:1269)
at java.net.InetAddress.getAllByName(InetAddress.java:1185)
at java.net.InetAddress.getAllByName(InetAddress.java:1119)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:628)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:350)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:191)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:181)
root@master1:~#
Thanks
Created 08-25-2017 12:27 PM
The following error indicates that you might not have configured the FQDN properly in your cluster:
java.net.UnknownHostException: master1
Can you please check whether the "hostname -f" command actually returns the desired FQDN?
Example:
root@master1:~# hostname -f
Every node of your cluster should be able to resolve every other node by its FQDN.
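On a small cluster, the usual way to guarantee that is identical static entries in /etc/hosts on every node; a minimal sketch, where the IPs and the example.com domain are hypothetical stand-ins for your own:
```
# /etc/hosts -- keep identical on all nodes (addresses are examples only)
192.168.1.11  master1.example.com  master1
192.168.1.12  master2.example.com  master2
192.168.1.13  slave1.example.com   slave1
```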
Created 08-25-2017 12:51 PM
Hi Jay,
Thanks for the reply.
I replaced the hostnames with FQDNs and re-ran the same command; it worked successfully. However, I ran into another problem: after formatting ZKFC, I ran the "hdfs namenode -format" command and hit the following.
```
17/08/25 05:43:09 INFO common.Storage: Storage directory /home/kasim/journal/tmp/dfs/name has been successfully formatted.
17/08/25 05:43:09 WARN namenode.NameNode: Encountered exception during format:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown:
10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:337)
at org.apache.hadoop.hdfs.qjournal.server.JNStorage.format(JNStorage.java:190)
at org.apache.hadoop.hdfs.qjournal.server.Journal.format(Journal.java:217)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.format(JournalNodeRpcServer.java:141)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.format(QJournalProtocolServerSideTranslatorPB.java:145)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25419)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.format(QuorumJournalManager.java:214)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.formatNonFileJournals(FSEditLog.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:992)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/08/25 05:43:09 INFO util.ExitUtil: Exiting with status 1
17/08/25 05:43:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at odc-c-01.prc.eucalyptus-systems.com/10.104.10.1
************************************************************/
```
I checked the folder structure; it had already been created:
```
/home/kasim/dfs/jn/ha-cluster/current
[root@odc-c-01 name]# cd current/
[root@odc-c-01 current]# ls
seen_txid VERSION
[root@odc-c-01 current]# pwd
/home/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#
```
Thanks,
Created 08-25-2017 12:56 PM
The error is:
WARN namenode.NameNode: Encountered exception during format: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not format one or more JournalNodes. 1 exceptions thrown: 10.104.10.16:8485: Cannot create directory /home/kasim/dfs/jn/ha-cluster/current
- Please check the permissions on the directory; the user who is running the NameNode format should be able to write to it:
# ls -ld /home/kasim/dfs/
# ls -ld /home/kasim/dfs/jn
# ls -ld /home/kasim/dfs/jn/ha-cluster
# ls -ld /home/kasim/dfs/jn/ha-cluster/current
# ls -lart /home/kasim/dfs/jn/ha-cluster/current
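If those listings show the wrong owner, a minimal fix sketch; the user and group below are placeholders for whichever account actually runs the JournalNode and the format. Note that on an NFS export with root_squash, root is mapped to nobody, so ownership may have to be corrected from the filer side instead:
```
# placeholders -- substitute the account that runs the JournalNode
chown -R hdfs:hadoop /home/kasim/dfs/jn
chmod -R 755 /home/kasim/dfs/jn
```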
Created 08-25-2017 01:03 PM
The folder is created on a filer. I am running as the root user, and "root" has all privileges on that folder.
```
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/
drwxr-xr-x 3 nobody nobody 4096 Aug 23 03:31 /home/kasim/dfs/
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:43 /home/kasim/dfs/jn
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster
[root@odc-c-01 kasim]# ls -ld /home/kasim/dfs/jn/ha-cluster/current
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 /home/kasim/dfs/jn/ha-cluster/current
[root@odc-c-01 kasim]# ls -lart /home/kasim/dfs/jn/ha-cluster/current
total 16
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 ..
-rwxr-xr-x 1 nobody nobody 154 Aug 25 05:59 VERSION
drwxr-xr-x 2 nobody nobody 4096 Aug 25 05:59 paxos
drwxr-xr-x 3 nobody nobody 4096 Aug 25 05:59 .
[root@odc-c-01 kasim]#
```
Created 08-25-2017 01:15 PM
Do you know how to use blueprints? I could help you with that, to deploy without any fuss!
Created 08-25-2017 01:17 PM
Yes, please.
Created 08-25-2017 02:11 PM
Can you tell me the number of master nodes, data nodes, and edge nodes you want in your cluster?
Created 08-25-2017 02:17 PM
I have a total of 6 machines in my setup: one for the active NameNode, one for the standby NameNode, one for the ResourceManager, and the remaining 3 machines for DataNodes. My question is: should the dfs.journalnode.edits.dir location be a remote shared directory, or can it be on the local filesystem with a uniform directory structure across all JournalNodes?
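For what it's worth, with the Quorum Journal Manager each JournalNode keeps its own copy of the edits on its own local disk, so a local path that is identical on every JournalNode is the usual choice; a shared remote directory is not required. A minimal hdfs-site.xml sketch, with an example path:
```
<property>
  <name>dfs.journalnode.edits.dir</name>
  <!-- local directory on each JournalNode; same path on all of them -->
  <value>/data/hadoop/dfs/jn</value>
</property>
```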
Created 08-25-2017 02:30 PM
With 6 machines you could have:
- 2 master nodes for HDFS HA
- 1 edge node with clients/Ambari server
- 3 data nodes
What version of HDP? Will you use MySQL for Hive/Ranger/Oozie?
Is that fine for you?
Created 08-25-2017 02:48 PM
In cluster_config.json, change Stack_version to match your version ("2.x").
In hostmap.json, change masterx, datanodex, and ambari-server to match the FQDNs of the machines.
Make sure you have internal repos matching the entries in repo.json and dputil-repo.json.
In cli.txt, change "ambari-server" to match the FQDN of your Ambari server, and launch them in that order.
Remember to rename the *.json.txt files to *.json, as HCC doesn't accept the .json file type for uploads.
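For reference, a blueprint deployment typically comes down to two REST calls against the Ambari server. A sketch only, assuming the default admin:admin login, a host named ambari-server, a blueprint name of ha-cluster, and that cluster_config.json holds the blueprint while hostmap.json holds the cluster-creation template (the exact roles depend on the attached files):
```
# Register the blueprint
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d @cluster_config.json http://ambari-server:8080/api/v1/blueprints/ha-cluster
# Create the cluster from it, mapping host groups to FQDNs
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d @hostmap.json http://ambari-server:8080/api/v1/clusters/ha-cluster
```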
Created 08-28-2017 08:03 AM
Finally, I was able to configure the HA cluster successfully. Failover happens when I trigger it with the "hdfs haadmin -failover" command. However, I noticed the fsimage & edit log files are on only one server.
[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-01
standby
[root@odc-c-01 current]# hdfs haadmin -getServiceState odc-c-16
active
[root@odc-c-01 current]#
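For reference, the manual failover mentioned above takes the two NameNode service IDs as arguments, so with the IDs from the getServiceState calls it would look like:
```
hdfs haadmin -failover odc-c-01 odc-c-16
```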
<property>
<name>hadoop.tmp.dir</name>
<value>/shared/kasim/journal/tmp</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/shared/kasim/dfs/jn</value>
</property>
</configuration>
[root@odc-c-01 current]# ls fsimage_0000000000000003698
fsimage_0000000000000003698
[root@odc-c-01 current]#
[root@odc-c-01 current]# pwd
/shared/kasim/journal/tmp/dfs/name/current
[root@odc-c-01 current]#
I still do not understand why it is writing the fsimage & edit log information to only one server, and into a different directory from the one I specified in "dfs.journalnode.edits.dir". Could you shed some light on that part?
Thanks,
Created 08-28-2017 08:11 AM
Can you paste a screenshot of the directories below?
Ambari UI-->HDFS-->Configs-->NameNode directories
If you have ONLY one directory path, then that explains why you have only one copy.
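For context: the fsimage is written under dfs.namenode.name.dir, not dfs.journalnode.edits.dir, and when dfs.namenode.name.dir is left unset it defaults to file://${hadoop.tmp.dir}/dfs/name, which matches the /shared/kasim/journal/tmp/dfs/name/current path seen above. To keep redundant copies, list more than one directory; a sketch with example paths:
```
<property>
  <name>dfs.namenode.name.dir</name>
  <!-- comma-separated list; the NameNode writes fsimage/edits to each -->
  <value>file:///data/1/dfs/nn,file:///data/2/dfs/nn</value>
</property>
```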
Created 08-28-2017 08:15 AM
Created 08-28-2017 08:26 AM
It doesn't matter whether you used the tarball or the blueprint I sent you. After the installation, how are you managing your cluster? By Ambari, I guess, no?
Just check how many directories are listed under Ambari UI-->HDFS-->Configs-->NameNode directories.