
Ambari HA Blueprint: HOSTGROUP syntax not getting resolved to actual host names

Contributor

I'm using Ambari 2.1.2 to install a highly available HDP 2.3.4 cluster. The service installation succeeds on the nodes, but the services fail to start. Digging into the logs and the config files, I found that the %HOSTGROUP::hostgroup%:port strings didn't get replaced with the actual hostnames defined in the cluster creation template. As a result, the config files contain invalid URIs like these:

    <property>
      <name>dfs.namenode.http-address</name>
      <value>%HOSTGROUP::master_2%:50070</value>
      <final>true</final>
    </property>


    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>%HOSTGROUP::master_2%:50070</value>
    </property>

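For context, a %HOSTGROUP::name%:port token is a placeholder that Ambari is supposed to substitute using the host-group-to-host mapping given in the cluster creation template. A minimal sketch of that mapping (the blueprint name and FQDN here are hypothetical):

```json
{
  "blueprint": "my-ha-blueprint",
  "host_groups": [
    {
      "name": "master_2",
      "hosts": [
        { "fqdn": "master2.example.internal" }
      ]
    }
  ]
}
```

After a successful substitution, the value above should read master2.example.internal:50070 rather than %HOSTGROUP::master_2%:50070.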
Sure enough, the errors while starting the services pointed to the same cause:

[root@worker1 azureuser]# cat /var/log/hadoop/hdfs/hadoop-hdfs-datanode-worker1.log
2016-01-05 02:24:22,601 INFO  datanode.DataNode (LogAdapter.java:info(45)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = worker1.012g3iyhe01upgbu35npgl5l4a.gx.internal.cloudapp.net/10.0.0.9
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.1.2.3.4.0-3485
.
.
.

2016-01-05 02:34:27,068 FATAL datanode.DataNode (DataNode.java:secureMain(2533)) - Exception in secureMain
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: %HOSTGROUP::master_2%:8020
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:198)
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
	at org.apache.hadoop.hdfs.DFSUtil.getAddressesForNameserviceId(DFSUtil.java:687)
	at org.apache.hadoop.hdfs.DFSUtil.getAddressesForNsIds(DFSUtil.java:655)
	at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:872)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1152)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:430)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2411)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2298)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2345)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2526)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2550)
2016-01-05 02:34:27,072 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-01-05 02:34:27,076 INFO  datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at worker1.012g3iyhe01upgbu35npgl5l4a.gx.internal.cloudapp.net/10.0.0.9
************************************************************/

Interestingly, the Ambari server log reports successful hostname mapping for the master host groups, but I found no such entries for the worker host groups.

05 Jan 2016 02:13:40,271  INFO [pool-2-thread-1] TopologyManager:598 - TopologyManager.ConfigureClusterTask areHostGroupsResolved: host group name = master_5 has been fully resolved, as all 1 required hosts are mapped to 1 physical hosts.
05 Jan 2016 02:13:40,272  INFO [pool-2-thread-1] TopologyManager:598 - TopologyManager.ConfigureClusterTask areHostGroupsResolved: host group name = master_1 has been fully resolved, as all 1 required hosts are mapped to 1 physical hosts.
05 Jan 2016 02:13:40,273  INFO [pool-2-thread-1] TopologyManager:598 - TopologyManager.ConfigureClusterTask areHostGroupsResolved: host group name = master_2 has been fully resolved, as all 1 required hosts are mapped to 1 physical hosts.
05 Jan 2016 02:13:40,273  INFO [pool-2-thread-1] TopologyManager:598 - TopologyManager.ConfigureClusterTask areHostGroupsResolved: host group name = master_3 has been fully resolved, as all 1 required hosts are mapped to 1 physical hosts.
05 Jan 2016 02:13:40,274  INFO [pool-2-thread-1] TopologyManager:598 - TopologyManager.ConfigureClusterTask areHostGroupsResolved: host group name = master_4 has been fully resolved, as all 1 required hosts are mapped to 1 physical hosts.
 

(though even the master nodes had service startup failures)

Here's the config Blueprint Gist: https://gist.github.com/DhruvKumar/355af66897e584b...

And here's the cluster creation template: https://gist.github.com/DhruvKumar/9b971be81389317...

Here's the result of blueprint exported from Ambari server after installation (using /api/v1/clusters/clusterName?format=blueprint): https://gist.github.com/DhruvKumar/373cd7b05ca818c...

Edit: Ambari Server Log: https://gist.github.com/DhruvKumar/e2c06a94388c51e...

Note that my non-HA Blueprint, which doesn't use the HOSTGROUP syntax, works without issue on the same infrastructure.

Can someone please help me debug why the hostnames aren't being mapped correctly? Is it a problem in the HA Blueprint? I have all the logs from the installation and I'll keep the cluster alive for debugging.

Thanks.

1 ACCEPTED SOLUTION

Explorer

@Dhruv Kumar

It looks like you have Oozie HA configured incorrectly.

From the Ambari server log:

05 Jan 2016 02:13:40,471 ERROR [pool-2-thread-1] TopologyManager:553 - TopologyManager.ConfigureClusterTask: An exception occurred while attempting to process cluster configs and set on cluster: java.lang.IllegalArgumentException: Unable to update configuration property 'oozie.base.url' with topology information. Component 'OOZIE_SERVER' is mapped to an invalid number of hosts '2'.

Looking at the config processor code, Ambari determines whether Oozie HA is enabled by checking the property

oozie-site/oozie.services.ext 

To enable Oozie HA, the property must be specified and its value must contain

org.apache.oozie.service.ZKLocksService

Because HA isn't properly configured here, configuration processing fails: in non-HA environments, OOZIE_SERVER can only be mapped to a single host.
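A hedged sketch of how that property might look in the Blueprint's configurations section (the exact list of extension services varies by Oozie/HDP version; the essential requirement per the explanation above is that org.apache.oozie.service.ZKLocksService appears in the value):

```json
"configurations": [
  {
    "oozie-site": {
      "oozie.services.ext": "org.apache.oozie.service.ZKLocksService"
    }
  }
]
```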

For Ambari Blueprint HA support please refer to:

Ambari Blueprint HA support


15 REPLIES


Contributor

Ah, thanks. Let me try it and see if it works.

Expert Contributor

The comment from jspeidel is correct.

If Oozie HA is being used in this Blueprint, then "oozie.base.url" must be set explicitly to the address of the load balancer being used, since Oozie HA requires a separate load balancer external to the Oozie instances.

If you are just testing out your Blueprint, you can simply set this property to the address of one of the Oozie instances in your cluster.

Here's a good reference on Oozie HA that will help in setting up an Oozie HA Blueprint:

https://oozie.apache.org/docs/4.1.0/AG_Install.html#High_Availability_HA
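Combining both points, a minimal sketch of the oozie-site fragment for an HA Blueprint might look like the following. The load balancer URL and ZooKeeper quorum are placeholders for your environment, and oozie.zookeeper.connection.string is taken from the Oozie HA documentation linked above:

```json
"oozie-site": {
  "oozie.base.url": "http://oozie-lb.example.internal:11000/oozie",
  "oozie.services.ext": "org.apache.oozie.service.ZKLocksService",
  "oozie.zookeeper.connection.string": "zk1.example.internal:2181,zk2.example.internal:2181,zk3.example.internal:2181"
}
```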

Contributor
@rnettleton

Yes, John's solution worked. Thanks a lot for your help!

Contributor

@jspeidel

Thanks John, your solution worked. I was indeed missing the Oozie HA property.

Explorer

I don't have Oozie HA and I'm seeing the same problem: all host groups fail to be substituted. I'm using an external PostgreSQL 9.2 database; are there known issues with this?