Query the Active Name Node from an Oozie Job

Expert Contributor

I am using Cloudera 5.4.1. I have authored an Oozie workflow. In one of the tasks, I need to use Oozie EL to generate a string that contains the URL of the "active" NameNode.

 

My Cloudera cluster has an active and a standby NameNode. At any point in time, I need to be able to determine dynamically which one is active.

 

I have been looking around the internet but have found no answer. Here is what I have considered so far:

 

1. Use the SSH Oozie task to execute `hdfs getconf -namenodes`. The problem is that this command returns every NameNode, not just the active one.

 

2. Use the Cloudera Manager REST API to query the active NameNode. I have not seen any documentation or example that shows how to do this.

 

If possible, please let me know how I can query and dynamically determine the active NameNode of a Hadoop cluster.

1 ACCEPTED SOLUTION

Mentor
The role level APIs carry the state, but you're querying the service level.
Use the role IDs from the service level to then query the roles directly.

5 REPLIES

Mentor

You can use a script such as below to grab active/standby states:

# Grab the NameNode IDs from the nameservice key
NNS=$(hdfs getconf -confKey dfs.ha.namenodes.nameservice1)
# Convert to a proper bash array, splitting on every comma
NNSA=(${NNS//,/ })
# Look up the HA state of both NameNodes
STATEA=$(sudo -u hdfs hdfs haadmin -getServiceState "${NNSA[0]}")
STATEB=$(sudo -u hdfs hdfs haadmin -getServiceState "${NNSA[1]}")
# Pick whichever NameNode reports the active state
if [ "$STATEA" = "active" ]
then
  ANN=${NNSA[0]}
elif [ "$STATEB" = "active" ]
then
  ANN=${NNSA[1]}
else
  # Neither is active (e.g. mid-failover), so do not guess
  echo "No active NameNode found" >&2
  exit 1
fi
# Print the host and port of the active NameNode
ANNHOSTPORT=$(hdfs getconf -confKey "dfs.namenode.rpc-address.nameservice1.$ANN")
echo "$ANNHOSTPORT"

Note that the haadmin commands may require admin privileges to run, i.e. the 'hdfs' user or a member of the configured supergroup.
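
Since the original goal is to use the result from Oozie EL, the same lookup could be wrapped in a function that prints a key=value line. This is only a sketch: the function name print_active_namenode and the property name activeNameNode are made up for illustration, and the haadmin privilege requirement still applies to whoever runs it.

```shell
#!/usr/bin/env bash
# Sketch: print "activeNameNode=<host:port>" for the given nameservice.
# The function and property names are illustrative, not from the thread.
print_active_namenode() {
  local ns="$1" ids id state
  # Comma-separated NameNode IDs configured for this nameservice
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.$ns") || return 1
  # Probe each ID in turn, so more than two NameNodes also work
  for id in $(printf '%s' "$ids" | tr ',' ' '); do
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      printf 'activeNameNode=%s\n' \
        "$(hdfs getconf -confKey "dfs.namenode.rpc-address.$ns.$id")"
      return 0
    fi
  done
  echo "no active NameNode found for $ns" >&2
  return 1
}
```

Calling `print_active_namenode nameservice1` from an Oozie shell action that declares `<capture-output/>` should make the value readable in later actions as `${wf:actionData('shell-node')['activeNameNode']}`, where shell-node stands for whatever the action is named.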

 

Are you running some non-Java based WebHDFS REST queries that require knowledge of the active NameNode? Just looking to understand why knowing the active seems necessary for your operation.

Expert Contributor

This solution will not work for me because haadmin requires superuser (HDFS admin) permission to run, which a remote machine cannot provide.

Mentor
Yes, that is true, and I did note as much.

If you are writing an EL function in Java, you can also rely on the
StandbyException when connecting to explicit hostnames to detect the
active NameNode (i.e. the one that does not return the exception). This is
the way the Java client discovers the active NN.

The CM API method can work if you query the role instances directly:
http://cloudera.github.io/cm_api/apidocs/v9/path__clusters_-clusterName-_services_-serviceName-_role...
The returned apiRole object, on applicable roles such as NameNode or
ResourceManager, has a flag for "haStatus" (HA status) as described at
http://cloudera.github.io/cm_api/apidocs/v9/ns0_apiRole.html (and
http://cloudera.github.io/cm_api/apidocs/v9/ns0_haStatus.html)
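
As a rough sketch of the CM API approach, the role list of the HDFS service can be filtered for the NameNode whose haStatus is ACTIVE. The names CM_URL, CM_AUTH, active_namenode_host_id, and fetch_active_namenode_host_id are placeholders, not part of the API. The awk filter assumes CM's pretty-printed output with one JSON field per line, and that "type" and "hostId" appear before "haStatus" within each role; a real JSON parser would be more robust where one is available.

```shell
#!/usr/bin/env bash
# Sketch only: CM_URL and CM_AUTH are placeholders for your Cloudera Manager
# endpoint and credentials.
CM_URL="${CM_URL:-http://cloudera.mycompany.com}"
CM_AUTH="${CM_AUTH:-user:password}"

# Read a CM role listing on stdin and print the hostId of every NAMENODE role
# whose haStatus is ACTIVE. Relies on one field per line, with "type" and
# "hostId" preceding "haStatus" inside each role object.
active_namenode_host_id() {
  awk -F'"' '
    $2 == "type"     { type = $4; host = "" }
    $2 == "hostId"   { host = $4 }
    $2 == "haStatus" && type == "NAMENODE" && $4 == "ACTIVE" { print host }
  '
}

# Fetch the role list for one service and filter it.
fetch_active_namenode_host_id() {
  local cluster="$1" service="$2"
  curl -s -u "$CM_AUTH" \
    "$CM_URL/api/v9/clusters/$cluster/services/$service/roles" |
    active_namenode_host_id
}
```

For example, `fetch_active_namenode_host_id cluster2 hdfs1` would print the hostId of the active NameNode, given a cluster and service with those names.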

Expert Contributor

OK. The Cloudera Manager API solution looks good.

 

I tried

 

http://cloudera.mycompany.com/api/v9/clusters/cluster2/services/hdfs1/roles

 

The output is comprehensive... but it doesn't have the machine name anywhere.

 

{
    "name" : "hdfs1-NAMENODE-f307e0a1ebd0702da50cbfb68356cadf",
    "type" : "NAMENODE",
    "serviceRef" : {
      "clusterName" : "cluster2",
      "serviceName" : "hdfs1"
    },
    "hostRef" : {
      "hostId" : "01008a9e-8b44-4e73-bc29-510fbe00b632"
    },
    "roleUrl" : "http://clouderamgr01.mycompany.com:7180/cmf/roleRedirect/hdfs1-NAMENODE-f307e0a1ebd0702da50cbfb68356cadf",
    "roleState" : "STARTED",
    "healthSummary" : "GOOD",
    "healthChecks" : [ {
      "name" : "NAME_NODE_DATA_DIRECTORIES_FREE_SPACE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_DIRECTORY_FAILURES",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_FILE_DESCRIPTOR",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_HA_CHECKPOINT_AGE",
      "summary" : "DISABLED"
    }, {
      "name" : "NAME_NODE_HEAP_DUMP_DIRECTORY_FREE_SPACE",
      "summary" : "DISABLED"
    }, {
      "name" : "NAME_NODE_HOST_HEALTH",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_JOURNAL_NODE_SYNC_STATUS",
      "summary" : "DISABLED"
    }, {
      "name" : "NAME_NODE_LOG_DIRECTORY_FREE_SPACE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_PAUSE_DURATION",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_ROLLING_UPGRADE_STATUS",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_RPC_LATENCY",
      "summary" : "DISABLED"
    }, {
      "name" : "NAME_NODE_SAFE_MODE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_SCM_HEALTH",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_SWAP_MEMORY_USAGE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_UNEXPECTED_EXITS",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_UPGRADE_STATUS",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_WEB_METRIC_COLLECTION",
      "summary" : "GOOD"
    } ],
    "configStalenessStatus" : "FRESH",
    "haStatus" : "STANDBY",
    "maintenanceMode" : false,
    "maintenanceOwners" : [ ],
    "commissionState" : "COMMISSIONED",
    "roleConfigGroupRef" : {
      "roleConfigGroupName" : "hdfs1-NAMENODE-BASE"
    }
}

 

and 

 

{
    "name" : "hdfs1-NAMENODE-e8e705638eaaa8233bd2729af511f874",
    "type" : "NAMENODE",
    "serviceRef" : {
      "clusterName" : "cluster2",
      "serviceName" : "hdfs1"
    },
    "hostRef" : {
      "hostId" : "4127ba89-71a8-4a27-af75-51900f9f0a2e"
    },
    "roleUrl" : "http://clouderamgr01.mycompany.com:7180/cmf/roleRedirect/hdfs1-NAMENODE-e8e705638eaaa8233bd2729af511f874",
    "roleState" : "STARTED",
    "healthSummary" : "GOOD",
    "healthChecks" : [ {
      "name" : "NAME_NODE_DATA_DIRECTORIES_FREE_SPACE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_DIRECTORY_FAILURES",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_FILE_DESCRIPTOR",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_HA_CHECKPOINT_AGE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_HEAP_DUMP_DIRECTORY_FREE_SPACE",
      "summary" : "DISABLED"
    }, {
      "name" : "NAME_NODE_HOST_HEALTH",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_JOURNAL_NODE_SYNC_STATUS",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_LOG_DIRECTORY_FREE_SPACE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_PAUSE_DURATION",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_ROLLING_UPGRADE_STATUS",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_RPC_LATENCY",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_SAFE_MODE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_SCM_HEALTH",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_SWAP_MEMORY_USAGE",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_UNEXPECTED_EXITS",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_UPGRADE_STATUS",
      "summary" : "GOOD"
    }, {
      "name" : "NAME_NODE_WEB_METRIC_COLLECTION",
      "summary" : "GOOD"
    } ],
    "configStalenessStatus" : "FRESH",
    "haStatus" : "ACTIVE",
    "maintenanceMode" : false,
    "maintenanceOwners" : [ ],
    "commissionState" : "COMMISSIONED",
    "roleConfigGroupRef" : {
      "roleConfigGroupName" : "hdfs1-NAMENODE-BASE"
    }
}

The following screenshot from the Cloudera Manager UI clearly shows 02 as standby:

 

[Attached image: Screen Shot 2015-12-09 at 9.23.06 AM.png]

 

If I take the hostId from the JSON above and request

http://cloudera.mycompany.com/api/v9/clusters/cluster2/hosts/01008a9e-8b44-4e73-bc29-510fbe00b632

I get only a blank page, so I cannot get the machine name.

Mentor
The role level APIs carry the state, but you're querying the service level.
Use the role IDs from the service level to then query the roles directly.
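
A minimal sketch of that follow-up, under two assumptions about the v9 API: a single role can be fetched at .../roles/{roleName}, and a hostId resolves to a machine name through the top-level /api/v9/hosts/{hostId} resource rather than a path under /clusters (which may be why the cluster-scoped host URL returned nothing). CM_URL, CM_AUTH, and the function names are placeholders, and json_field relies on CM's one-field-per-line JSON output.

```shell
#!/usr/bin/env bash
# Sketch only: CM_URL and CM_AUTH are placeholders for your Cloudera Manager
# endpoint and credentials.
CM_URL="${CM_URL:-http://cloudera.mycompany.com}"
CM_AUTH="${CM_AUTH:-user:password}"

# Print the value of the first string field named $1 found on stdin.
# Assumes pretty-printed JSON with one "key" : "value" pair per line.
json_field() {
  awk -F'"' -v key="$1" '$2 == key { print $4; exit }'
}

# Query one role directly by its role ID (the "name" in the role list).
fetch_role() {
  local cluster="$1" service="$2" role="$3"
  curl -s -u "$CM_AUTH" \
    "$CM_URL/api/v9/clusters/$cluster/services/$service/roles/$role"
}

# Resolve a hostId (from a role's hostRef) to its hostname.
resolve_host_id() {
  curl -s -u "$CM_AUTH" "$CM_URL/api/v9/hosts/$1" | json_field hostname
}
```

A caller could pipe `fetch_role` through `json_field haStatus` to read one role's state, then pass that role's hostId to `resolve_host_id` to get the machine name.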