Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1975 | 07-09-2019 12:53 AM |
| | 11897 | 06-23-2019 08:37 PM |
| | 9162 | 06-18-2019 11:28 PM |
| | 10154 | 05-23-2019 08:46 PM |
| | 4588 | 05-20-2019 01:14 AM |
12-08-2015
12:28 AM
Yes, that is true, and I did note as much. If you are writing an EL function in Java, you can also rely on the StandbyException when connecting to explicit hostnames to detect the active NameNode (i.e. the one that does not return the exception). This is how the Java client discovers the active NN. The CM API method works if you query the role instances directly: http://cloudera.github.io/cm_api/apidocs/v9/path__clusters_-clusterName-_services_-serviceName-_roles_-roleName-.html#GET. The returned apiRole object, on applicable roles such as NameNode or ResourceManager, carries an "haStatus" (HA status) field, as described at http://cloudera.github.io/cm_api/apidocs/v9/ns0_apiRole.html (and http://cloudera.github.io/cm_api/apidocs/v9/ns0_haStatus.html)
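As a rough illustration, here is a minimal sketch of reading that haStatus field via the CM API (Python 2 urllib2, to match the snippets elsewhere on this page). The CM host, credentials, and cluster/service names below are placeholder assumptions, not real values:

```python
# Hypothetical example: find the active NameNode via the CM API haStatus field.
# Host, credentials, and cluster/service names are placeholders.
import json
import urllib2

base = 'http://cm-host.example.com:7180/api/v9'
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, base, 'admin', 'admin')
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(mgr))

# List the HDFS service's role instances, then pick the NAMENODE marked ACTIVE
roles = json.loads(opener.open(base + '/clusters/cluster/services/hdfs/roles').read())
for role in roles['items']:
    if role['type'] == 'NAMENODE' and role.get('haStatus') == 'ACTIVE':
        print role['name'], role['hostRef']['hostId']
```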
12-07-2015
10:23 PM
You can use a script such as the one below to grab the active/standby states:

```bash
# Grab the NameNode IDs from the nameservice key
NNS=$(hdfs getconf -confKey dfs.ha.namenodes.nameservice1)
# Convert to a proper bash array, delimiting on comma
NNSA=(${NNS/,/ })
# Look up the state of both
STATEA=$(sudo -u hdfs hdfs haadmin -getServiceState ${NNSA[0]})
STATEB=$(sudo -u hdfs hdfs haadmin -getServiceState ${NNSA[1]})
# Test for active state
if [ "$STATEA" == "active" ]
then
  ANN=${NNSA[0]}
else
  ANN=${NNSA[1]}
fi
# Print the host and port of the NN that is active
ANNHOSTPORT=$(hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.$ANN)
echo $ANNHOSTPORT
```

Note, however, that the haadmin commands may require admin privileges to run, i.e. the 'hdfs' user or a member of the configured supergroup. Are you running some non-Java based WebHDFS REST queries that require knowledge of the active NameNode? I'm just looking to understand why knowing the active NN seems necessary for your operation.
12-06-2015
11:35 PM
I am not quite sure I follow what the problem is. Could you post the differing outputs, or a screenshot thereof? This may not be the issue, but note that printing the representation of a string in Python will not print unicode characters (it prints hex escapes instead).
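A small illustration of that repr-vs-print difference (a hypothetical Python 2 session, since the exact code in question isn't shown here):

```python
# Printing a unicode string directly renders the glyphs (terminal permitting),
# while printing its repr shows hex escape sequences instead.
s = u'caf\xe9'
print s         # -> café
print repr(s)   # -> u'caf\xe9'
```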
12-06-2015
10:00 PM
1 Kudo
If the JVM that's buffering in the local dir were to die of a SIGKILL or a similar form of immediate interruption, then the cleanup procedures aren't run. When running in MR mode, try setting the buffer directory to ./tmp (relative) so that the files are created under the task's working directory; these are deleted automatically when the TaskTracker/NodeManager cleans up the task's environment after the kill. Also, have you tried using S3A (s3a://) instead? It may function better than the older S3 FS, and it does not use a buffer directory. S3A has been included in CDH 5 for a while now.
12-06-2015
09:23 PM
> So, to generalize, the mechanism level subcodes can always be taken as some failure in communicating with KDC, right?

Yes, it can always be taken as something wrong in the Kerberos layer (not necessarily only the KDC; it could also be things such as bad enctypes in the keytab, etc., but always something Kerberos-mechanism related).

> I also see that despite this error, ZK does continue to function ... so is this error to be really treated seriously?

Did a retry of the auth perhaps succeed? It's not normal for the errors to repeat.
12-06-2015
08:47 PM
5 Kudos
The HttpFS port is 14000 (it also serves the WebHDFS protocol), but the regular non-gateway style WebHDFS serves from the NameNode's port of 50070 (or 50075 on a DN). Could you try the two variants below?

1. curl -i "http://quickstart.cloudera:50070/webhdfs/v1/user/cloudera?user.name=cloudera&op=GETFILESTATUS"
2. curl -i "http://localhost:14000/webhdfs/v1/user/cloudera?user.name=cloudera&op=GETFILESTATUS"

Does either of these work? If not, please re-run with curl -v and post the output here.
12-02-2015
10:41 AM
Yes, the "Mechanism level:" sub-codes usually pertain to operations within the context of a KDC or local Kerberos work. The connection reset, being a network error, therefore alludes to the Client->KDC connection being reset. The ZKs would auth to each other in secure mode, but the specific failure here is within just the auth layer (rather than the higher levels of ZK connectivity and responses).
11-18-2015
03:03 AM
2 Kudos
Does this help? https://gist.github.com/QwertyManiac/a9d7b546388cea72937f
11-16-2015
09:38 AM
Your endpoint is incorrect: you're requesting /jobs/ (which gives a list of WFs with high-level info) rather than /job/WFID (which gives a specific WF and all its details). The latter is what you need. Do this:

req = urllib2.Request('http://xx.xx.xxx.xx:11000/oozie/v1/job/0000096-151104073848042-oozie-hado-W')

(Or use /jobs to iterate over the list of all WFs, calling /job/ID with each item's id field, as sketched below.)
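A minimal sketch of that iteration, under the same assumptions as above (Python 2 urllib2; the host/port is the placeholder from your request):

```python
# Hypothetical sketch: list workflows via /jobs, then fetch each one's full
# details (including its actions) via /job/<id>.
import json
import urllib2

base = 'http://xx.xx.xxx.xx:11000/oozie/v1'
jobs = json.loads(urllib2.urlopen(base + '/jobs').read())
for wf in jobs['workflows']:
    detail = json.loads(urllib2.urlopen(base + '/job/' + wf['id']).read())
    print wf['id'], len(detail['actions'])
```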
11-16-2015
08:27 AM
Could you share a sample request URL and the output received? It seems to work OK for me, e.g. for my WF ID of "0000000-151116211358117-oozie-oozi-W":

```
~> curl -L 'http://localhost:11000/oozie/v2/job/0000000-151116211358117-oozie-oozi-W' > wf.json
~> python
>>> import json
>>> a = json.loads(open('wf.json').read())
>>> len(a['actions'])
2
>>> a['actions'][1]['name']
u'Shell'
```

FWIW, Hue today uses the same API for its Oozie app dashboards, and it does fetch all actions properly too. How old is the targeted WF, and are you able to see the list of actions OK in the web UIs?