About Harsh J

Harsh J · ‎08-04-2015

Please never disable that check. Checkpoints are essential to HDFS operation, and you do not want to end up with checkpoints failing for a technical reason without ever being notified of it. Instead, look at your Standby or Secondary NN to figure out what the error is, and/or seek help with that information once you have identified it.

Harsh J · ‎07-31-2015

The trace is reassuring in that it fails in the "readTransactionIdFile" method, which tries to read the file called "seen_txid" inside the NN's local current/ directory. Please try moving this file out (to /tmp or elsewhere) and then restart the NN. Are you perchance running any form of disk encryption software that may not be active yet? The corrupted data is odd, since the file is supposed to contain just a simple number.
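To check the file before moving it, something like the following Python sketch could help. It only tests whether seen_txid holds a plain number and, if not, moves it aside. The function name, the backup file name, and the directory arguments are all hypothetical. It assumes the NN is stopped and that you pass the current/ directory under your own NN metadata directory.

```python
import shutil
from pathlib import Path


def check_seen_txid(current_dir, backup_dir):
    """Return the number stored in seen_txid, or move the file aside and return None.

    current_dir: the NN's local current/ directory (your path will differ).
    backup_dir:  where a corrupt file is parked, e.g. somewhere under /tmp.
    """
    seen = Path(current_dir) / "seen_txid"
    try:
        # A healthy file is expected to hold only a simple decimal number.
        text = seen.read_bytes().decode("ascii").strip()
    except UnicodeDecodeError:
        text = ""  # non-ASCII bytes: treat as corrupt

    if text.isdigit():
        return int(text)

    # Not a plain number: move the file out of the way rather than deleting it.
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    shutil.move(str(seen), str(Path(backup_dir) / "seen_txid.corrupt"))
    return None
```

If it returns a number, the file itself looks sane and the problem is likely elsewhere. If it returns None, the file has been moved out and you can try restarting the NN.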

Harsh J · ‎07-30-2015

Could you share the full exception trace? It appears your fsimage, or one of your edit logs, has somehow gotten some corrupt data in it. If it's the edit log, you can perhaps attempt to skip a few entries in the 'hdfs namenode -recover' startup mode; but if it's the fsimage that is corrupt, then we'll need to roll back to an older copy and replay the edit logs on top. The full exception trace can help tell which file is corrupt.

Harsh J · ‎07-29-2015

CPUs are considered equally, if the request seeks that.

Harsh J · ‎07-28-2015

Yes, do you not see it working? You'll need to pass the XML property via the workflow.xml under the action's configuration section.
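As a rough illustration of the shape of that configuration section, here is a small Python sketch that renders one. The property name "my.custom.param" and its value are placeholders, not anything from your job, and the helper function is made up for this example.

```python
import xml.etree.ElementTree as ET


def action_configuration(properties):
    """Render a <configuration> block holding one <property> per entry."""
    config = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


# "my.custom.param" is a hypothetical property name used only for illustration.
print(action_configuration({"my.custom.param": "some-value"}))
```

The resulting block would sit inside the action element in workflow.xml. The mapper should then be able to read the value from its job configuration, for example via context.getConfiguration().get("my.custom.param").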

Harsh J · ‎07-28-2015

Thanks, that'd explain your transition. What application is this? Is it an MR2 application, Spark app, or something custom?

Harsh J · ‎07-28-2015

Recovery features deal with restarts of the service (RM or NM). An AM attempt is a separate feature that, like container retries in MR, is a regular runtime feature. Do you see your application ID attempt multiple AMs in the RM UI page for it? Do the RM logs indicate any form of kill or fail for the first 'appattempt' of the AM ID?

Harsh J · ‎07-27-2015

There will not be a performance difference in the IO paths, as you will still be using the same HFiles, and they will behave as the same table.

Harsh J · ‎07-27-2015

Glad to hear it. Please consider marking the topic as resolved so others with similar issues can find it easily.

Harsh J · ‎07-27-2015

Unlike your spark shell command, Oozie does not invoke/use scripts that set up local classpaths for its actions (as it needs to use the distributed cache for this). Take a look at how the ShareLib works, and how you can override it for your action to include a system one (see http://archive.cloudera.com/cdh5/cdh/5/oozie/WorkflowFunctionalSpec.html#a17_HDFS_Share_Libraries_for_Workflow_Applications_since_Oozie_2.3 ). In your case, if you use the java action, you can make it include the "hive" share-lib, and that will add all Hive jars to the distributed cache classpath.

Member Since	‎07-31-2013 07:21 AM
Last Visited
Posts	1,924
Kudos received	461

Cloudera Community

Re: S3Guard Suggested to help fix Consistency

Re: Failed to start namenode. java.io.FileNotFound...

Re: sqoop import issue

Re: Efficient ways to store many images files

Re: S3 loading into HDFS

Re: Checkpoint Status on name node

Re: namenode can't start after die

Re: namenode can't start after die

Re: JOB Stuck in Accepted State

Re: How to read the parameter value in mapper code...

Re: What does state transition RUNNING --> ACCEPTE...

Re: What does state transition RUNNING --> ACCEPTE...

Re: Impact when rename hbase big table with snapsh...

Re: Exception in thread "main" java.lang.OutOfMemo...

Re: Add CLASSPATH to Oozie workflow job