Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1980 | 07-09-2019 12:53 AM |
| | 11935 | 06-23-2019 08:37 PM |
| | 9194 | 06-18-2019 11:28 PM |
| | 10187 | 05-23-2019 08:46 PM |
| | 4604 | 05-20-2019 01:14 AM |
05-31-2016
11:04 PM
Yes, you should be able to reassign the region once the bad file has been moved out of its /hbase location (to somewhere like /tmp). It may also be worth investigating how you ended up losing that block altogether, especially if the file's timestamp is recent (your NN logs would help; search for the block ID in them).
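For illustration, a minimal sketch of that sequence, assuming an HBase 1.x-style layout; the file path, encoded region name and block ID below are all placeholders:

```bash
# Move the corrupt HFile out of the region's directory (placeholder path).
hdfs dfs -mv /hbase/data/default/mytable/abcd1234abcd1234abcd1234abcd1234/cf/badhfile /tmp/badhfile

# Reassign the region from the HBase shell (placeholder encoded region name).
echo "assign 'abcd1234abcd1234abcd1234abcd1234'" | hbase shell

# Trace the lost block in the NameNode log (placeholder block ID and log path).
grep "blk_1073741825" /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log
```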
05-30-2016
07:11 PM
1 Kudo
Sqoop 1.x has no prompt, as it's a simple utility-style command-line application with no service associated with it, much like Pig. Its usage is described in depth in its guide: http://archive.cloudera.com/cdh5/cdh/5/sqoop/SqoopUserGuide.html

Sqoop 2.x is a different architecture that is under heavy development and is not yet recommended for wide use. Its client has a prompt-style connection to the Sqoop service that is part of its new architecture. Its usage is documented separately at http://archive.cloudera.com/cdh5/cdh/5/sqoop2/CommandLineClient.html

Unless you are aware of Sqoop 2's features and are looking for them specifically, you most likely want Sqoop 1.x (the plain 'sqoop' command) for your Sqoop needs. You can run a test command following the user guide linked above to confirm that the Sqoop 1.x command works (and it should, since it has no dependencies or services to rely on).
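As a quick sanity check of the Sqoop 1.x command, something like the following (the JDBC URL, database and username are placeholders):

```bash
# Print the installed Sqoop version; no running service is required.
sqoop version

# List tables from a source database (placeholder connection details; -P prompts for the password).
sqoop list-tables \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P
```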
05-12-2016
07:51 AM
Does the very same stack trace appear every time the OOME crash occurs? If yes, it may be that one of your scripts is sending a bogus create-table request with very large table or column names. A crash dump of your HMS (if enabled), if small enough to analyse, can help reveal what the parameters of the offending request were. If the point of OOME varies, then it's likely that you're gradually running out of heap space, and you'd want to check the JVM memory graphs to see what the heap utilisation pattern looks like over time since the last restart.
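If a crash dump isn't enabled yet, one common way to capture one at the point of OOME is via standard JVM flags added to the process's Java options; how you pass them depends on how your HMS is launched, so the variable name and dump path below are only illustrative:

```bash
# Illustrative only: add these standard JVM flags to the Hive Metastore's Java options
# (e.g. via your management tool's "Java options" field for the Metastore role).
HMS_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hms_oome.hprof"
```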
05-05-2016
02:51 AM
1 Kudo
I'm afraid there's no easy way to recover from this if you've not taken HDFS snapshots prior to the deletes either. If you stopped the entire cluster immediately to prevent further disk usage, you can perhaps run ext-level disk-recovery tools to recover the deleted blocks, and then roll back your NN to start from the pre-delete checkpoint; that may give back some fraction of your data.
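Going forward, HDFS snapshots can guard against this class of accident; a sketch with placeholder paths and snapshot names:

```bash
# Allow snapshots on a directory (run as the HDFS superuser).
hdfs dfsadmin -allowSnapshot /user/important-data

# Take a snapshot; files deleted later remain reachable under .snapshot/.
hdfs dfs -createSnapshot /user/important-data before-cleanup

# Restore a file from the snapshot if it gets deleted.
hdfs dfs -cp /user/important-data/.snapshot/before-cleanup/somefile /user/important-data/
```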
05-02-2016
06:33 AM
You should check the Standby NameNode's log for checkpoint-related messages to ascertain the issue; that is the daemon responsible for triggering a checkpoint, performing it, and uploading the result back to the Active NameNode, much like the Secondary NameNode in a cluster without HA. Please open a new topic, as this would be unrelated to the current one.
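For example, something along these lines (the log path is illustrative and varies by installation):

```bash
# Scan the Standby NameNode's log for recent checkpoint-related messages.
grep -i "checkpoint" /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log | tail -n 50
```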
04-26-2016
05:54 PM
1 Kudo
Manu - There are two distinct questions in this post. One is "Does Hue allow you to run a WF as another user (than the one you are logged in as)?", to which the answer is no (you will need to log in as the user you want to run as). The other is why the $USER in Shell Actions on insecure or non-DRF-enabled clusters always appears as "yarn" despite your user submitting the job; the answer there is to enable the LCE (LinuxContainerExecutor) in non-secure mode (or enable security in general), because the default container executor (DCE) runs all containers as the "yarn" user. If your question is distinct from the above two, I'd suggest raising a new topic.
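For reference, a minimal yarn-site.xml sketch of the NodeManager-side properties involved in running the LCE with per-user containers on a non-secure cluster; additional LCE setup (the executor group, container-executor.cfg, etc.) is also required, so treat this as an outline rather than a complete configuration:

```xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- When false, containers run as the submitting user even without Kerberos. -->
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>
```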
04-24-2016
12:08 PM
Of course, an easier way to keep using ImportTSV itself is to re-transform your CSV input via a custom mapper (passed via the configuration key "importtsv.mapper.class") and "merge" the two rows together before the CSV parser maps them into the designated fields. For reference, this is the default Map class for ImportTSV: https://github.com/cloudera/hbase/blob/cdh5.7.0-release/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TsvImporterMapper.java
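For example, a hypothetical invocation plugging in such a mapper (the mapper class, column spec, table name and input path are all placeholders):

```bash
# Run ImportTSV with a custom mapper that merges the paired CSV rows before field mapping.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.mapper.class=com.example.MergingCsvMapper \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  mytable /user/me/input/csv
```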
04-24-2016
12:06 PM
ImportTSV is a simple utility and does not currently support this. Perhaps you can take a look at the Kite SDK's HBase and CSV dataset handling capabilities, which can handle these tasks (although it uses the more efficient Avro encoding rather than plaintext during serialisation). Read more at http://kitesdk.org/docs/1.1.0/
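As a rough sketch of the Kite CLI workflow for CSV data (file names and the dataset URI are placeholders; consult the Kite docs above for the exact HBase dataset URI form):

```bash
# Infer an Avro schema from the CSV header (placeholder file names).
kite-dataset csv-schema users.csv --class User -o user.avsc

# Create a dataset with that schema and import the CSV into it (placeholder URI).
kite-dataset create dataset:hive:users --schema user.avsc
kite-dataset csv-import users.csv dataset:hive:users
```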
04-24-2016
09:57 AM
1 Kudo
The HDFS client reads your input and sends packets of data (64k-128k chunks at a time) along with their checksums over the network, and the DNs involved in the write verify these continually as they receive them, before writing them to disk. This way you won't suffer from network corruption, and what's written to HDFS will match precisely what the client intended to send.
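If you want to double-check a file after the fact, a simple (if brute-force) comparison of digests works; file names below are placeholders:

```bash
# Compare a local file's digest against the copy stored in HDFS.
md5sum /data/local/file.bin
hdfs dfs -cat /user/me/file.bin | md5sum

# HDFS also exposes its own composite checksum for a stored file.
hdfs dfs -checksum /user/me/file.bin
```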
04-24-2016
09:44 AM
1 Kudo
The -put/-copyFromLocal programs follow a rename-upon-completion approach. While a file is uploading it is named "filename._COPYING_", and upon closure it is renamed to "filename". This should help you verify which files were not copied completely. This behaviour is active by default but, if undesirable, can be switched off with the -d flag. X-Ref: https://github.com/cloudera/hadoop-common/blob/cdh5.7.0-release/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java#L380-L402
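For example (paths and file names are placeholders; the -d flag is available on newer releases):

```bash
# While the upload is in flight, the partial file carries the temporary suffix.
hdfs dfs -put bigfile.dat /user/me/ &
hdfs dfs -ls /user/me/    # shows /user/me/bigfile.dat._COPYING_ until the copy completes

# Write directly to the final name, skipping the ._COPYING_ rename step.
hdfs dfs -put -d bigfile.dat /user/me/
```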