Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1980 | 07-09-2019 12:53 AM |
| | 11935 | 06-23-2019 08:37 PM |
| | 9194 | 06-18-2019 11:28 PM |
| | 10187 | 05-23-2019 08:46 PM |
| | 4604 | 05-20-2019 01:14 AM |
05-31-2016
11:04 PM
Yes, you should be able to reassign the region once the bad file has been moved out of its /hbase location (to somewhere like /tmp). It may also be worth investigating how you ended up losing that block altogether, especially if the file's timestamp is recent (your NN logs would help; search for the block ID in them).
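For illustration, a minimal sketch of that sequence, assuming an HBase 1.x-style layout; the file path, encoded region name and block ID below are all placeholders:

```bash
# Move the corrupt HFile out of the region's directory (placeholder path).
hdfs dfs -mv /hbase/data/default/mytable/abcd1234abcd1234abcd1234abcd1234/cf/badhfile /tmp/badhfile

# Reassign the region from the HBase shell (placeholder encoded region name).
echo "assign 'abcd1234abcd1234abcd1234abcd1234'" | hbase shell

# Trace the lost block in the NameNode log (placeholder block ID and log path).
grep "blk_1073741825" /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log
```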
05-30-2016
07:11 PM
1 Kudo
Sqoop 1.x has no prompt, as it's a simple utility-style command-line application with no service associated with it, much like Pig. Its usage is described in depth in its guide: http://archive.cloudera.com/cdh5/cdh/5/sqoop/SqoopUserGuide.html

Sqoop 2.x is a different architecture that is under heavy development and is not yet recommended for wide use. Its client has a prompt-style connection to the Sqoop service that is part of its new architecture. Its usage is documented separately at http://archive.cloudera.com/cdh5/cdh/5/sqoop2/CommandLineClient.html

Unless you are aware of Sqoop 2's features and are looking for them specifically, you most likely want Sqoop 1.x (the plain 'sqoop' command) for your Sqoop needs. You can run a test command following the user guide linked above to confirm that the Sqoop 1.x command works (and it should, since it has no dependencies or services to rely on).
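As a quick sanity check of the Sqoop 1.x command, something like the following (the JDBC URL, database and username are placeholders):

```bash
# Print the installed Sqoop version; no running service is required.
sqoop version

# List tables from a source database (placeholder connection details; -P prompts for the password).
sqoop list-tables \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P
```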
05-12-2016
07:51 AM
Does the very same stack trace appear every time the OOME crash occurs? If yes, it may be that one of your scripts is sending a bogus create-table request with very large table or column names. A crash dump of your HMS (if enabled), if small enough to analyse, can help reveal what the parameters of the offending request were. If the point of OOME varies, then it's likely that you're gradually running out of heap space, and you'd want to check the JVM memory graphs to see what the heap utilisation pattern looks like over time since the last restart.
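If a crash dump isn't enabled yet, one common way to capture one at the point of OOME is via standard JVM flags added to the process's Java options; how you pass them depends on how your HMS is launched, so the variable name and dump path below are only illustrative:

```bash
# Illustrative only: add these standard JVM flags to the Hive Metastore's Java options
# (e.g. via your management tool's "Java options" field for the Metastore role).
HMS_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hms_oome.hprof"
```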
05-05-2016
02:51 AM
1 Kudo
I'm afraid there's no easy way to recover from this if you've not taken HDFS snapshots prior to the deletes either. If you stopped the entire cluster immediately to prevent further disk usage, you can perhaps run ext-level disk-recovery tools to recover the deleted blocks, and then roll back your NN to start from the pre-delete checkpoint; that may give back some fraction of your data.
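Going forward, HDFS snapshots can guard against this class of accident; a sketch with placeholder paths and snapshot names:

```bash
# Allow snapshots on a directory (run as the HDFS superuser).
hdfs dfsadmin -allowSnapshot /user/important-data

# Take a snapshot; files deleted later remain reachable under .snapshot/.
hdfs dfs -createSnapshot /user/important-data before-cleanup

# Restore a file from the snapshot if it gets deleted.
hdfs dfs -cp /user/important-data/.snapshot/before-cleanup/somefile /user/important-data/
```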
05-02-2016
06:33 AM
You should check the Standby NameNode's log for checkpoint-related messages to ascertain the issue; that is the daemon responsible for triggering a checkpoint, performing it, and uploading the result back to the Active NameNode, much like the Secondary NameNode in a cluster without HA. Please open a new topic, as this would be unrelated to the current one.
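For example, something along these lines (the log path is illustrative and varies by installation):

```bash
# Scan the Standby NameNode's log for recent checkpoint-related messages.
grep -i "checkpoint" /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log | tail -n 50
```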
04-26-2016
05:54 PM
1 Kudo
Manu - There are two distinct questions in this post. One is "Does Hue allow you to run a WF as another user (than the one you are logged in as)?", to which the answer is no (you will need to log in as the user you want to run as). The other is why the $USER in Shell Actions on insecure or non-DRF-enabled clusters always appears as "yarn" despite your user submitting the job; the answer there is to enable the LCE (LinuxContainerExecutor) in non-secure mode (or enable security in general), because the default container executor (DCE) runs all containers as the "yarn" user. If your question is distinct from the above two, I'd suggest raising a new topic.
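For reference, a minimal yarn-site.xml sketch of the NodeManager-side properties involved in running the LCE with per-user containers on a non-secure cluster; additional LCE setup (the executor group, container-executor.cfg, etc.) is also required, so treat this as an outline rather than a complete configuration:

```xml
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- When false, containers run as the submitting user even without Kerberos. -->
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>
```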
04-24-2016
12:08 PM
Of course, an easier way to keep using ImportTSV itself is to re-transform your CSV input via a custom mapper (passed via the configuration key "importtsv.mapper.class") and "merge" the two rows together before the CSV parser maps them into the designated fields. For reference, this is the default Map class for ImportTSV: https://github.com/cloudera/hbase/blob/cdh5.7.0-release/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TsvImporterMapper.java
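For example, a hypothetical invocation plugging in such a mapper (the mapper class, column spec, table name and input path are all placeholders):

```bash
# Run ImportTSV with a custom mapper that merges the paired CSV rows before field mapping.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.mapper.class=com.example.MergingCsvMapper \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  mytable /user/me/input/csv
```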
04-24-2016
12:06 PM
ImportTSV is a simple utility and does not currently support this. Perhaps you can take a look at the Kite SDK's HBase and CSV dataset handling capabilities, which can handle these tasks (although it uses the more efficient Avro encoding rather than plaintext during serialisation). Read more at http://kitesdk.org/docs/1.1.0/
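As a rough sketch of the Kite CLI workflow for CSV data (file names and the dataset URI are placeholders; consult the Kite docs above for the exact HBase dataset URI form):

```bash
# Infer an Avro schema from the CSV header (placeholder file names).
kite-dataset csv-schema users.csv --class User -o user.avsc

# Create a dataset with that schema and import the CSV into it (placeholder URI).
kite-dataset create dataset:hive:users --schema user.avsc
kite-dataset csv-import users.csv dataset:hive:users
```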
04-24-2016
09:57 AM
1 Kudo
The HDFS client reads your input and sends packets of data (64k-128k chunks at a time) along with their checksums over the network, and the DNs involved in the write verify these continually as they receive them, before writing them to disk. This way you won't suffer from network corruption, and what's written to HDFS will match precisely what the client intended to send.
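If you want to double-check a file after the fact, a simple (if brute-force) comparison of digests works; file names below are placeholders:

```bash
# Compare a local file's digest against the copy stored in HDFS.
md5sum /data/local/file.bin
hdfs dfs -cat /user/me/file.bin | md5sum

# HDFS also exposes its own composite checksum for a stored file.
hdfs dfs -checksum /user/me/file.bin
```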
04-24-2016
09:44 AM
1 Kudo
The -put/-copyFromLocal programs follow a rename-upon-completion approach. While a file is uploading it is named "filename._COPYING_", and upon closure it is renamed to "filename". This should help you verify which files were not copied completely. This behaviour is active by default but, if undesirable, can be switched off with the -d flag. X-Ref: https://github.com/cloudera/hadoop-common/blob/cdh5.7.0-release/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandWithDestination.java#L380-L402
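For example (paths and file names are placeholders; the -d flag is available on newer releases):

```bash
# While the upload is in flight, the partial file carries the temporary suffix.
hdfs dfs -put bigfile.dat /user/me/ &
hdfs dfs -ls /user/me/    # shows /user/me/bigfile.dat._COPYING_ until the copy completes

# Write directly to the final name, skipping the ._COPYING_ rename step.
hdfs dfs -put -d bigfile.dat /user/me/
```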