Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2129 | 07-09-2019 12:53 AM |
| | 12449 | 06-23-2019 08:37 PM |
| | 9561 | 06-18-2019 11:28 PM |
| | 10526 | 05-23-2019 08:46 PM |
| | 4895 | 05-20-2019 01:14 AM |
11-02-2017
04:40 AM
1 Kudo
You can pass an input directory to the ImportTsv tool, and that directory can contain any number of files. It is not limited to a single file unless you pass a single file (instead of a directory) to it.
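For example, a minimal invocation against a directory (the table name, column mapping and input path below are placeholders for your own values):

```
# Every file under the input directory is parsed as TSV input for the job.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  mytable /user/etl/tsv_input/
```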
11-02-2017
04:39 AM
1 Kudo
You are right that it's all just byte sequences to HBase, and that it sorts everything lexicographically. You do not need a separator character when composing your key, because HBase would not treat it as a boundary anyway; add one only if you want the extra bytes for readability, or to recover the individual data elements from variable-length keys if that is a use case. HBase 'sharding' (splitting) can be specified manually at table creation time if you know your key pattern and ranges - this is strongly recommended so the table scales from the beginning. Otherwise, HBase computes a key midpoint by analysing the keys in byte form and splits a region on that midpoint whenever its size crosses the split threshold.
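A minimal sketch of pre-splitting at table creation time (the table name, column family and split points are placeholders for your own key ranges):

```
# Creates five regions with explicit boundaries up front, instead of one region
# that only splits later on byte midpoints computed by HBase.
echo "create 'mytable', 'cf', SPLITS => ['b', 'f', 'm', 't']" | hbase shell
```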
10-24-2017
12:12 AM
There is a simple method to remove those:
1. List those directories in a text file, e.g.: hadoop fs -ls /path > test
2. cat -t test will show the positions of the duplicates with the junk characters.
3. Open the file in another shell and comment out (#) entries to identify the exact ones.
4. cat -t the file again to confirm you marked the culprits.
5. Remove the original (good) folder from the list.
6. for i in $(cat test); do hadoop fs -rmr $i; done
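Roughly the same flow as one sketch (the path is a placeholder; note that the captured listing contains full ls lines, so the path is the last field, and hadoop fs -rmr is the older spelling of hadoop fs -rm -r):

```
hadoop fs -ls /path > listing   # capture the directory entries
cat -t listing                  # reveal the non-printing junk characters
# edit 'listing' so only the bad entries remain, then delete them:
for p in $(awk '{print $NF}' listing); do
  hadoop fs -rm -r "$p"
done
```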
10-23-2017
10:47 PM
@Harsh J, Thanks for the quick reply. I thought the output of the fsck command included the replication multiplier and gave the final total block count. Thanks for the clarification. I checked the Datanodes page on the NameNode Web UI and the block count for each DataNode is more than the threshold value. Thanks, Priya
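For reference, the logical count fsck reports can be pulled out like this (run against the root path; the replica copies then multiply this figure across the DataNodes):

```
hdfs fsck / | grep -i 'Total blocks'
```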
10-20-2017
01:15 AM
You can search for it from the console, for example: $ locate '*hive-hcatalog-core*.jar' (quoting the pattern keeps the shell from expanding it).
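If the locate database isn't available or up to date, find works as a fallback (the search roots below are only examples):

```
find /usr /opt -name 'hive-hcatalog-core*.jar' 2>/dev/null
```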
10-19-2017
01:00 PM
Thanks so much for the help. That worked. I was able to get a backup of the fsimage.
10-15-2017
09:18 PM
Currently the MapReduceIndexerTool appears to hardcode the job names, so they do not look configurable: https://github.com/cloudera/search/blob/cdh5.13.0-release/search-mr/src/main/java/org/apache/solr/hadoop/MapReduceIndexerTool.java#L812 (see also the other setJobName calls in the driver).
10-14-2017
11:19 AM
Hi, I am also getting the same error message in the NameNode logs. I have tried the solution above, but in my case the seen_txid file is present only in the folder tmp/hadoop-root/dfs/name/current. Is there any other solution?
09-21-2017
12:24 AM
Reading through this blog helped me a lot: http://blog.cloudera.com/blog/2014/11/guidelines-for-installing-cdh-packages-on-unsupported-operating-systems/ Especially understanding that this can be a package mix/match issue caused by trying to run YARN on an unsupported version (Ubuntu 16.04). I removed the package and reinstalled it like this:
$> sudo apt-get remove zookeeper (this will remove a bunch of stuff along with ZooKeeper)
$> sudo apt-get install hadoop-yarn-resourcemanager (this will reinstall the ResourceManager for you)
Hope this helps!
09-19-2017
12:16 PM
As I stated in my recent comment, the Flume Kafka client was upgraded as part of CDH 5.8 to use the new consumer API, which supports secure communication with Kerberos. Versions prior to CDH 5.8 use the old API, which doesn't support Kerberos or SSL. You will have to upgrade to get this new functionality, or run Flume outside of Cloudera Manager using tarballs or RPMs. -pd
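Once on CDH 5.8 or later, a Kerberos-enabled Kafka source looks roughly like this (agent/source names, broker addresses and topic are placeholders, and the JAAS/keytab setup for the agent JVM still depends on your environment):

```
# flume.conf (sketch, not a drop-in config)
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092
a1.sources.r1.kafka.topics = my_topic
# kafka.consumer.* properties are passed through to the new Kafka consumer
a1.sources.r1.kafka.consumer.security.protocol = SASL_PLAINTEXT
a1.sources.r1.kafka.consumer.sasl.kerberos.service.name = kafka
```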