Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1952 | 07-09-2019 12:53 AM |
|  | 11789 | 06-23-2019 08:37 PM |
|  | 9077 | 06-18-2019 11:28 PM |
|  | 10031 | 05-23-2019 08:46 PM |
|  | 4442 | 05-20-2019 01:14 AM |
09-21-2015 11:56 PM
To add to Wilfred's response: what is your CDH version? HDFS does cache all positive group-lookup entries for 5 minutes, but negative caching wasn't supported until CDH 5.2.0 (via HADOOP-10755). See also http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/core-default.xml#hadoop.security.groups.negative-cache.secs (which lists the negative cache's default TTL as 30s, vs. 300s for positive entries). NSCD also does negative caching by default, which could explain why the problem has gone away, depending on how many WARN group-lookup failure entries you observe in the log.
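For reference, both cache TTLs can be set explicitly in core-site.xml; the values below are just the defaults mentioned above, shown as a sketch:

```xml
<!-- core-site.xml: group-lookup cache TTLs (values shown are the defaults;
     the negative-cache property requires CDH 5.2.0 or later) -->
<property>
  <name>hadoop.security.groups.cache.secs</name>
  <value>300</value>
</property>
<property>
  <name>hadoop.security.groups.negative-cache.secs</name>
  <value>30</value>
</property>
```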
09-20-2015 06:55 AM
Hi Harish, thanks for your reply. I have another doubt to ask you: how can we determine the number of mappers in the above-mentioned wordcount program? Can we determine that using only those 2 input files, a.txt and b.txt? Is it mandatory that we know the file size and block size? Please help...
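As a rough sketch of the rule being asked about: the mapper count equals the number of input splits, and with the default TextInputFormat each file is split independently against the HDFS block size (so yes, file size and block size both matter). The file sizes below are illustrative, and this simplification ignores the ~10% "slop" factor the real splitter applies:

```python
import math

def num_mappers(file_sizes_mb, block_size_mb=128):
    """Simplified estimate: each non-empty file is split on block boundaries,
    every file yields at least one split, and each split gets one map task.
    (Hadoop's real splitter also allows a final split up to 1.1x the split size.)"""
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)

# e.g. a.txt = 200 MB and b.txt = 50 MB with 128 MB blocks -> 2 + 1 = 3 mappers
print(num_mappers([200, 50]))  # 3
```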
09-18-2015 10:13 AM
Hrsh, I was able to find the property; I modified the queueMaxAppsDefault property and now I get more than 8 apps running concurrently. Thanks for your help. Nitin
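For anyone else landing here, the property in question lives in the Fair Scheduler allocation file (fair-scheduler.xml); the value of 50 below is purely illustrative:

```xml
<!-- fair-scheduler.xml: raise the default per-queue cap on concurrently
     running applications (the stock default is what limited us to 8-ish) -->
<allocations>
  <queueMaxAppsDefault>50</queueMaxAppsDefault>
</allocations>
```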
09-18-2015 04:59 AM
The value on the doc page is picked as roughly 20% of the RAM reserved for overhead, but you could set it lower. Our past overcommit testing does show that usage can reach close to that extra 20% for some tested workloads, but that won't always be the case - and this may have changed overall lately as well. We'll be reworking the docs for these recommendations in the near future as developments happen. For now, please rely on the XLSX file for a closer guideline on the recommended calculated values.
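The arithmetic behind that guideline is simple enough to sketch; the 256 GB node and the helper name below are illustrative, not from the doc:

```python
def os_reservation_gb(total_ram_gb, overhead_fraction=0.20):
    """Hypothetical helper: RAM to reserve for OS/overhead.
    The doc's guideline uses ~20%, but the fraction is tunable lower."""
    return total_ram_gb * overhead_fraction

# On a 256 GB node, the default guideline reserves about 51 GB for overhead.
print(round(os_reservation_gb(256), 1))  # 51.2
```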
09-18-2015 04:23 AM
Glad to hear you were able to figure it out. In the spirit of https://xkcd.com/979/, please mark the thread solved with the solution post selected, so others with a similar problem can find their solution quicker on the web.
09-09-2015 10:36 PM
Start here, and drill further down into the DFSClient and DFSInputStream, etc. classes: https://github.com/cloudera/hadoop-common/blob/cdh5.4.5-release/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L294-L303
09-03-2015 05:51 PM
In the spirit of https://xkcd.com/979/, feel free to mark the thread as resolved if this does help your cause, so others may find the solution quicker.
09-03-2015 05:49 PM
1 Kudo
Currently, the CM BDR feature does not carry any HBase replication abilities (we do support schedulable snapshot policies, but no replication/copies yet). You will need to utilise standard HBase techniques to copy the data between your two clusters (see http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/), and I'd recommend the ExportSnapshot method (if not live replication).
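The ExportSnapshot flow looks roughly like the sketch below; table, snapshot, and cluster names are all illustrative placeholders:

```shell
# 1) From the HBase shell on the source cluster, take a snapshot:
#    hbase> snapshot 'my_table', 'my_table_snap'

# 2) Export the snapshot's files to the target cluster's HBase root dir
#    (runs as a MapReduce job; -mappers controls copy parallelism):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snap \
  -copy-to hdfs://target-nn:8020/hbase \
  -mappers 8

# 3) On the target cluster, restore or clone it:
#    hbase> clone_snapshot 'my_table_snap', 'my_table'
```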
09-03-2015 05:16 PM
You will need the gateway copy, which lives under /etc/hive/conf/ on a node designated as a Hive Gateway (check Hive -> Instances in CM to see which hosts carry a Gateway role).