Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1969 | 07-09-2019 12:53 AM |
|  | 11880 | 06-23-2019 08:37 PM |
|  | 9146 | 06-18-2019 11:28 PM |
|  | 10133 | 05-23-2019 08:46 PM |
|  | 4580 | 05-20-2019 01:14 AM |
09-09-2015
04:44 PM
The reader buffer size is indeed controlled by that property (io.file.buffer.size), but note that if you're doing short-circuit reads, another property also applies: dfs.client.read.shortcircuit.buffer.size (specified in bytes, 1 MiB by default).
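For example, a client-side override of both could look like the sketch below, assuming the standard org.apache.hadoop.conf.Configuration API (the 128 KB value is just illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ReadBufferExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // General client read/write buffer, in bytes (128 KB here, illustrative).
        conf.setInt("io.file.buffer.size", 131072);
        // Buffer used only for short-circuit local reads, in bytes (1 MiB is the default).
        conf.setInt("dfs.client.read.shortcircuit.buffer.size", 1048576);
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Filesystem: " + fs.getUri());
        fs.close();
    }
}
```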
09-09-2015
04:16 PM
1 Kudo
You have installed the wrong JDK 8 package. Please download the 64-bit JDK 8 and remove your current 32-bit JDK 8. If you'd like to check and compare against your $JAVA_HOME/bin/java executable, a 64-bit JDK 8 will print something like the below for "java -version":
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
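You can also check the running JVM's bitness from Java itself; a tiny sketch (the sun.arch.data.model property is HotSpot-specific and not guaranteed on every vendor's JVM):

```java
public class JvmBitnessCheck {
    public static void main(String[] args) {
        // "64" on a 64-bit HotSpot JVM, "32" on a 32-bit one; may be absent on other vendors.
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model", "unknown"));
        // e.g. "amd64" on 64-bit vs "i386"/"x86" on 32-bit.
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
    }
}
```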
09-08-2015
09:28 PM
Jobs typically read records, not entire blocks. Is your MR job doing anything different in this regard? Note that HDFS readers do not read whole blocks of data at a time; they stream the data via a buffered read (typically 64 KB-128 KB). A block size of X MB does not translate into a memory requirement unless you explicitly hold the entire block in memory while streaming the read.
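To illustrate, here is a minimal sketch of such a buffered read against HDFS (the path argument and buffer size are made up); peak memory is bounded by the 128 KB buffer no matter what the file's block size is:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BufferedHdfsRead {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        byte[] buffer = new byte[128 * 1024]; // fixed 128 KB buffer, independent of block size
        long total = 0;
        try (FSDataInputStream in = fs.open(new Path(args[0]))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // process the chunk here rather than accumulating the whole block
            }
        }
        System.out.println("Streamed " + total + " bytes with a 128 KB buffer");
    }
}
```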
09-04-2015
01:57 AM
Set the "DataNode Failed Volumes Tolerated" field in CM -> HDFS -> Configuration to the number of volumes each DN should tolerate the failure up to. XML property, if you do not use CM, is "dfs.datanode.failed.volumes.tolerated".
09-03-2015
05:51 PM
In the spirit of https://xkcd.com/979/, feel free to mark the thread as resolved if it does help your cause, so others may find the solution quicker.
09-03-2015
05:49 PM
1 Kudo
Currently, the CM BDR feature does not include any HBase replication capability (we do support schedulable snapshot policies, but no replication/copies yet). You will need to utilise standard HBase techniques to copy the data between your two clusters: http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/. I'd recommend the ExportSnapshot approach (if not live replication).
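As a starting point, a rough sketch of that approach with the HBase 1.x client API (the table and snapshot names are hypothetical, and the exact ExportSnapshot flags can vary by version):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotForExport {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Step 1: take a snapshot of the table on the source cluster.
            admin.snapshot("my_table_snap", TableName.valueOf("my_table"));
        }
        // Step 2 (from a shell): copy the snapshot over to the DR cluster, e.g.
        //   hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
        //     -snapshot my_table_snap -copy-to hdfs://dr-nn:8020/hbase
    }
}
```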
09-03-2015
05:42 PM
Indeed, as szehon mentions, the use of /root may be your problem, especially if you've invoked the Hive CLI via a sudo command. The /root directory is protected against access from anyone but the root user.
09-03-2015
05:16 PM
You will need the gateway copy, which lives under /etc/hive/conf/ on a node designated as a Hive Gateway (check Hive -> Instances in CM to find which hosts have the gateway role).
09-01-2015
10:18 PM
You can add a Java system property setting that key to the "Java Configuration Options for Zookeeper Server" field on the ZooKeeper -> Configuration page. Add it in the -D format; for example, for 4 GiB, append: -Djute.maxbuffer=4294967296
08-25-2015
11:50 PM
One way to reduce the serialisation cost in HBase is to use the FAST_DIFF data block encoding: http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#data.block.encoding.enable Also consider compressing your table - it will save a lot of space, provided you also use a proper HFile data block size (which is not the same as the HDFS block size).
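For instance, a column family combining all three could be declared along these lines (HBase 1.x-era API; the table/family names, Snappy codec, and 64 KB block size are just illustrative):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class EncodedFamilyExample {
    public static void main(String[] args) {
        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF); // cheaper key serialisation
        family.setCompressionType(Compression.Algorithm.SNAPPY);  // on-disk compression
        family.setBlocksize(64 * 1024); // HFile data block size, not the HDFS block size

        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("my_table"));
        table.addFamily(family);
        // Pass 'table' to Admin.createTable(...) / Admin.modifyTable(...) to apply it.
    }
}
```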