Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1968 | 07-09-2019 12:53 AM |
| | 11853 | 06-23-2019 08:37 PM |
| | 9135 | 06-18-2019 11:28 PM |
| | 10110 | 05-23-2019 08:46 PM |
| | 4569 | 05-20-2019 01:14 AM |
10-24-2016
03:18 AM
Harsh, could you clarify the answer for point no. 1: does an HBase snapshot make another copy of the table or not?
10-16-2016
10:13 AM
1 Kudo
Hi, I don't think that's possible, given that most applications depend on HDFS semantics (strong consistency, POSIX-compatible), and S3 simply isn't designed as a file system (eventual consistency, blob store). Plus, you lose data locality. As far as I know, most cloud use cases still use HDFS for temporary, intermediate storage and S3 for permanent, final storage. There have been several efforts toward using HDFS as the metadata store and the cloud as the data store, but that's a huge amount of work (see HDFS-9806) and probably in the Hadoop 4/CDH 7 timeframe. Hope this helps.
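As a sketch of that intermediate-vs-permanent split, a job's finished output can be pushed to S3 with DistCp; this assumes the S3A connector is configured with credentials, and the bucket and paths below are hypothetical placeholders:

```
# Compute against fast, strongly consistent HDFS first, then copy the
# finished output to S3 for permanent keeping. Bucket and paths are
# placeholders, not real locations.
$ hadoop distcp /user/etl/output/2016-10-15 s3a://my-bucket/warehouse/2016-10-15
```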
10-07-2016
01:44 AM
1 Kudo
Let's say your dataDir and old dataLogDir are /var/lib/zookeeper, and you're now moving dataLogDir to /var/lib/zookeeper-log. First, change this in the service-wide configuration, which will make the stale-configuration icon appear. Then stop zk1, SSH into zk1, and run the following commands:

$ mkdir -p /var/lib/zookeeper-log/version-2
$ cp /var/lib/zookeeper/version-2/log.* /var/lib/zookeeper-log/version-2/
$ chown -R zookeeper:zookeeper /var/lib/zookeeper-log

Then start zk1 and wait until it's running and shows as either leader or follower on the Cloudera Manager service page. After that's done, do the same with zk2 and finally with zk3. By that point the stale-configuration alert should disappear and everything should be fine cluster-wide. As you said, only the log.* files need to be copied.
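To confirm each node's role after restart, one option (assuming the ZooKeeper four-letter-word commands are enabled and the default client port 2181 is in use; the hostnames zk1..zk3 are placeholders) is:

```
# Ask each ZooKeeper server for its status; the "Mode:" line in the
# 'stat' output shows whether it is the leader or a follower.
$ for h in zk1 zk2 zk3; do echo -n "$h: "; echo stat | nc "$h" 2181 | grep Mode; done
```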
10-05-2016
12:57 AM
While it may appear possible to do this, I'd strongly recommend against it, because when you read back a written 150 MB MOB cell, it would cause heap utilisation problems during the RPC encoding and transfer done by the RS. It's probably better to store the larger-than-10 MB files as HDFS files and store their paths in HBase.
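A minimal sketch of that pattern from the shell; the table name 'files', column family 'f', and all paths below are hypothetical:

```
# Store the large file in HDFS, then record only its path in HBase.
# A table 'files' with column family 'f' is assumed to already exist.
$ hdfs dfs -mkdir -p /data/blobs
$ hdfs dfs -put big-video.mp4 /data/blobs/big-video.mp4
$ echo "put 'files', 'row1', 'f:path', '/data/blobs/big-video.mp4'" | hbase shell
```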
10-05-2016
12:55 AM
You can find what operations are supported in the hdfs-fuse source: https://github.com/cloudera/hadoop-common/tree/cdh5.8.0-release/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs (this is a directory tree; look for the files with "impls" in their names for the supported syscalls). Git likely needs more advanced features from the filesystem it runs on than HDFS currently offers. You can run git under strace to find out which syscall that is.
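For example, a rough way to trace git's filesystem calls against the mount (the mount point and repository path are placeholders):

```
# Trace only file-related syscalls made by git and its children; the
# unsupported call will show an error such as ENOTSUP. ENOENT lines
# are filtered out as routine lookup noise.
$ strace -f -e trace=file git init /hdfs-mount/myrepo 2>&1 | grep -v ENOENT
```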
10-05-2016
12:49 AM
Cloudera offers Backup and Disaster Recovery (BDR) features as part of its enterprise offering that can do HDFS replication to other clusters, Hive metadata and data replication to other clusters, and HBase snapshot backups to S3. This is documented in detail at https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_about.html Outside of this, you can try DistCp for HDFS replication, but for Hive replication you will need to manually propagate DDL-associated metadata.
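A minimal DistCp sketch for that cluster-to-cluster HDFS replication; the NameNode addresses and the path below are hypothetical placeholders:

```
# Copy a directory from the source cluster to the destination cluster.
# NameNode hosts/ports and the warehouse path are placeholders.
$ hadoop distcp hdfs://src-nn:8020/user/hive/warehouse hdfs://dst-nn:8020/user/hive/warehouse
```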
10-05-2016
12:43 AM
1 Kudo
For (1), the answer right now is no. Once dead-node detection occurs, the NameNode will swiftly act to re-replicate the identified lost replicas. Something along the lines of what you need is being worked on upstream via https://issues.apache.org/jira/browse/HDFS-7877, but the work is still in progress and will only arrive in a future, as-yet-undetermined CDH release. For (2), you can hunt down files with a replication factor of 1, raise them to 2, and wait for the under-replicated count to reach 0 before you take the DN down. The replication factor can be changed with the command 'hadoop fs -setrep'.
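A rough sketch of hunting down and fixing such files, assuming the usual fsck block-report format (the paths below are placeholders):

```
# Find blocks carrying a single replica (repl=1 in fsck output), then
# raise the owning file's replication factor; -w waits until the new
# factor is actually satisfied before returning.
$ hdfs fsck /user/data -files -blocks | grep -w 'repl=1'
$ hadoop fs -setrep -w 2 /user/data/somefile
```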
09-20-2016
06:55 AM
1 Kudo
Yes, you need to switch Oozie to submit over YARN rather than MRv1. The switching guide covers this aspect.
09-20-2016
06:00 AM
Can you tell me how to connect to an HBase cluster through the Java API (a 4-node cluster running in VMs), from Windows to the distribution running in the server VMs?
09-12-2016
01:04 PM
1 Kudo
Hi Harsh, the issue was with the JCE files; I had placed them in the wrong location instead of /usr/java/jdk1.7.0_25/jre/lib/security/. Once I updated the JCE files in the above location, I was able to access HDFS. Thanks for the help! Regards, Cibi
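For reference, a sketch of placing the JCE unlimited-strength policy jars into that location; the UnlimitedJCEPolicy/ source directory of the downloaded jars is a placeholder:

```
# Copy the JCE policy jars into the JRE security directory named above.
# The source directory is a placeholder for wherever the jars were unpacked.
$ cp UnlimitedJCEPolicy/local_policy.jar UnlimitedJCEPolicy/US_export_policy.jar \
    /usr/java/jdk1.7.0_25/jre/lib/security/
```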