Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1975 | 07-09-2019 12:53 AM |
| | 11893 | 06-23-2019 08:37 PM |
| | 9159 | 06-18-2019 11:28 PM |
| | 10150 | 05-23-2019 08:46 PM |
| | 4587 | 05-20-2019 01:14 AM |
08-26-2017
09:46 PM
1 Kudo
You may only use the -Dname=value form if your main class implements the Tool interface and is invoked via the ToolRunner utility. Check the Tool javadoc example and model your implementation around it: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/util/Tool.html
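A minimal sketch of the Tool/ToolRunner pattern the javadoc describes; the class name and the property name below are placeholders, not values from the original question:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already folded any -Dname=value generic options
        // into the Configuration available via getConf().
        Configuration conf = getConf();
        System.out.println("my.custom.prop = " + conf.get("my.custom.prop"));
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -files, -libjars, ...)
        // and passes only the remaining arguments to run().
        System.exit(ToolRunner.run(new Configuration(), new MyTool(), args));
    }
}
```

Invoked as `hadoop jar mytool.jar MyTool -D my.custom.prop=42`, the property lands in the job Configuration rather than being silently dropped.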
07-30-2017
09:52 PM
Could you share your drive formatting options? The overall inode capacity seems to be very low - did you format with some special options for "fewer, larger files" perhaps?
07-26-2017
10:01 AM
1 Kudo
Cloudera supports Apache Spark, for which an Apache Beam runner exists; I assume this is what you meant to ask about. Apache Beam by itself is not a service that needs installation and management (such as via Cloudera Manager); rather, it is a programming model that supports various execution backends, one of which is Apache Spark. You should be able to follow the tutorials at https://beam.apache.org/get-started/quickstart-java/ and https://beam.apache.org/documentation/runners/spark/ without trouble; just be sure to use the CDH version of Apache Spark when configuring your Java application's pom.xml. Cloudera offers no direct support for the Apache Beam SDKs at present, but I see no reason for it not to work on top of your CDH cluster.
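As an illustration, the dependencies section of such a pom.xml could look like the sketch below; the version properties are placeholders that you would pin to your CDH release and chosen Beam release, not values from the original post:

```xml
<!-- Hedged sketch: version values are placeholders, to be matched
     to your CDH release and the Beam release you target. -->
<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-spark</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <!-- CDH build of Spark, resolved from the Cloudera Maven repository -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${cdh.spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Marking the Spark dependency as `provided` keeps the cluster's own CDH Spark jars in charge at runtime instead of bundling a conflicting copy.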
07-22-2017
10:24 AM
HBase is required to perform a log split if a RegionServer (RS) goes down uncleanly. To find out why your RSs went down uncleanly, you'd need to check for FATAL messages in the individual RS logs, as the reason is not in the Master log snippet posted above. The dead server appears to have been hostnamedn02.com. As for why the log splitting fails: since the Master performs a distributed log split, the reason for the failure will also be found in the logs of the live RSs that tried to assist with the split. In the snippet posted above, these hosts were hostnamedn01.com and hostnamedn05.com.
07-22-2017
10:20 AM
This is the same question as http://community.cloudera.com/t5/Storage-Random-Access-HDFS/How-to-connect-to-remote-Hbase-using-JAV..., where a reply is available.
07-22-2017
10:19 AM
This is the same question as http://community.cloudera.com/t5/Storage-Random-Access-HDFS/How-to-connect-to-remote-Hbase-using-JAVA-API/m-p/57731#M3059, where a reply is available.
07-22-2017
10:18 AM
HBase API calls involve connecting, from the host you are executing on, to every HBase service role host on the cluster. This requires working hostname resolution for all RegionServer and Master hostnames. In your case, your client host is able to resolve the passed ZK hostname of "en01com", but it must also be able to resolve every Master/RS host, such as dn03.com. If you do not rely on a DNS backend to do this for you, your /etc/hosts file must carry an entry for every cluster host in the form: IP FQDN OptionalShortName
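For example, a client-side /etc/hosts might look like the sketch below; the IP addresses are illustrative placeholders, and the hostnames are patterned on the ones from the question:

```
# /etc/hosts on the client host: one line per cluster host,
# in the form "IP FQDN OptionalShortName"
10.0.0.11  en01.com  en01
10.0.0.23  dn03.com  dn03
10.0.0.24  dn04.com  dn04
```

Every Master and RegionServer host needs such a line, not just the ZooKeeper quorum hosts passed to the client.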
07-16-2017
11:45 PM
While your idea is correct in trying a different tmp path that allows execution and loading of libraries (your current /tmp may be mounted with 'noexec' applied, see output of 'mount' command), try specifying the alternative tmp path like this:

~> export HBASE_OPTS='-Djava.io.tmpdir=/ngs12/tmp'
~> hbase shell
07-10-2017
11:38 PM
> We have the superuser group defined as 'supergroup' in our configuration. However, this group does not exist in any of the nodes.

This is intentional. The default is set to a name (supergroup) that typically shouldn't exist after install, to protect against unintentional super-users right out of the box. You are free to modify the supergroup name via the HDFS -> Configuration -> "Superuser Group" field.

> If I have to set up this group and start adding a couple of other accounts to have super user access to HDFS, where should this Linux group be created? Should it be created on all nodes in the cluster? Or is it sufficient to create the Linux group on the NameNode hosts only?

When you use no centralized user/group management software (such as an AD via LDAP, etc.), the general and bulletproof approach to adding local Linux groups and usernames is always "all hosts". The reason is that host assignments are not static over the life of the cluster: while doing the group additions on the NameNode(s) alone will work immediately, you will face puzzling authorization issues in the future when a NameNode host needs to be migrated or replaced. Likewise, if security is turned on in the future, it will require local accounts on the worker hosts.
07-09-2017
06:50 AM
1 Kudo
This occurs because the actions inherit the YARN NM configs, which are not pre-configured for MR2. Since MR2 is an app-side concept in YARN rather than an inbuilt/server-side one, your action environment does not find the adequate configs by referencing the NM ones. This was improved via https://issues.apache.org/jira/browse/OOZIE-2343 in CDH 5.5.0+, which ships configs that include MR2 specifics along with the shell scripts. For your older CDH version, however, you can try the below:

Step 1: Ensure all your hosts have a YARN/MR2 Gateway role added, and that client configuration is deployed on all hosts at /etc/hadoop/conf/*.

Step 2: Add the env-var 'HADOOP_CONF_DIR=/etc/hadoop/conf' to all shell actions, either via the shell action's configuration for passing environment variables, or via manual edits at the top of the shell scripts.
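In workflow XML, Step 2 can be expressed with the shell action's env-var element; the sketch below is a hedged example where the action name and script name are placeholders, not from the original post:

```xml
<action name="my-shell-action">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>myscript.sh</exec>
    <!-- Point the action at the deployed client configs from Step 1 -->
    <env-var>HADOOP_CONF_DIR=/etc/hadoop/conf</env-var>
    <file>myscript.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

The env-var route keeps the scripts themselves unchanged, which is preferable when many scripts share the same workflow.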