Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1969 | 07-09-2019 12:53 AM |
| | 11881 | 06-23-2019 08:37 PM |
| | 9146 | 06-18-2019 11:28 PM |
| | 10133 | 05-23-2019 08:46 PM |
| | 4580 | 05-20-2019 01:14 AM |
03-16-2018
10:38 PM
It appears that your HMaster is crashing during startup. Take a look at the HMaster log file under /var/log/hbase/ to find out why. If the configured ZooKeeper quorum is running properly, also check whether the /hbase znode exists on it; a quick sketch follows below.
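For example, something along these lines (the log filename pattern and the ZooKeeper host:port are placeholders; adjust them to your install):

```
# Look for the abort reason in the HMaster log (filename varies by install):
grep -iE 'ERROR|FATAL' /var/log/hbase/*MASTER*.log*

# If ZooKeeper is up, confirm the /hbase znode exists
# (replace zkhost.example.com:2181 with your quorum address):
zookeeper-client -server zkhost.example.com:2181 ls /hbase
```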
03-16-2018
10:00 PM
1 Kudo
The two exams (CCA and CCP) are independent of each other. As for dates and times, you can select a date from the shown range to fit what works for you.
03-15-2018
08:20 AM
A few checks:
- Does the host where you invoke spark-submit carry a valid Spark Gateway role, with deployed configs under /etc/spark/conf/? There is also a classpath file under that location; check whether it includes all the HDFS and YARN jars.
- Do you bundle any HDFS/YARN project jars in your Spark application jar (such as a fat-jar assembly)? If so, verify that their versions match what is on the cluster classpath.
- Are there any global environment variables (run 'env' to check) that end in or carry 'CLASSPATH' in their name? Try unsetting these and retrying; see the sketch after this list.
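A quick sketch of the last two checks (the classpath file name and the app/class names below are illustrative and may differ on your cluster):

```
# List any environment variables that look like classpath overrides:
env | grep -i CLASSPATH

# Inspect the Gateway-deployed classpath file, if present:
cat /etc/spark/conf/classpath.txt

# Unset suspicious variables for this shell, then retry the submit:
unset SPARK_CLASSPATH HADOOP_CLASSPATH
spark-submit --class com.example.YourApp your-app.jar
```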
03-01-2018
01:45 AM
You've mentioned the RAM of the machine your DataNode is assigned to run on, but what is your configured DataNode JVM heap size? You could try raising it by 1 GB from its current value to resolve this. Also, what is the entire Out of Memory message? An "unable to create a new native thread" error implies something entirely different from "Java heap space" (an nproc/thread-limit issue vs. actual heap memory exhaustion).
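A rough way to tell the two apart (commands are illustrative):

```
# "unable to create new native thread" usually points at process/thread limits:
ulimit -u                 # max user processes for the current user
ps -eLf | wc -l           # approximate count of threads on the host

# "Java heap space" points at the configured heap; check the running DataNode:
ps aux | grep -i '[d]atanode' | grep -o -e '-Xmx[^ ]*'
```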
02-12-2018
10:13 PM
What CDH version is this, and could you attach/pastebin the full stack trace that the log produces? I'd also look out for a FATAL message in the logs; a NameNode self-abort should always carry one.
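For example (the log path and filename pattern are typical CM defaults; adjust to your install):

```
# Pull the FATAL message plus some surrounding context from the NameNode log:
grep -B 2 -A 10 'FATAL' /var/log/hadoop-hdfs/*NAMENODE*.log.out
```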
02-02-2018
02:25 AM
1 Kudo
Yes, that is precisely correct: it balances by average utilization percentage per node rather than by average byte count. For example, a 10 TB node holding 5 TB and a 1 TB node holding 0.5 TB are both at 50% utilization and are therefore considered balanced, even though their byte counts differ widely.
01-30-2018
04:19 AM
To change the whole log directory you'll currently need to pass --logdir to the agent as an argument, rather than via a config flag. Edit your agent environment config file at /etc/default/cloudera-scm-agent and ensure that the CMF_AGENT_ARGS env-var inside it carries the following:

--logdir=/your/custom/cloudera-scm/user/writable/directory/

Save, then restart the agent service (see the sketch below).

P.S. Using symlinks will also work.
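A minimal sketch, assuming a placeholder target directory that is writable by the agent user:

```
# In /etc/default/cloudera-scm-agent, set (directory is a placeholder):
#   CMF_AGENT_ARGS="--logdir=/data/cloudera-scm-agent/logs"

# Then restart the agent so the new argument takes effect:
sudo service cloudera-scm-agent restart
```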
01-07-2018
07:09 PM
You will need to specify your custom endpoint URL too, besides the credentials, just as is done on page 12 of the document you've referenced, but with the property 'fs.s3a.endpoint' instead (for s3a). See http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/core-default.xml#fs.s3a.endpoint Without a custom endpoint URL, the requests will go to Amazon's S3 servers by default.
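One way to try this out from the command line (the endpoint URL, keys, and bucket name below are placeholders):

```
# Pass the s3a endpoint and credentials as generic -D options:
hadoop fs \
  -D fs.s3a.endpoint=https://s3.my-object-store.example.com \
  -D fs.s3a.access.key=YOUR_ACCESS_KEY \
  -D fs.s3a.secret.key=YOUR_SECRET_KEY \
  -ls s3a://mybucket/
```

Setting the same properties in core-site.xml makes them apply to every client instead of a single command.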
12-11-2017
05:19 PM
1 Kudo
The subdirs carry actual block data; deleting these would be fatal for your actual HDFS data. If you have a space problem, clear out files on HDFS by issuing regular deletes (fs -rm, etc.), not by manipulating the internal storage layout on individual DataNodes. Be sure to also check whether you have stale HDFS snapshots retaining older files. The reason DNs use a subdirectory structure is mostly to avoid hitting the underlying filesystem's (ext4, xfs, etc.) per-directory limits, and to make certain scanning operations (such as block reports) more efficient.
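For example, to reclaim space the supported way (paths are illustrative):

```
# Delete unneeded files through HDFS itself:
hdfs dfs -rm -r -skipTrash /tmp/large-old-dataset

# Check for snapshottable directories and snapshots still retaining data:
hdfs lsSnapshottableDir
hdfs dfs -ls /data/.snapshot
```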
11-20-2017
05:18 PM
1 Kudo
The default behaviour of Hadoop is to run things locally when it finds no YARN cluster configuration. In CM-managed clusters, cluster configuration for client programs is deployed by means of a Gateway role. Your edge host is missing a gateway role and, consequently, the config files required to discover and use the cluster daemons. Do these two steps:
1. Visit the YARN -> Instances page in CM, then click 'Add Role Instances' and, under the Gateway type in the modal dialog, find and add your edge hostname (this edge host should already be running a CM agent for it to show up here).
2. Deploy cluster-wide client configs, following this: https://www.youtube.com/watch?v=4S9H3wftM_0
Retry your commands after this completes. Also verify that your edge host now has a proper /etc/hadoop/conf symlink, with the directory contents carrying info about the cluster; a few sanity checks are sketched below.
P.S. Having HDFS Gateways is insufficient to connect to YARN; you will need a YARN Gateway for that.
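A few sanity checks on the edge host (paths are typical CM defaults; adjust for your install):

```
# The client config directory should resolve via the alternatives system:
ls -l /etc/hadoop/conf

# yarn-site.xml should now point at your ResourceManager:
grep -A 1 'yarn.resourcemanager' /etc/hadoop/conf/yarn-site.xml

# Basic connectivity checks against the cluster:
hadoop fs -ls /        # HDFS
yarn node -list        # YARN (requires the YARN Gateway configs)
```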