Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1969 | 07-09-2019 12:53 AM |
|  | 11881 | 06-23-2019 08:37 PM |
|  | 9146 | 06-18-2019 11:28 PM |
|  | 10133 | 05-23-2019 08:46 PM |
|  | 4580 | 05-20-2019 01:14 AM |
03-23-2018 10:48 PM
What CDH version are you using? If it is 5.9.1, 5.8.3, or lower, and you use a KMS service in the cluster (for the HDFS Transparent Encryption Zone feature), you may be hitting https://issues.apache.org/jira/browse/HADOOP-13838, which is fixed in the maintenance releases CDH 5.8.4 and 5.9.2, and in CDH 5.10.0 and later.
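If you are unsure of your exact version, a quick way to check it from any client host (the parcel path below assumes a parcel-based install):

```bash
# Prints the Hadoop build string, which embeds the CDH version (e.g. 2.6.0-cdh5.9.1)
hadoop version

# On parcel-based installs, the active CDH parcel symlink also shows the version
ls -l /opt/cloudera/parcels/CDH
```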
03-22-2018 01:41 AM
3 Kudos
Thank you. Please try an 'unset HADOOP_HDFS_HOME' and retry your command(s), without manually including the hadoop-hdfs jars this time. Does it succeed? Can you also figure out what is setting the HADOOP_HDFS_HOME environment variable in your user session? It must not be set manually, as the CDH scripts set it to the correct path without any intervention. Checking .bashrc/.bash_profile is a good place to start.
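A minimal sketch of those checks, assuming a parcel-based install (file names may differ on your hosts):

```bash
# See what the variable currently points to, then clear it for this session
echo "$HADOOP_HDFS_HOME"
unset HADOOP_HDFS_HOME

# Verify the hdfs jars are now back on the classpath
hadoop classpath | tr ':' '\n' | grep hadoop-hdfs

# Hunt for whatever sets it at login (add any other shell init files you use)
grep -n HADOOP_HDFS_HOME ~/.bashrc ~/.bash_profile /etc/profile /etc/profile.d/*.sh 2>/dev/null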
03-22-2018 12:28 AM
1 Kudo
> user=anonymous

It appears that your query connects without a proper username supplied. If this is a HiveServer2-based connection, ensure you provide a username in the connection string; this is the simplest way to resolve your issue, typically done by adding "user=username" as a property in the connection string. If you absolutely do wish to stay anonymous, then use a non-/tmp path whose parent directory grants 777 access (with no sticky bit), so that the user 'anonymous' is allowed to remove files from that directory. This weakens security, so it is not advisable as a long-term solution.
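For example, with Beeline (hostname and username below are placeholders):

```bash
# Supply an explicit username so HS2 does not run the query as 'anonymous'
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -n yourusername
```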
03-20-2018 09:18 PM
Thank you for the added info. I notice now that your 'hadoop classpath' output oddly does not mention any hadoop-hdfs library paths. Can you post the output of 'env' and the contents of your /etc/hadoop/conf/hadoop-env.sh file from the same host where the 'hadoop classpath' output was generated? The CDH scripts auto-add the /opt/cloudera/parcels/CDH/lib/hadoop-hdfs/ paths unless an environment variable such as HADOOP_HDFS_HOME has been overridden to point to an invalid path. The output requested above will help check that, among other factors that influence the classpath-building script.
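Something like the following, run on that same host, should capture everything requested:

```bash
# Environment variables that can influence classpath construction
env | sort | grep -i hadoop

# The env script used by the CDH wrapper scripts
cat /etc/hadoop/conf/hadoop-env.sh

# The resulting classpath, one entry per line for readability
hadoop classpath | tr ':' '\n'
```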
03-20-2018 02:25 PM
Agreed. You shouldn't need more than 3-4 GiB of heap, going by a 3x or 4x factor of the ideal block count for that storage (storage divided by block size).
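To illustrate the arithmetic with assumed numbers (the capacity, block size, and the ~1 GiB-per-million-blocks rule of thumb below are illustrative, not figures from this thread):

```bash
# Hypothetical sizing:
#   128 TB storage / 128 MiB block size ~= 1,000,000 ideal blocks
#   3x-4x padding                       ~= 3-4 million block objects
#   at roughly 1 GiB of heap per million block objects => ~3-4 GiB heap
```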
03-19-2018 01:30 AM
1 Kudo
> ERROR whitelist must be specified when using new consumer in mirror maker.

In the Kafka service add wizard, ensure you enter at least one entry for the field "Topic Whitelist" on the presented Mirror Maker pre-configuration page. You can also do this after adding the service (as in your current case) by visiting CM -> Kafka -> Configuration, looking for "Topic Whitelist", and adding one or more valid values to it.
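The value is a Java regex over topic names (the topic names below are placeholders): 'myapp\..*' mirrors every topic under the myapp. prefix, and '.*' mirrors everything. The CM field corresponds to MirrorMaker's --whitelist flag, roughly like the upstream invocation:

```bash
# Upstream equivalent of the CM "Topic Whitelist" field
kafka-mirror-maker.sh --consumer.config consumer.properties \
                      --producer.config producer.properties \
                      --whitelist 'myapp\..*'
```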
03-17-2018 03:55 AM
There are no limits in the source code implementation, if that is what you are asking. There are practical limits, such as re-replication bandwidth (which comes into play when storage is lost) and reporting load (which matters for low-latency operations), that you will run into when exceeding sensible storage boundaries. See also our Hardware Requirements guide: https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.html#concept_fzz_dq4_gbb
03-17-2018 03:40 AM
1 Kudo
A bit of info:
- total_read_requests_rate_across_regionservers tracks the RS JMX bean Server::readRequestCount
- total_write_requests_rate_across_regionservers tracks the RS JMX bean Server::writeRequestCount
- total_requests_rate_across_regionservers tracks the RS JMX bean Server::totalRequestCount

The first two apply only to RS operations that operate on data, but the third also covers meta-operations such as openRegion, closeRegion, etc. that the RegionServer services (for the Master and other commanding clients).

> Which metric reflects the actual load of the HBase cluster?

Data-wise, it's the read/write requests you want to look at.

> Given the names I was expecting something like: total_requests = total_read_requests + total_write_requests but this is clearly not the case.

The readRequestCount tracks only read operations (get/scan), and it also counts up each row returned during scans. The totalRequestCount counts only one per RPC made to the RS, not per row read. This causes the difference between the three metrics. Hope this helps explain what these three metrics truly are.

TL;DR:
- total_read_requests_rate_across_regionservers -> read operation count rate, counted per row scanned
- total_write_requests_rate_across_regionservers -> write operation count rate, counted per row written
- total_requests_rate_across_regionservers -> overall RS RPC-level call count rate, counted per request made to the RS, not per row
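If you want to inspect the raw counters behind these metrics, the RegionServer web UI exposes them over its /jmx endpoint; a quick check could look like this (the hostname is a placeholder, and the RS info port varies by version, commonly 16030 or 60030):

```bash
# Dump the RegionServer Server bean and pull out the three request counters
curl -s 'http://regionserver-host:16030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Server' \
  | grep -E '"(read|write|total)RequestCount"'
```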
03-17-2018 03:04 AM
1 Kudo
Have you tried looking at the failed job's logs for the printed task ID task_1500463014055_0245_m_000000?
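One way to pull them, assuming YARN log aggregation is enabled (the application ID is derived from the task ID by swapping the prefix):

```bash
# task_1500463014055_0245_m_000000 belongs to application_1500463014055_0245
yarn logs -applicationId application_1500463014055_0245 | less
```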
03-16-2018 10:52 PM
The command is only for non-Cloudera-Manager deployments, as the documentation notes:

"""
In non-managed deployments, you can start a Lily HBase Indexer Daemon manually on the local host with the following command: sudo service hbase-solr-indexer restart
"""

If you use Cloudera Manager, instead add a new service from the Clusters page, choosing the type "Key-Value Store Indexer" from the new-service list. Then proceed with configuring it from CM and starting it.