Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1969 | 07-09-2019 12:53 AM |
|  | 11881 | 06-23-2019 08:37 PM |
|  | 9146 | 06-18-2019 11:28 PM |
|  | 10133 | 05-23-2019 08:46 PM |
|  | 4580 | 05-20-2019 01:14 AM |
03-23-2018 10:48 PM
What CDH version are you using? If it is 5.9.1, 5.8.3, or lower, and you use a KMS service in the cluster (for the HDFS Transparent Encryption Zone feature), you may be hitting https://issues.apache.org/jira/browse/HADOOP-13838, which is fixed in the maintenance releases CDH 5.8.4 and 5.9.2, and in CDH 5.10.0 and later.
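If you are unsure of your exact version, a quick way to check it from any client host (the parcel path below assumes a parcel-based install):

```bash
# Prints the Hadoop build string, which embeds the CDH version (e.g. 2.6.0-cdh5.9.1)
hadoop version

# On parcel-based installs, the active CDH parcel symlink also shows the version
ls -l /opt/cloudera/parcels/CDH
```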
03-22-2018 01:41 AM
3 Kudos
Thank you. Please try an 'unset HADOOP_HDFS_HOME' and retry your command(s), without manually including the hadoop-hdfs jars this time. Does it succeed? Can you also figure out what is setting the HADOOP_HDFS_HOME environment variable in your user session? It must not be set manually, as the CDH scripts set it to the correct path without any intervention. Checking .bashrc/.bash_profile is a good place to start.
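A minimal sketch of those checks, assuming a parcel-based install (file names may differ on your hosts):

```bash
# See what the variable currently points to, then clear it for this session
echo "$HADOOP_HDFS_HOME"
unset HADOOP_HDFS_HOME

# Verify the hdfs jars are now back on the classpath
hadoop classpath | tr ':' '\n' | grep hadoop-hdfs

# Hunt for whatever sets it at login (add any other shell init files you use)
grep -n HADOOP_HDFS_HOME ~/.bashrc ~/.bash_profile /etc/profile /etc/profile.d/*.sh 2>/dev/null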
03-22-2018 12:28 AM
1 Kudo
> user=anonymous

It appears that your query connects without a proper username supplied. If this is a HiveServer2-based connection, ensure you provide a username in the connection string; this is the simplest way to resolve your issue, typically done by adding "user=username" as a property in the connection string. If you absolutely do wish to stay anonymous, then use a non-/tmp path whose parent directory grants 777 access (with no sticky bit), so that the user 'anonymous' is allowed to remove files from that directory. This weakens security, so it is not advisable as a long-term solution.
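For example, with Beeline (hostname and username below are placeholders):

```bash
# Supply an explicit username so HS2 does not run the query as 'anonymous'
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -n yourusername
```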
03-20-2018 09:18 PM
Thank you for the added info. I notice now that your 'hadoop classpath' output oddly does not mention any hadoop-hdfs library paths. Can you post the output of 'env' and the contents of your /etc/hadoop/conf/hadoop-env.sh file from the same host where the 'hadoop classpath' output was generated? The CDH scripts auto-add the /opt/cloudera/parcels/CDH/lib/hadoop-hdfs/ paths unless an environment variable such as HADOOP_HDFS_HOME has been overridden to point to an invalid path. The output requested above will help check that, among other factors that influence the classpath-building script.
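Something like the following, run on that same host, should capture everything requested:

```bash
# Environment variables that can influence classpath construction
env | sort | grep -i hadoop

# The env script used by the CDH wrapper scripts
cat /etc/hadoop/conf/hadoop-env.sh

# The resulting classpath, one entry per line for readability
hadoop classpath | tr ':' '\n'
```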
03-20-2018 02:25 PM
Agreed. You shouldn't need more than 3-4 GiB of heap, going by a 3x or 4x factor of the ideal block count for that storage (storage divided by block size).
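To illustrate the arithmetic with assumed numbers (the capacity, block size, and the ~1 GiB-per-million-blocks rule of thumb below are illustrative, not figures from this thread):

```bash
# Hypothetical sizing:
#   128 TB storage / 128 MiB block size ~= 1,000,000 ideal blocks
#   3x-4x padding                       ~= 3-4 million block objects
#   at roughly 1 GiB of heap per million block objects => ~3-4 GiB heap
```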
03-19-2018 01:30 AM
1 Kudo
> ERROR whitelist must be specified when using new consumer in mirror maker.

In the Kafka service add wizard, ensure you enter at least one entry for the field "Topic Whitelist" on the presented Mirror Maker pre-configuration page. You can also do this after adding the service (as in your current case) by visiting CM -> Kafka -> Configuration, looking for "Topic Whitelist", and adding one or more valid values to it.
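The value is a Java regex over topic names (the topic names below are placeholders): 'myapp\..*' mirrors every topic under the myapp. prefix, and '.*' mirrors everything. The CM field corresponds to MirrorMaker's --whitelist flag, roughly like the upstream invocation:

```bash
# Upstream equivalent of the CM "Topic Whitelist" field
kafka-mirror-maker.sh --consumer.config consumer.properties \
                      --producer.config producer.properties \
                      --whitelist 'myapp\..*'
```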
03-17-2018 03:55 AM
There are no limits in the source code implementation, if that is what you are asking. There are practical limits, such as re-replication bandwidth (which comes into play when storage is lost) and reporting load (which matters for low-latency operations), that you will run into when exceeding sensible storage boundaries. See also our Hardware Requirements guide: https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.html#concept_fzz_dq4_gbb
03-17-2018 03:40 AM
1 Kudo
A bit of info:
- total_read_requests_rate_across_regionservers tracks the RS JMX bean Server::readRequestCount
- total_write_requests_rate_across_regionservers tracks the RS JMX bean Server::writeRequestCount
- total_requests_rate_across_regionservers tracks the RS JMX bean Server::totalRequestCount

The first two apply only to RS operations that operate on data, but the third also covers meta-operations such as openRegion, closeRegion, etc. that the RegionServer services (for the Master and other commanding clients).

> Which metric reflects the actual load of the HBase cluster?

Data-wise, it's the read/write requests you want to look at.

> Given the names I was expecting something like: total_requests = total_read_requests + total_write_requests but this is clearly not the case.

The readRequestCount tracks only read operations (get/scan), and it also counts up each row returned during scans. The totalRequestCount counts only one per RPC made to the RS, not per row read. This causes the difference between the three metrics. Hope this helps explain what these three metrics truly are.

TL;DR:
- total_read_requests_rate_across_regionservers -> read operation count rate, counted per row scanned
- total_write_requests_rate_across_regionservers -> write operation count rate, counted per row written
- total_requests_rate_across_regionservers -> overall RS RPC-level call count rate, counted per request made to the RS, not per row
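If you want to inspect the raw counters behind these metrics, the RegionServer web UI exposes them over its /jmx endpoint; a quick check could look like this (the hostname is a placeholder, and the RS info port varies by version, commonly 16030 or 60030):

```bash
# Dump the RegionServer Server bean and pull out the three request counters
curl -s 'http://regionserver-host:16030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Server' \
  | grep -E '"(read|write|total)RequestCount"'
```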
03-17-2018 03:04 AM
1 Kudo
Have you tried looking at the failed job's logs for the printed task ID task_1500463014055_0245_m_000000?
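One way to pull them, assuming YARN log aggregation is enabled (the application ID is derived from the task ID by swapping the prefix):

```bash
# task_1500463014055_0245_m_000000 belongs to application_1500463014055_0245
yarn logs -applicationId application_1500463014055_0245 | less
```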
03-16-2018 10:52 PM
The command is only for non-Cloudera-Manager deployments, as the documentation notes:

"""
In non-managed deployments, you can start a Lily HBase Indexer Daemon manually on the local host with the following command: sudo service hbase-solr-indexer restart
"""

If you use Cloudera Manager, instead add a new service from the Clusters page, choosing the type "Key-Value Store Indexer" from the new-service list. Then proceed with configuring it from CM and starting it.