Member since
10-04-2016
243
Posts
281
Kudos Received
43
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1180 | 01-16-2018 03:38 PM | |
6148 | 11-13-2017 05:45 PM | |
3054 | 11-13-2017 12:30 AM | |
1521 | 10-27-2017 03:58 AM | |
28444 | 10-19-2017 03:17 AM |
10-12-2021
09:37 PM
Here is the guide to enable the SPNEGO for Browser Based interfaces. https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/security-how-to-guides/topics/cm-security-enable-web-auth-s19.html You could try following this guide and find the option to " Enable Kerberos Authentication for HTTP Web-Consoles" in Cloudera Manager for the concerned service and then disable the ones you want. Hope this helps.
... View more
10-12-2021
09:33 PM
The error you are facing is an indication of credentialbuilder* jar is missing in ranger admin/usersync library. The easiest way to fix this is to remove the ranger usersync/admin packages and reinstall those packages/parcels.
... View more
10-12-2021
08:47 PM
3 Kudos
Introduction
This article is the final part in the series Scaling the Namenode (See part 1, part 2, part 3 and part 4)
In part 4 we discussed about monitoring Namenode Logs for Excessive Skews.
In this part, we will look at a few optimizations around logging, access check, and block reports.
Audience
This article is for Hadoop administrators who are familiar with HDFS and its components.
Audit log specific operations only when debug is enabled.
By default, the following property is set to blank so none of the Namenode operations are restricted from making an entry into Audit log.
Operations like getfileinfo results in fetching the metadata associated with a file, and in a large/read-heavy cluster, it can generate too much audit log. So, it is recommended to audit log getfileinfo only when audit log debug is enabled.
Change in hdfs-site.xml
<property>
<name>dfs.namenode.audit.log.debug.cmdlist<\name>
<value>getfileinfo<\value>
<description>A comma separated list of NameNode commands that are written to the HDFS
namenode audit log only if the audit log level is debug.
<\description>
<\property>
In Cloudera Manager you can add the property under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml".
Further, the BlockStateChange and the StateChange related logging are really only useful when those operations have failed i.e. the log entry for those classes is ERROR. At the default INFO level, these two classes generate a large amount of log entry in the Namenode logs. You can reduce the frequency of logging by adding the following lines in log4j.properties file in your Hadoop configurations. Under Cloudera Manager these properties can be added under "NameNode Logging Advanced Configuration Snippet (Safety Valve)".
log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=ERROR
Avoid recursive call to external authorizer for getContentSummary
getContentSummary is an expensive operation in general. It becomes even more expensive in a secured environment where the security is managed by an external component like Apache Ranger as the permission check is performed via a recursive call to check for all descendants in a path. HDFS-14112 introduced an improvement to make just one call with subaccess, because often they don't have to evaluate for each and every component of the path.
Change in hdfs-site.xml
<property>
<name>dfs.permissions.ContentSummary.subAccess</name>
<value>false</value>
<description>
If "true", the ContentSummary permission checking will use subAccess.
If "false", the ContentSummary permission checking will NOT use subAccess.
subAccess means using recursion to check the access of all descendants.
</description>
</property>
Again in Cloudera Manager place the property under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml"
It is recommended to set this property to true so as to use subAccess.
Note: This improvement is only available in CDP releases, the older CDH/HDP releases do not have this improvement so adding this configuration on CDH/HDP releases is not recommended.
Optimizing Block Reports
In busy and large clusters (say 200 Datanodes), it is very important to not overwhelm NameNode with too frequent full block reports from the datanodes. If the NameNodes are already degraded, the block reports add further stress on the NameNodes. The NameNodes might be so slow to process the block reports that you would eventually see messages like 'Block report queue is full' in the NameNode logs.
It is interesting to note that while the default block report queue size is set to 1024, we can see this 'Block report queue is full' message even during a NameNode startup in what we call a block report flood event and also when your NameNode's RPC processing time is too high to indicate a severely degraded NameNode, thereby having a backlog of reports to process and eventually overflowing the queue.
While the block report queue size is configurable and you could essentially increase the queue size, a better approach is to optimize the way the data nodes send blocks reports.
We recommend a 3 prong approach to change the following in hdfs-site.xml:
Split block report by volume (Default value 1000000) <property>
<name>dfs.blockreport.split.threshold</name>
<value>0</value>
<description>
If the number of blocks on the DataNode is below this threshold then it will send block reports for all Storage Directories in a single message. If the number of blocks exceeds this threshold then the DataNode will send block reports for each Storage Directory in separate messages. Set to zero to always split.
</description>
</property>
Reduce full block report frequency from a default 6 hours to 12 hours <property>
<name>dfs.blockreport.intervalMsec</name>
<value>43200000</value>
<description>
Determines block reporting interval in milliseconds.
</description>
</property>
Batch incremental reports (Default value 0 disables batching) <property>
<name>dfs.blockreport.incremental.intervalMsec</name>
<value> 100 </value>
<description>
If set to a positive integer, the value in ms to wait between sending incremental block reports from the Datanode to the Namenode.
</description>
</property>
All 3 belong under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" in Cloudera Manager.
Conclusion
This wraps up the series on getting the best performance possible out of your NameNode. We hope these tips will keep your cluster running at its best and your users happy.
... View more
08-21-2019
08:47 PM
2 Kudos
When using Ambari Metrics in Distributed Mode,after HDP-3/Ambari-2.7 upgrade, HBase metrics are not emitted due to an issue which will likely be fixed after Ambari-2.7.4. We will see similar kind of messages in AMS Collector Log. Error : 2019-06-10 02:42:59,215 INFO timeline timeline.HadoopTimelineMetricsSink: No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20
Debug Error shows this :
2019-06-14 20:35:29,538 DEBUG main timeline.HadoopTimelineMetricsSink: Trying to find live collector host from : exp5.lab.com,exp4.lab.com 2019-06-14 20:35:29,538 DEBUG main timeline.HadoopTimelineMetricsSink: Requesting live collector nodes : http://exp5.lab.com,exp4.lab.com:6188/ws/v1/timeline/metrics/livenodes 2019-06-14 20:35:29,557 DEBUG main timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://exp5.lab.com,exp4.lab.com:6188/ws/v1/timeline/metrics/livenodes 2019-06-14 20:35:29,557 DEBUG main timeline.HadoopTimelineMetricsSink: java.net.UnknownHostException: exp5.lab.com,exp4.lab.com 2019-06-14 20:35:29,558 DEBUG main timeline.HadoopTimelineMetricsSink: Collector exp5.lab.com,exp4.lab.com is not longer live. Removing it from list of know live collector hosts : [] 2019-06-14 20:35:29,558 DEBUG main timeline.HadoopTimelineMetricsSink: No live collectors from configuration. Basically, it's incorrectly parsing hostnames when there are more than one Metrics collector. In the meantime, there is a very easy workaround. Add *.sink.timeline.zookeeper.quorum=<ZK_QUORUM_ADDRESS> Example: *.sink.timeline.zookeeper.quorum=zk_host1:2181,zk_host2:2181,zk_host3:2181 to the following files on Ambari Server host: /var/lib/ambari-server/resources/stacks/HDP/3.0/services/HBASE/package/templates/hadoop-metrics2-hbase.properties-GANGLIA-MASTER.j2 /var/lib/ambari-server/resources/stacks/HDP/3.0/services/HBASE/package/templates/hadoop-metrics2-hbase.properties-GANGLIA-RS.j2 Restart Ambari Server for changes to take effect and soon you will be able to see metrics on the Grafana HBase Dashboards.
... View more
08-21-2019
07:34 PM
1 Kudo
In HDP-2.6/Ambari-2.6, it was not mandatory enable HS2 metrics explicitly. Thus, all metrics would be emitted without defining any configs explicitly. In HDP-3/Ambari-2.7, we will see similar erros in AMS Collector Log: Error : 2019-06-10 02:42:59,215 INFO timeline timeline.HadoopTimelineMetricsSink: No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20
Debug Error shows this :
2019-06-14 20:35:29,538 DEBUG main timeline.HadoopTimelineMetricsSink: Trying to find live collector host from : exp5.lab.com,exp4.lab.com 2019-06-14 20:35:29,538 DEBUG main timeline.HadoopTimelineMetricsSink: Requesting live collector nodes : http://exp5.lab.com,exp4.lab.com:6188/ws/v1/timeline/metrics/livenodes 2019-06-14 20:35:29,557 DEBUG main timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://exp5.lab.com,exp4.lab.com:6188/ws/v1/timeline/metrics/livenodes 2019-06-14 20:35:29,557 DEBUG main timeline.HadoopTimelineMetricsSink: java.net.UnknownHostException: exp5.lab.com,exp4.lab.com 2019-06-14 20:35:29,558 DEBUG main timeline.HadoopTimelineMetricsSink: Collector exp5.lab.com,exp4.lab.com is not longer live. Removing it from list of know live collector hosts : [] 2019-06-14 20:35:29,558 DEBUG main timeline.HadoopTimelineMetricsSink: No live collectors from configuration. You need to ensure the following properties exist. If not, first add them in the respective custom section via Ambari >Hive> Configs. Next, if you are using Ambari Metrics with more than one collector, then you need to make one more change due a BUG, which will likely be fixed after Ambari-2.7.4. Add *.sink.timeline.zookeeper.quorum=<ZK_QUORUM_ADDRESS> Example: *.sink.timeline.zookeeper.quorum=zk_host1:2181,zk_host2:2181,zk_host3:2181 to all the 4 files under /var/lib/ambari-server/resources/stacks/HDP/3.0/services/HIVE/package/templates/ located on Ambari Server host. Restart Ambari Server & Hive for changes to take effect. Now the metrics will be emitted and you should be able to see data on your Grafana Dashboard.
... View more
05-23-2019
01:26 PM
@Maurice Knopp We do not yet have any planned dates yet. However, if you are an Enterprise Support customer, you can ask for a hotfix and you will be provided a patch jar which is very easy to replace on all machines with Tez.
... View more
05-23-2019
03:41 AM
@Maurice Knopp We recently saw that TEZ-3894 only fixes the issue partially. If you job ends up spinning multiple mappers then you are likely to hit a variant of TEZ-3894 although on surface it appears to be same. For permanent fix, you may want to get a patch for https://issues.apache.org/jira/browse/TEZ-4057
... View more
02-15-2019
08:52 PM
@Mahesh Balakrishnan Since there can be only one accepted answer 😞 , I am sharing 25 bounty points with you. Thanks for the guidance.
... View more
02-15-2019
08:51 PM
@Kuldeep Kulkarni Thanks that got me past the initial problem and then I ran into following error: resource_management.core.exceptions.Fail: Cannot find /usr/hdp/current/oozie-client/doc. Possible reason is that /etc/yum.conf contains tsflags=nodocs which prevents this folder from being installed along with oozie-client package. If this is the case, please fix /etc/yum.conf and re-install the package. This was resolved by @Mahesh Balakrishnan's input of remove and install the packages.
... View more
02-15-2019
07:27 PM
@Kuldeep Kulkarni, @amarnath reddy pappu Any pointers? TIA.
... View more