Member since 10-04-2016 · 243 Posts · 281 Kudos Received · 43 Solutions
10-18-2021
04:41 AM
@rajatsachan, to help others who may face similar issues, it would be great if you could mark the response that helped you resolve your issue as a solution. To do so, click the Accept as Solution button on that reply.
If you resolved the issue in another way, please share the solution in this thread and mark that as the solution.
10-12-2021
08:47 PM
3 Kudos
Introduction
This article is the final part in the series Scaling the Namenode (see part 1, part 2, part 3, and part 4).
In part 4, we discussed monitoring Namenode logs for excessive skews.
In this part, we will look at a few optimizations around logging, access checks, and block reports.
Audience
This article is for Hadoop administrators who are familiar with HDFS and its components.
Audit log specific operations only when debug is enabled.
By default, the following property is blank, so none of the Namenode operations are excluded from the audit log.
Operations like getfileinfo fetch the metadata associated with a file, and in a large or read-heavy cluster they can generate an excessive volume of audit log entries. So, it is recommended to audit-log getfileinfo only when audit log debug is enabled.
Change in hdfs-site.xml
<property>
  <name>dfs.namenode.audit.log.debug.cmdlist</name>
  <value>getfileinfo</value>
  <description>A comma-separated list of NameNode commands that are written to the HDFS
  namenode audit log only if the audit log level is debug.
  </description>
</property>
In Cloudera Manager you can add the property under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml".
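With the command list in place, getfileinfo entries are suppressed until the audit logger is raised to DEBUG. In stock upstream Hadoop log4j.properties the audit logger is named as shown below; verify the exact logger name and appender wiring against your distribution's log4j.properties before applying:

```properties
# Raise the HDFS audit logger from the default INFO to DEBUG so that
# commands listed in dfs.namenode.audit.log.debug.cmdlist (e.g. getfileinfo)
# are written to the audit log again while troubleshooting.
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
```

Remember to revert this to INFO once troubleshooting is complete, as DEBUG restores the full audit volume.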
Further, the BlockStateChange and StateChange logging is really only useful when those operations have failed, i.e. when the entries are at the ERROR level. At the default INFO level, these two loggers generate a large number of entries in the Namenode logs. You can reduce the logging by adding the following lines to the log4j.properties file in your Hadoop configuration. Under Cloudera Manager, these properties can be added under "NameNode Logging Advanced Configuration Snippet (Safety Valve)".
log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=ERROR
Avoid recursive call to external authorizer for getContentSummary
getContentSummary is an expensive operation in general. It becomes even more expensive in a secured environment where authorization is managed by an external component like Apache Ranger, because the permission check is performed via a recursive call covering every descendant in the path. HDFS-14112 introduced an improvement to make just one call with subAccess, since the external authorizer often does not need to evaluate each and every component of the path.
Change in hdfs-site.xml
<property>
<name>dfs.permissions.ContentSummary.subAccess</name>
<value>true</value>
<description>
If "true", the ContentSummary permission checking will use subAccess.
If "false", the ContentSummary permission checking will NOT use subAccess.
subAccess means using recursion to check the access of all descendants.
</description>
</property>
Again, in Cloudera Manager, place the property under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml".
It is recommended to set this property to true so that subAccess is used.
Note: This improvement is only available in CDP releases; the older CDH/HDP releases do not have it, so adding this configuration on CDH/HDP releases is not recommended.
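To illustrate why subAccess matters, here is a toy cost model (illustrative only, not HDFS code): with recursive checking the external authorizer is invoked once per descendant, while with subAccess it is invoked once for the whole subtree.

```python
# Toy cost model (not HDFS code): external authorizer calls needed for
# a getContentSummary permission check on a directory subtree.

def authorizer_calls_recursive(num_descendants: int) -> int:
    """Pre-HDFS-14112 behavior: one external check per descendant."""
    return num_descendants

def authorizer_calls_subaccess(num_descendants: int) -> int:
    """With subAccess: a single check covering the whole subtree."""
    return 1

# A directory with one million descendants: 1,000,000 calls vs. 1.
print(authorizer_calls_recursive(1_000_000))  # 1000000
print(authorizer_calls_subaccess(1_000_000))  # 1
```

The gap grows linearly with the size of the subtree, which is why deep, wide directory trees are the worst case for the recursive check.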
Optimizing Block Reports
In busy and large clusters (say, 200 Datanodes), it is very important not to overwhelm the NameNode with too-frequent full block reports from the Datanodes. If the NameNodes are already degraded, the block reports add further stress. The NameNodes might be so slow to process the block reports that you eventually see messages like 'Block report queue is full' in the NameNode logs.
It is interesting to note that while the default block report queue size is 1024, you can see the 'Block report queue is full' message during a NameNode startup, in what we call a block report flood event, and also when the NameNode's RPC processing time is too high, indicating a severely degraded NameNode with a backlog of reports that eventually overflows the queue.
While the block report queue size is configurable and you could simply increase it, a better approach is to optimize the way the Datanodes send block reports.
We recommend a three-pronged approach: change the following in hdfs-site.xml.
Split block reports by volume (default value 1000000):
<property>
<name>dfs.blockreport.split.threshold</name>
<value>0</value>
<description>
If the number of blocks on the DataNode is below this threshold then it will send block reports for all Storage Directories in a single message. If the number of blocks exceeds this threshold then the DataNode will send block reports for each Storage Directory in separate messages. Set to zero to always split.
</description>
</property>
Reduce full block report frequency from the default 6 hours to 12 hours:
<property>
<name>dfs.blockreport.intervalMsec</name>
<value>43200000</value>
<description>
Determines block reporting interval in milliseconds.
</description>
</property>
Batch incremental block reports (the default value 0 disables batching):
<property>
<name>dfs.blockreport.incremental.intervalMsec</name>
<value>100</value>
<description>
If set to a positive integer, the value in ms to wait between sending incremental block reports from the Datanode to the Namenode.
</description>
</property>
All three properties go under "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" in Cloudera Manager.
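As a quick sanity check on the millisecond values above (simple arithmetic, not Hadoop code):

```python
# dfs.blockreport.intervalMsec: 12 hours expressed in milliseconds.
full_report_interval_ms = 12 * 60 * 60 * 1000
print(full_report_interval_ms)  # 43200000

# dfs.blockreport.incremental.intervalMsec: batching incremental reports
# every 100 ms caps each Datanode at 10 batched sends per second.
incremental_interval_ms = 100
sends_per_second = 1000 // incremental_interval_ms
print(sends_per_second)  # 10
```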
Conclusion
This wraps up the series on getting the best performance possible out of your NameNode. We hope these tips will keep your cluster running at its best and your users happy.
08-06-2021
01:25 AM
Had the same issue on CDP 7.1.6, which comes with Tez 0.9.1. Looks like this: https://issues.apache.org/jira/browse/TEZ-4057. One workaround (probably not 100% secure) is to add the yarn user to the hive group: usermod -a -G hive yarn. This needs to be done on all nodes and requires a YARN services restart. After that, the issue was gone; no more random errors for Hive on Tez.
03-27-2020
10:12 AM
@dineshc Please specify where the mentioned workaround property needs to be added: ams-site.xml or ams-hbase-site.xml?
08-21-2019
07:34 PM
1 Kudo
In HDP-2.6/Ambari-2.6, it was not mandatory to enable HS2 metrics explicitly; all metrics were emitted without defining any configs. In HDP-3/Ambari-2.7, we see errors like the following in the AMS Collector log:
2019-06-10 02:42:59,215 INFO timeline timeline.HadoopTimelineMetricsSink: No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20
Debug logging shows this:
2019-06-14 20:35:29,538 DEBUG main timeline.HadoopTimelineMetricsSink: Trying to find live collector host from : exp5.lab.com,exp4.lab.com
2019-06-14 20:35:29,538 DEBUG main timeline.HadoopTimelineMetricsSink: Requesting live collector nodes : http://exp5.lab.com,exp4.lab.com:6188/ws/v1/timeline/metrics/livenodes
2019-06-14 20:35:29,557 DEBUG main timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://exp5.lab.com,exp4.lab.com:6188/ws/v1/timeline/metrics/livenodes
2019-06-14 20:35:29,557 DEBUG main timeline.HadoopTimelineMetricsSink: java.net.UnknownHostException: exp5.lab.com,exp4.lab.com
2019-06-14 20:35:29,558 DEBUG main timeline.HadoopTimelineMetricsSink: Collector exp5.lab.com,exp4.lab.com is not longer live. Removing it from list of know live collector hosts : []
2019-06-14 20:35:29,558 DEBUG main timeline.HadoopTimelineMetricsSink: No live collectors from configuration.
You need to ensure the following properties exist. If not, first add them in the respective custom section via Ambari > Hive > Configs. Next, if you are using Ambari Metrics with more than one collector, you need to make one more change due to a bug, which will likely be fixed after Ambari-2.7.4. Add
*.sink.timeline.zookeeper.quorum=<ZK_QUORUM_ADDRESS>
Example: *.sink.timeline.zookeeper.quorum=zk_host1:2181,zk_host2:2181,zk_host3:2181
to all 4 files under /var/lib/ambari-server/resources/stacks/HDP/3.0/services/HIVE/package/templates/ on the Ambari Server host. Restart Ambari Server and Hive for the changes to take effect. Now the metrics will be emitted and you should be able to see data on your Grafana dashboard.
02-15-2019
08:52 PM
@Mahesh Balakrishnan Since there can be only one accepted answer 😞 , I am sharing 25 bounty points with you. Thanks for the guidance.
09-05-2018
09:05 PM
2 Kudos
If you have started using Hive LLAP, you would have noticed that by default it is configured to use log4j2. The default configuration makes use of advanced log4j2 features like rolling over logs based on time interval and size. Over time, a lot of old log files accumulate; with log4j1 you would typically compress those files manually, or add extra jars and configuration changes to achieve the same. With log4j2, a simple configuration change can ensure that every time a log file is rolled over, it gets compressed for optimal use of storage space.
To automatically compress the rolled-over log files, update the appender file pattern to:
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}-%i.gz
%i ensures that in the rare scenario where logging spikes and the size threshold is reached more than once in the specified interval, the previously rolled-over file is not overwritten. .gz ensures that the files are compressed using gzip.
To understand the finer details of log4j2 appenders, you may check out the official documentation. You can also make similar changes to the llap-cli log settings.
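For context, a rolling-file appender section in a Hive log4j2 properties file typically looks like the sketch below. Property names follow stock Hive log4j2 configurations, but the exact values (sizes, intervals, layout pattern) will differ per distribution, so treat this as an assumed example rather than your file's literal contents:

```properties
# DRFA: rolling file appender for Hive/LLAP logs (values are examples)
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
# The "-%i.gz" suffix makes log4j2 gzip each rolled file
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}-%i.gz
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.policies.size.type = SizeBasedTriggeringPolicy
appender.DRFA.policies.size.size = 256MB
```

Only the filePattern line needs to change to enable compression; the triggering policies control when the rollover itself happens.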
12-01-2017
06:06 PM
2 Kudos
When running a custom Java application that connects to Hive via JDBC, after migration to HDP-2.6.x the application fails to start with a NoClassDefFoundError or ClassNotFoundException for a Hive class, like: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/cli/thrift/TCLIService$Iface
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
Root Cause
Prior to HDP-2.6.x, hive-jdbc.jar was a symlink pointing to the "standalone" JDBC jar (the one intended for non-Hadoop apps, like a generic app that uses a JDBC driver for DB access), for example in HDP-2.5.0:
/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.5.0.0-1245-standalone.jar
From HDP-2.6.x onwards, hive-jdbc.jar points to the "hadoop env" JDBC driver, which has dependencies on many other Hadoop jars, for example in HDP-2.6.2:
/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.2.0-205.jar
or in HDP-2.6.3:
/usr/hdp/current/hive-client/lib/hive-jdbc.jar -> hive-jdbc-1.2.1000.2.6.3.0-235.jar
Does this mean the HDP stack no longer includes a standalone jar? No. The standalone jar has been moved to this path:
/usr/hdp/current/hive-client/jdbc
Two ways to solve this:
1. Change the custom Java application's classpath to use the hive-jdbc-*-standalone.jar explicitly. As noted above, the standalone jar is now available in a different path. For example, in HDP-2.6.2:
/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.2.0-205-standalone.jar
In HDP-2.6.3:
/usr/hdp/current/hive-client/jdbc/hive-jdbc-1.2.1000.2.6.3.0-235-standalone.jar
2. Add the following to the HADOOP_CLASSPATH of the custom Java application if it uses other Hadoop components/jars:
/usr/hdp/current/hive-client/lib/hive-metastore-*.jar:/usr/hdp/current/hive-client/lib/hive-common-*.jar:/usr/hdp/current/hive-client/lib/hive-cli-*.jar:/usr/hdp/current/hive-client/lib/hive-exec-*.jar:/usr/hdp/current/hive-client/lib/hive-service.jar:/usr/hdp/current/hive-client/lib/libfb303-*.jar:/usr/hdp/current/hive-client/lib/libthrift-*.jar:/usr/hdp/current/hadoop-client/lib/log4j*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-api-*.jar:/usr/hdp/current/hadoop-client/lib/slf4j-log4j12-*.jar:/usr/hdp/current/hadoop-client/lib/commons-logging-*.jar
11-16-2017
03:58 PM
2 Kudos
Description
During HDP upgrade, the Hive Metastore restart step fails with the message "ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'". The stack trace follows:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 211, in <module>
    HiveMetastore().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 841, in restart
    self.pre_upgrade_restart(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 118, in pre_upgrade_restart
    self.upgrade_schema(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 150, in upgrade_schema
    status_params.tmp_dir)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/security_commons.py", line 242, in cached_kinit_executor
    if (now - datetime.strptime(last_run_time, "%Y-%m-%d %H:%M:%S.%f") > timedelta(minutes=expiration_time)):
  File "/usr/lib64/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2017-05-10 19:08:30' does not match format '%Y-%m-%d %H:%M:%S.%f'
Root cause
During the upgrade, data is read from a file such as *_tmp.txt under the /var/lib/ambari-agent/tmp/kinit_executor_cache directory. This issue occurs when that file has not been updated and points to an older date.
Solution
1. Log in to the Hive Metastore host.
2. Move the *_tmp.txt files:
mv /var/lib/ambari-agent/tmp/kinit_executor_cache/*_tmp.txt /tmp
3. Retry the Restart Hive Metastore step from the Ambari Upgrade screen.
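The root cause can be reproduced outside Ambari: the %f directive requires a fractional-seconds component, so a timestamp written without one raises exactly this ValueError. A standalone Python sketch:

```python
from datetime import datetime

# A timestamp written without fractional seconds...
last_run_time = "2017-05-10 19:08:30"

# ...does not match a format string that demands them via "%f".
try:
    datetime.strptime(last_run_time, "%Y-%m-%d %H:%M:%S.%f")
except ValueError as e:
    print(e)  # prints the "does not match format" message

# The same value parses fine once the fractional part is present.
ok = datetime.strptime("2017-05-10 19:08:30.123456", "%Y-%m-%d %H:%M:%S.%f")
print(ok.microsecond)  # 123456
```

This is why a stale cache file containing a timestamp in the old format trips up cached_kinit_executor until the file is moved aside.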