Member since: 07-21-2016
Posts: 101 | Kudos Received: 10 | Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4692 | 02-15-2020 05:19 PM |
| | 81845 | 10-02-2017 08:22 PM |
| | 1979 | 09-28-2017 01:55 PM |
| | 2181 | 07-25-2016 04:09 PM |
03-31-2017
01:51 PM
1 Kudo
@Kumar Veerappan Ambari provides some built-in alerts to find out the weekly/daily growth in HDFS usage.
Ambari UI --> Alerts (tab) --> "Alert Definition Filter" --> search for "HDFS Storage Capacity Usage"
This service-level alert is triggered if the increase in storage capacity usage deviation has grown beyond the specified threshold within a given period. This alert monitors daily and weekly periods. Please see: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-user-guide/content/hdfs_service_alerts.html However, if you want this data for 6 months, you might have to write your own custom alert script. Some time back I wrote a basic example of how we can have our own custom Ambari alert:
https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.html
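In case it helps as a starting point, a custom alert script is essentially a Python module that defines an execute() function returning a status tuple. Here is a minimal sketch along those lines; the NameNode host/port, JMX bean and thresholds are placeholders, and tracking growth over 6 months would additionally require storing or querying historical values (e.g. from AMS):

# alert_hdfs_capacity.py - minimal sketch of a custom Ambari alert script
# (placeholder NameNode host/port, JMX bean and thresholds; adapt before registering)
import json
try:
    from urllib2 import urlopen          # Python 2 (Ambari agent)
except ImportError:
    from urllib.request import urlopen   # Python 3 fallback

NN_JMX_URL = ('http://namenode.example.com:50070/jmx'
              '?qry=Hadoop:service=NameNode,name=FSNamesystemState')
WARN_PCT = 75.0
CRIT_PCT = 90.0

def execute(configurations={}, parameters={}, host_name=None):
    # Ambari calls execute() and expects a (result_code, [label]) tuple
    try:
        bean = json.loads(urlopen(NN_JMX_URL).read())['beans'][0]
        used_pct = 100.0 * bean['CapacityUsed'] / bean['CapacityTotal']
    except Exception as e:
        return ('UNKNOWN', ['Could not read NameNode JMX: %s' % e])
    label = 'HDFS capacity used: %.1f%%' % used_pct
    if used_pct >= CRIT_PCT:
        return ('CRITICAL', [label])
    if used_pct >= WARN_PCT:
        return ('WARNING', [label])
    return ('OK', [label])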
Grafana basically fetches data from AMS (Ambari Metrics Collector) via its APIs, so the data needs to be available in AMS first.
03-07-2017
08:14 PM
1 Kudo
I suggest you look at this article: https://community.hortonworks.com/articles/74335/ambari-server-performance-monitoring-alerts.html It demonstrates how to poll Ambari Server performance. This will tell you whether Ambari is up or down and whether it is performing well.
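As a rough illustration of the idea (not necessarily the article's exact approach), even a simple poll of the Ambari REST API tells you whether the server is responding and how quickly; the host, credentials and latency threshold below are placeholders:

# poll_ambari.py - rough sketch: check that the Ambari server answers its REST API
# (hypothetical host, credentials and latency threshold; adjust to your environment)
import time
import requests

AMBARI_URL = 'http://ambari.example.com:8080/api/v1/clusters'
AUTH = ('admin', 'admin')
SLOW_SECONDS = 5.0

start = time.time()
try:
    resp = requests.get(AMBARI_URL, auth=AUTH,
                        headers={'X-Requested-By': 'ambari'}, timeout=30)
    elapsed = time.time() - start
    if resp.status_code != 200:
        print('ERROR: HTTP %s in %.1fs' % (resp.status_code, elapsed))
    elif elapsed > SLOW_SECONDS:
        print('SLOW: responded in %.1fs' % elapsed)
    else:
        print('OK: responded in %.1fs' % elapsed)
except requests.exceptions.RequestException as e:
    print('DOWN: %s' % e)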
12-20-2016
01:17 PM
2 Kudos
One key benefit of Kerberos is that passwords or keys are sent across the network as infrequently as possible. With SSH, either passwords are transmitted or you are persisting files with secret keys, both of which raise security concerns. This article does a great job of comparing and contrasting SSH and Kerberos: http://docstore.mik.ua/orelly/networking_2ndEd/ssh/ch11_04.htm "When a user identifies herself to the Kerberos system, the identifying program (kinit) uses her password for an exchange with the KDC, then immediately erases it, never having sent it over the network in any form nor stored it on disk."
01-28-2019
08:41 PM
I also noticed you can monitor the "Need to move" message for the remaining space to be balanced. This can go up or down depending on how busy the cluster is:
cat /tmp/hdfs_rebalancer.log | grep "Need to move" | tail -n 10
19/01/28 12:23:02 INFO balancer.Balancer: Need to move 11.11 TB to make the cluster balanced.
19/01/28 12:43:48 INFO balancer.Balancer: Need to move 11.10 TB to make the cluster balanced.
19/01/28 13:04:38 INFO balancer.Balancer: Need to move 10.89 TB to make the cluster balanced.
19/01/28 13:25:23 INFO balancer.Balancer: Need to move 10.83 TB to make the cluster balanced.
19/01/28 13:45:59 INFO balancer.Balancer: Need to move 10.83 TB to make the cluster balanced.
19/01/28 14:06:30 INFO balancer.Balancer: Need to move 10.78 TB to make the cluster balanced.
19/01/28 14:27:14 INFO balancer.Balancer: Need to move 10.73 TB to make the cluster balanced.
19/01/28 14:47:53 INFO balancer.Balancer: Need to move 10.70 TB to make the cluster balanced.
19/01/28 15:08:42 INFO balancer.Balancer: Need to move 10.66 TB to make the cluster balanced.
19/01/28 15:29:23 INFO balancer.Balancer: Need to move 10.75 TB to make the cluster balanced.
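If you want the overall trend rather than the raw lines, a small script can pull the numbers out of the same log (this assumes the log path used above and that the values are reported in TB, as in this run):

# balancer_trend.py - sketch: summarize the "Need to move" values from the balancer log
# (assumes the log path used above and that the values are reported in TB, as in this run)
import re

PATTERN = re.compile(r'Need to move ([\d.]+) TB')
values = []
with open('/tmp/hdfs_rebalancer.log') as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            values.append(float(match.group(1)))
if values:
    print('first: %.2f TB, latest: %.2f TB, reduced so far: %.2f TB'
          % (values[0], values[-1], values[0] - values[-1]))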
12-06-2016
04:21 PM
1 Kudo
@Kumar Veerappan - Via Ambari or the command line; both ways use the same underlying command: hdfs balancer -threshold <threshold> (see https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer).

Ambari uses the following Python script to build the command (https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py#L315-L317):

def startRebalancingProcess(threshold):
  rebalanceCommand = 'hdfs balancer -threshold %s' % threshold
  return ['cmd', '/C', rebalanceCommand]

Via the command line you get some additional options that can be passed to the "hdfs balancer" command, which give you more control over it:

hdfs balancer
  [-threshold <threshold>]
  [-policy <policy>]
  [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
  [-include [-f <hosts-file> | <comma-separated list of hosts>]]
  [-idleiterations <idleiterations>]
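For illustration only, here is a hypothetical helper in the same spirit as the Ambari snippet above, showing how the extra options could be added when you build the command yourself (the host names are made up):

# Hypothetical helper in the same spirit as the Ambari snippet above, extended
# with the extra command-line options (host names here are made up)
def build_balancer_command(threshold=10, policy='datanode', exclude_hosts=None):
    command = ['hdfs', 'balancer', '-threshold', str(threshold), '-policy', policy]
    if exclude_hosts:
        command += ['-exclude', ','.join(exclude_hosts)]
    return command

# build_balancer_command(5, exclude_hosts=['dn1.example.com', 'dn2.example.com'])
# -> ['hdfs', 'balancer', '-threshold', '5', '-policy', 'datanode',
#     '-exclude', 'dn1.example.com,dn2.example.com']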
04-05-2017
03:12 PM
@Constantin Stanca I thought the proper way to do maintenance on a data node is to decommission it, so that the following happens:
- DataNode - safely replicates its HDFS data to other DataNodes
- NodeManager - stops accepting new job requests
- RegionServer - turns on drain mode
In an urgent situation I could agree with your suggestion. However, please advise me on the right approach in a scenario where you have the luxury of choosing the maintenance window.
08-26-2016
07:23 PM
3 Kudos
@Kumar Veerappan Your question caption asked about dependent components, while your question description asked about a list of jobs that currently use Spark. I assume you actually meant Spark applications (a.k.a. jobs) running on the cluster. If you have access to Ambari, you could click on the YARN link, then on Quick Links, and then on ResourceManager UI. That assumes your Spark runs over YARN. Otherwise, you could go directly to the ResourceManager UI. You would need to know the IP address of the server where the ResourceManager runs, as well as the port (default is 8088).
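If you prefer a script over clicking through the UI, the same information is exposed by the ResourceManager REST API on that port. A rough sketch, assuming a placeholder ResourceManager host:

# list_spark_apps.py - sketch: list running Spark applications via the
# ResourceManager REST API (hypothetical RM host; default port 8088 as noted above)
import requests

RM_URL = 'http://resourcemanager.example.com:8088/ws/v1/cluster/apps'
resp = requests.get(RM_URL, params={'applicationTypes': 'SPARK', 'states': 'RUNNING'})
resp.raise_for_status()
apps = (resp.json().get('apps') or {}).get('app') or []
for app in apps:
    print('%s  %s  user=%s  state=%s' % (app['id'], app['name'], app['user'], app['state']))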
08-23-2016
10:28 PM
3 Kudos
@Kumar Veerappan
Assuming that you are only interested in who has access to Hadoop services, extract all OS users from all nodes by checking the /etc/passwd file content. Some of them are legitimate users needed by Hadoop tools, e.g. hive, hdfs, etc. For HDFS, users will have a /user/username folder in HDFS; you can see that with hadoop fs -ls /user executed as a user that is a member of the hadoop group. If they have access to a Hive client, they are also able to perform DDL and DML actions in Hive. The above will allow you to understand the current state; however, this is your opportunity to improve security even without the bells and whistles of Kerberos/LDAP/Ranger. You can force users to access Hadoop ecosystem client services via a few client/edge nodes where only client services are running, e.g. the Hive client. Users other than power users should not have accounts on the NameNode, admin node, or data nodes. Any user that can access the nodes where client services are running can access those services, e.g. HDFS or Hive.
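As a rough sketch of automating that check on a single node (the service-account list and UID cutoff below are my assumptions, not a definitive list):

# audit_hadoop_users.py - rough sketch of the manual check described above, run on one node
# (the service-account list and UID cutoff are assumptions; adjust per environment)
import subprocess

SERVICE_ACCOUNTS = {'hdfs', 'hive', 'yarn', 'mapred', 'hbase', 'zookeeper', 'ambari-qa'}
# 1. OS users from /etc/passwd (regular accounts usually have UID >= 1000)
os_users = set()
with open('/etc/passwd') as passwd:
    for line in passwd:
        name, _, uid = line.split(':')[:3]
        if int(uid) >= 1000 and name not in SERVICE_ACCOUNTS:
            os_users.add(name)
# 2. Users that have a home directory in HDFS under /user
ls_output = subprocess.check_output(['hadoop', 'fs', '-ls', '/user']).decode()
hdfs_users = {line.split('/')[-1] for line in ls_output.splitlines() if '/user/' in line}
print('OS users without an HDFS home dir:', sorted(os_users - hdfs_users))
print('HDFS home dirs without a matching OS user on this node:', sorted(hdfs_users - os_users))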