Member since: 07-21-2016
Posts: 101 | Kudos Received: 10 | Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4692 | 02-15-2020 05:19 PM |
| | 81845 | 10-02-2017 08:22 PM |
| | 1979 | 09-28-2017 01:55 PM |
| | 2181 | 07-25-2016 04:09 PM |
03-31-2017
01:51 PM
1 Kudo
@Kumar Veerappan Ambari provides some built-in alerts to find out the weekly/daily growth in HDFS usage.
Ambari UI --> Alerts (tab) --> "Alert Definition Filter" --> search for "HDFS Storage Capacity Usage"
This service-level alert is triggered if the increase in storage capacity usage deviation has grown beyond the specified threshold within a given period. This alert monitors daily and weekly periods. Please see: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-user-guide/content/hdfs_service_alerts.html However, if you want this data for 6 months, you might have to write your own custom alert script. Some time back I wrote a basic example of how we can have our own custom Ambari alert:
https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.html
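In case it helps as a starting point, a custom alert script is essentially a Python module that defines an execute() function returning a status tuple. Here is a minimal sketch along those lines; the NameNode host/port, JMX bean and thresholds are placeholders, and tracking growth over 6 months would additionally require storing or querying historical values (e.g. from AMS):

# alert_hdfs_capacity.py - minimal sketch of a custom Ambari alert script
# (placeholder NameNode host/port, JMX bean and thresholds; adapt before registering)
import json
try:
    from urllib2 import urlopen          # Python 2 (Ambari agent)
except ImportError:
    from urllib.request import urlopen   # Python 3 fallback

NN_JMX_URL = ('http://namenode.example.com:50070/jmx'
              '?qry=Hadoop:service=NameNode,name=FSNamesystemState')
WARN_PCT = 75.0
CRIT_PCT = 90.0

def execute(configurations={}, parameters={}, host_name=None):
    # Ambari calls execute() and expects a (result_code, [label]) tuple
    try:
        bean = json.loads(urlopen(NN_JMX_URL).read())['beans'][0]
        used_pct = 100.0 * bean['CapacityUsed'] / bean['CapacityTotal']
    except Exception as e:
        return ('UNKNOWN', ['Could not read NameNode JMX: %s' % e])
    label = 'HDFS capacity used: %.1f%%' % used_pct
    if used_pct >= CRIT_PCT:
        return ('CRITICAL', [label])
    if used_pct >= WARN_PCT:
        return ('WARNING', [label])
    return ('OK', [label])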
Grafana basically fetches data from AMS (Ambari Metrics Collector) via its APIs, so the data needs to be available in AMS first.
03-07-2017
08:14 PM
1 Kudo
I suggest you look at this article: https://community.hortonworks.com/articles/74335/ambari-server-performance-monitoring-alerts.html It demonstrates how to poll Ambari Server performance. This will tell you whether Ambari is up or down and whether it is performing well.
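As a rough illustration of the idea (not necessarily the article's exact approach), even a simple poll of the Ambari REST API tells you whether the server is responding and how quickly; the host, credentials and latency threshold below are placeholders:

# poll_ambari.py - rough sketch: check that the Ambari server answers its REST API
# (hypothetical host, credentials and latency threshold; adjust to your environment)
import time
import requests

AMBARI_URL = 'http://ambari.example.com:8080/api/v1/clusters'
AUTH = ('admin', 'admin')
SLOW_SECONDS = 5.0

start = time.time()
try:
    resp = requests.get(AMBARI_URL, auth=AUTH,
                        headers={'X-Requested-By': 'ambari'}, timeout=30)
    elapsed = time.time() - start
    if resp.status_code != 200:
        print('ERROR: HTTP %s in %.1fs' % (resp.status_code, elapsed))
    elif elapsed > SLOW_SECONDS:
        print('SLOW: responded in %.1fs' % elapsed)
    else:
        print('OK: responded in %.1fs' % elapsed)
except requests.exceptions.RequestException as e:
    print('DOWN: %s' % e)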
12-20-2016
01:17 PM
2 Kudos
One key benefit of Kerberos is that passwords or keys are sent across the network as infrequently as possible. With SSH, either passwords are transmitted or you are persisting files with secret keys, both of which raise security concerns. This article does a great job of comparing and contrasting SSH and Kerberos: http://docstore.mik.ua/orelly/networking_2ndEd/ssh/ch11_04.htm "When a user identifies herself to the Kerberos system, the identifying program (kinit) uses her password for an exchange with the KDC, then immediately erases it, never having sent it over the network in any form nor stored it on disk."
01-28-2019
08:41 PM
I also noticed you can monitor the "Need to move" message for the remaining space to be balanced. This can go up or down depending on how busy the cluster is:
cat /tmp/hdfs_rebalancer.log | grep "Need to move" | tail -n 10
19/01/28 12:23:02 INFO balancer.Balancer: Need to move 11.11 TB to make the cluster balanced.
19/01/28 12:43:48 INFO balancer.Balancer: Need to move 11.10 TB to make the cluster balanced.
19/01/28 13:04:38 INFO balancer.Balancer: Need to move 10.89 TB to make the cluster balanced.
19/01/28 13:25:23 INFO balancer.Balancer: Need to move 10.83 TB to make the cluster balanced.
19/01/28 13:45:59 INFO balancer.Balancer: Need to move 10.83 TB to make the cluster balanced.
19/01/28 14:06:30 INFO balancer.Balancer: Need to move 10.78 TB to make the cluster balanced.
19/01/28 14:27:14 INFO balancer.Balancer: Need to move 10.73 TB to make the cluster balanced.
19/01/28 14:47:53 INFO balancer.Balancer: Need to move 10.70 TB to make the cluster balanced.
19/01/28 15:08:42 INFO balancer.Balancer: Need to move 10.66 TB to make the cluster balanced.
19/01/28 15:29:23 INFO balancer.Balancer: Need to move 10.75 TB to make the cluster balanced.
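If you want the overall trend rather than the raw lines, a small script can pull the numbers out of the same log (this assumes the log path used above and that the values are reported in TB, as in this run):

# balancer_trend.py - sketch: summarize the "Need to move" values from the balancer log
# (assumes the log path used above and that the values are reported in TB, as in this run)
import re

PATTERN = re.compile(r'Need to move ([\d.]+) TB')
values = []
with open('/tmp/hdfs_rebalancer.log') as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            values.append(float(match.group(1)))
if values:
    print('first: %.2f TB, latest: %.2f TB, reduced so far: %.2f TB'
          % (values[0], values[-1], values[0] - values[-1]))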
12-06-2016
04:21 PM
1 Kudo
@Kumar Veerappan - Via Ambari or the command line; both ways use the same underlying command: hdfs balancer -threshold <threshold> (see https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer).

Ambari uses the following Python script to build the command (https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py#L315-L317):

def startRebalancingProcess(threshold):
  rebalanceCommand = 'hdfs balancer -threshold %s' % threshold
  return ['cmd', '/C', rebalanceCommand]

Via the command line you get some additional options that can be passed to the "hdfs balancer" command, which give you more control over it:

hdfs balancer
  [-threshold <threshold>]
  [-policy <policy>]
  [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
  [-include [-f <hosts-file> | <comma-separated list of hosts>]]
  [-idleiterations <idleiterations>]
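For illustration only, here is a hypothetical helper in the same spirit as the Ambari snippet above, showing how the extra options could be added when you build the command yourself (the host names are made up):

# Hypothetical helper in the same spirit as the Ambari snippet above, extended
# with the extra command-line options (host names here are made up)
def build_balancer_command(threshold=10, policy='datanode', exclude_hosts=None):
    command = ['hdfs', 'balancer', '-threshold', str(threshold), '-policy', policy]
    if exclude_hosts:
        command += ['-exclude', ','.join(exclude_hosts)]
    return command

# build_balancer_command(5, exclude_hosts=['dn1.example.com', 'dn2.example.com'])
# -> ['hdfs', 'balancer', '-threshold', '5', '-policy', 'datanode',
#     '-exclude', 'dn1.example.com,dn2.example.com']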
04-05-2017
03:12 PM
@Constantin Stanca I thought the proper way to do maintenance on a data node is to decommission it, so that the following happens:
- DataNode - safely replicates its HDFS data to other DataNodes
- NodeManager - stops accepting new job requests
- RegionServer - turns on drain mode
In an urgent situation I could agree with your suggestion. However, please advise me on the right approach in a scenario where you have the luxury of choosing the maintenance window.
08-26-2016
07:23 PM
3 Kudos
@Kumar Veerappan Your question caption asked about dependent components, while your question description asked about a list of jobs that currently use Spark. I assume you actually meant Spark applications (a.k.a. jobs) running on the cluster. If you have access to Ambari, you could click on the YARN link, then on Quick Links, and then on ResourceManager UI. That assumes your Spark runs over YARN. Otherwise, you could go directly to the ResourceManager UI. You would need to know the IP address of the server where the ResourceManager runs, as well as the port (default is 8088).
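If you prefer a script over clicking through the UI, the same information is exposed by the ResourceManager REST API on that port. A rough sketch, assuming a placeholder ResourceManager host:

# list_spark_apps.py - sketch: list running Spark applications via the
# ResourceManager REST API (hypothetical RM host; default port 8088 as noted above)
import requests

RM_URL = 'http://resourcemanager.example.com:8088/ws/v1/cluster/apps'
resp = requests.get(RM_URL, params={'applicationTypes': 'SPARK', 'states': 'RUNNING'})
resp.raise_for_status()
apps = (resp.json().get('apps') or {}).get('app') or []
for app in apps:
    print('%s  %s  user=%s  state=%s' % (app['id'], app['name'], app['user'], app['state']))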
08-23-2016
10:28 PM
3 Kudos
@Kumar Veerappan
Assuming that you are only interested in who has access to Hadoop services, extract all OS users from all nodes by checking the /etc/passwd file content. Some of them are legitimate users needed by Hadoop tools, e.g. hive, hdfs, etc. For HDFS, users will have a /user/username folder in HDFS; you can see that with hadoop fs -ls /user executed as a user that is a member of the hadoop group. If they have access to a Hive client, they are also able to perform DDL and DML actions in Hive. The above will allow you to understand the current state; however, this is your opportunity to improve security even without the bells and whistles of Kerberos/LDAP/Ranger. You can force users to access Hadoop ecosystem client services via a few client/edge nodes where only client services are running, e.g. the Hive client. Users other than power users should not have accounts on the NameNode, admin node, or data nodes. Any user that can access the nodes where client services are running can access those services, e.g. HDFS or Hive.
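As a rough sketch of automating that check on a single node (the service-account list and UID cutoff below are my assumptions, not a definitive list):

# audit_hadoop_users.py - rough sketch of the manual check described above, run on one node
# (the service-account list and UID cutoff are assumptions; adjust per environment)
import subprocess

SERVICE_ACCOUNTS = {'hdfs', 'hive', 'yarn', 'mapred', 'hbase', 'zookeeper', 'ambari-qa'}
# 1. OS users from /etc/passwd (regular accounts usually have UID >= 1000)
os_users = set()
with open('/etc/passwd') as passwd:
    for line in passwd:
        name, _, uid = line.split(':')[:3]
        if int(uid) >= 1000 and name not in SERVICE_ACCOUNTS:
            os_users.add(name)
# 2. Users that have a home directory in HDFS under /user
ls_output = subprocess.check_output(['hadoop', 'fs', '-ls', '/user']).decode()
hdfs_users = {line.split('/')[-1] for line in ls_output.splitlines() if '/user/' in line}
print('OS users without an HDFS home dir:', sorted(os_users - hdfs_users))
print('HDFS home dirs without a matching OS user on this node:', sorted(hdfs_users - os_users))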