Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 922 | 06-04-2025 11:36 PM |
| | 1522 | 03-23-2025 05:23 AM |
| | 751 | 03-17-2025 10:18 AM |
| | 2698 | 03-05-2025 01:34 PM |
| | 1800 | 03-03-2025 01:09 PM |
06-22-2018
06:27 PM
@Dassi Jean Fongang A DataNode is a slave component of the NameNode and shouldn't sit on the same host as the standby NameNode, which is a master process. With over 24 nodes you can afford at least 3 master nodes (Active & Standby NameNode, Active & Standby ResourceManager, and 3 ZooKeepers).

Can you ensure the client components aren't running?

Get DataNode info:

$ hdfs dfsadmin -getDatanodeInfo datanodex:50020

Trigger a block report for the given DataNode:

$ hdfs dfsadmin -triggerBlockReport datanodex:50020

Can you update the exclude file on both the ResourceManager and NameNode (master) hosts? If it's not there, create an exclude file on both machines:

$ vi excludes

Add the address of the DataNode/slave node to decommission, e.g. 10.125.10.107, then run the following command on the ResourceManager:

$ yarn rmadmin -refreshNodes
This command checks yarn-site.xml, processes that property, and decommissions the mentioned node from YARN; the ResourceManager will no longer assign any job to this node. The tricky part is that even if none of the DataNode components are running, deleting the DataNode would impact your standby NameNode. I recommend you move the standby NameNode to another host that should ONLY run a DataNode and NodeManager, or at most client software like ZooKeeper, Kerberos, etc. HTH
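The exclude-and-refresh sequence above can be sketched as a small script. This is a minimal illustration: the `EXCLUDES` temp file stands in for the real exclude file your cluster configures (e.g. via `dfs.hosts.exclude` and `yarn.resourcemanager.nodes.exclude-path`), and the refresh commands are left as comments because they require a live cluster.

```shell
# Illustrative stand-ins; on a real master node EXCLUDES would be the file
# referenced by dfs.hosts.exclude / yarn.resourcemanager.nodes.exclude-path.
EXCLUDES="$(mktemp)"
NODE="10.125.10.107"

# Append the node only if it is not already listed (idempotent).
grep -qxF "$NODE" "$EXCLUDES" || echo "$NODE" >> "$EXCLUDES"
grep -qxF "$NODE" "$EXCLUDES" || echo "$NODE" >> "$EXCLUDES"   # second run is a no-op

# On the real cluster you would then re-read the exclude files:
#   hdfs dfsadmin -refreshNodes    # NameNode picks up dfs.hosts.exclude
#   yarn rmadmin -refreshNodes     # ResourceManager picks up its exclude file
```

Running the append twice shows why the `grep -qxF` guard matters: the node address ends up in the file exactly once, so re-running the script during a long decommission is safe.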
06-22-2018
01:16 PM
@tbroas I have raised a ticket (issue) but there has been no response: request #14748 for a printable certificate and request #9859 "HDPCA". To date I have had no response. Can you help? Thanks a million
06-22-2018
01:01 PM
@JAy PaTel Can you share your Sqoop code? Please remember to obfuscate sensitive info.
06-21-2018
02:21 AM
@Chiranjeevi Nimmala During the re-install, did you drop and recreate a fresh Ranger database? If you created a new DB, stop Ranger, then navigate to Ambari UI ---> Ranger ---> Configs ---> Ranger Admin and ensure the following match your new DB/host:

Ranger DB name
Ranger DB username
Ranger DB host
Ranger DB password
Driver class name for a JDBC Ranger database
JDBC connect string for a Ranger database

Ranger maintains policy-related data in the x_policy* tables. To see which policies are enabled, run the SQL below as the Ranger DB owner if your backend DB is MySQL (else choose the appropriate syntax):

mysql> select name,is_enabled from x_policy;

Here is a link to manage the policies using REST APIs for Service Definition, Service and Policy Management. HTH
06-19-2018
10:23 PM
@Vishal Gupta I don't think Cloudera Manager installs the KDC and the client automatically. Whether you are using Cloudera or Hortonworks, you will first need a working KDC server (krb5-server), Kerberos clients (krb5-workstation, krb5-libs), and a realm set up; in both cases you then use the CM or Ambari Kerberos wizard. Having said that, I provided a walkthrough to help you set up Kerberos on HDP; subsequent questions should be opened as new threads, since this revived question dates from August 2017! 🙂 It would be great if you could mark this HCC thread as answered by clicking on the "Accept" button. That way other HCC users can quickly find the solution when they encounter the same issue. HTH
06-19-2018
09:47 AM
@Ilia K The default scheduler in HDP is the Capacity Scheduler. You should note the differences between the 3 settings:

Capacity Scheduler: designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and utilization of the cluster.

FIFO (First In, First Out): the simplest scheduling algorithm; it simply queues applications in the order they arrive in the ready queue.

Fair Scheduler: a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time.

Having said that, in your example above PROD has 70%, so despite "Each queue is used by multiple users and resources should be shared equally", the PROD queue jobs will have priority over the DEV queue, which I think is the desired config.

Can you share the values below?

yarn.resourcemanager.scheduler.class
Capacity Scheduler values

Please revert.
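To find the `yarn.resourcemanager.scheduler.class` value asked for above, you can grep it straight out of yarn-site.xml. A quick sketch, using an illustrative XML fragment in place of the real file (on a cluster node the file is typically under /etc/hadoop/conf, or visible in Ambari under YARN configs):

```shell
# Illustrative yarn-site.xml fragment; on a cluster, read the real file instead.
cat > yarn-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
EOF

# Find the property, grab the <value> element on the next line, strip the tags.
grep -A1 'yarn.resourcemanager.scheduler.class' yarn-site-sample.xml \
  | grep -o '<value>[^<]*</value>' \
  | sed -e 's/<value>//' -e 's/<\/value>//'
# -> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
```

If the printed class ends in `CapacityScheduler`, the HDP default is in effect; `FairScheduler` or `FifoScheduler` in that position would indicate the other two policies.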
06-13-2018
04:46 PM
@olivier brobecker Have you tried using the snippet below? Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now):

$ hdfs fsck / | egrep -v '^\.+$' | grep -v replica

Once you find a file that is corrupt:

$ hdfs fsck /path/to/corrupt/file -locations -blocks -files
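To pull just the affected file paths out of the fsck output, a short pipeline helps. This is a sketch run against illustrative sample output (the real input would be `hdfs fsck /` piped in, and the exact message wording can vary between Hadoop versions):

```shell
# Illustrative fsck-style output; on a cluster, pipe `hdfs fsck /` in instead.
cat > fsck-sample.txt <<'EOF'
/user/app/data/part-00000:  Under replicated BP-1:blk_1073741825_1001. Target Replicas is 3 but found 2 replica(s).
/user/app/logs/events.log: CORRUPT blockpool BP-1 block blk_1073741826
/user/app/logs/events.log: MISSING 1 blocks of total size 134217728 B
Status: CORRUPT
EOF

# Keep lines flagging corrupt/missing blocks, drop the summary lines,
# then print each affected file path once.
grep -E 'CORRUPT|MISSING' fsck-sample.txt \
  | grep '^/' \
  | cut -d: -f1 \
  | sort -u
# -> /user/app/logs/events.log
```

Note the under-replicated file is filtered out, matching the advice above to ignore under-replication for now; each surviving path can then be fed to `hdfs fsck <path> -locations -blocks -files`.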
06-13-2018
09:23 AM
@salma zegdene I see your problem was resolved. If so, can you mark this HCC thread as answered by clicking on the "Accept" button? That way other HCC users can quickly find the solution when they encounter the same issue.
06-12-2018
05:45 PM
@Michael Bronson I think this should be linked to the NameNode issues. Speaking of which, I have failed to reproduce the scenario but am still investigating.
06-12-2018
08:18 AM
@Michael Bronson The document I referenced should give you the steps to follow, like analyzing the offending application etc. If it's the same cluster, then it could be the NameNode:

netstat -nape | awk '{if ($5 == "IP_of_master01:2181") print $4, $9}'

Do the same for master03, then maybe use a bash script to kill the dead processes.
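The awk filter above picks out local sockets connected to the ZooKeeper port. A sketch of what it does, run on illustrative `netstat -nape`-style lines (addresses and PIDs are made up; in field terms, $4 is the local address, $5 the foreign address, and $9 the PID/program name):

```shell
# Illustrative netstat -nape output: Proto Recv-Q Send-Q Local Foreign State User Inode PID/Program
cat > netstat-sample.txt <<'EOF'
tcp 0 0 10.0.0.5:45678 10.0.0.21:2181 ESTABLISHED 1000 123456 4321/java
tcp 0 0 10.0.0.5:45680 10.0.0.22:3306 ESTABLISHED 1000 123457 5432/mysqld
EOF

# Same filter as above, with straight quotes: keep only connections whose
# foreign address is the master's ZooKeeper endpoint, print local addr + PID.
awk '{if ($5 == "10.0.0.21:2181") print $4, $9}' netstat-sample.txt
# -> 10.0.0.5:45678 4321/java
```

From there, the PID in the second printed column is what a cleanup script would feed to `kill` after confirming the process is actually dead weight.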