Member since 01-19-2017
3679 Posts · 632 Kudos Received · 372 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1024 | 06-04-2025 11:36 PM |
| | 1582 | 03-23-2025 05:23 AM |
| | 794 | 03-17-2025 10:18 AM |
| | 2857 | 03-05-2025 01:34 PM |
| | 1874 | 03-03-2025 01:09 PM |
08-26-2019
05:03 AM
@iamabug Are you now comfortable proceeding? If you need any help, don't hesitate to ask.
08-25-2019
12:20 PM
1 Kudo
@Manoj690 Going through your logs I can see that the NameNode is in SAFE MODE; while it is in safe mode it won't allow changes to any file in the cluster, including the logs.

```
2019-08-22 12:31:01,376 [server.Accumulo] INFO : Attempting to talk to zookeeper
2019-08-22 12:31:01,681 [server.Accumulo] INFO : ZooKeeper connected and initialized, attempting to talk to HDFS
2019-08-22 12:31:01,946 [server.Accumulo] WARN : Waiting for the NameNode to leave safemode
2019-08-22 12:31:01,946 [server.Accumulo] INFO : Backing off due to failure; current sleep period is 1.0 seconds
2019-08-22 12:31:02,950 [server.Accumulo] WARN : Waiting for the NameNode to leave safemode
2019-08-22 12:31:02,950 [server.Accumulo] INFO : Backing off due to failure; current sleep period is 2.0 seconds
2019-08-22 12:31:04,954 [server.Accumulo] WARN : Waiting for the NameNode to leave safemode
```

To resolve the issue, run the following as the hdfs user:

```
$ hdfs dfsadmin -safemode get
Safe mode is OFF
```

The above is the desired output. If instead you get ON, proceed as below. First back up your FsImage and edits:

```
$ hdfs dfsadmin -saveNamespace
```

Then exit safe mode:

```
$ hdfs dfsadmin -safemode leave
```

Once successful, revalidate:

```
$ hdfs dfsadmin -safemode get
```

This time it should be OFF, and you can now restart the failed services from Ambari; everything should succeed. HTH
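The check-then-act sequence above can be sketched as a small shell helper. This is illustrative only: `hdfs dfsadmin` needs a live cluster, so the function below just classifies the string that `hdfs dfsadmin -safemode get` prints, which is the decision logic involved.

```shell
# Hedged sketch: decide what to do from the output of
# `hdfs dfsadmin -safemode get`. The output string is passed in as an
# argument so the logic can be shown without a live NameNode.
safemode_action() {
  case "$1" in
    *"Safe mode is ON"*)  echo "saveNamespace-then-leave" ;;
    *"Safe mode is OFF"*) echo "nothing-to-do" ;;
    *)                    echo "unexpected-output" ;;
  esac
}

safemode_action "Safe mode is ON"    # -> saveNamespace-then-leave
safemode_action "Safe mode is OFF"   # -> nothing-to-do
```

In a real script you would feed it `"$(hdfs dfsadmin -safemode get)"` and run `hdfs dfsadmin -saveNamespace` followed by `-safemode leave` in the first branch.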
08-25-2019
11:59 AM
@iamabug There is a lot more to it than just kerberizing the cluster. Have you also enabled SSL? Can you share a tokenized version of the files below? Essentially, the ACLs in ZooKeeper are the key to who can do what, and usually only the Kafka admin is allowed!

- server.properties [listeners, advertised.listeners, authorizer.class.name, sasl.enabled.mechanisms and super.users]
- kafka_server_jaas.conf
- kafka_client_jaas.conf
- kafka_client_kerberos.properties

Hope that helps
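For reference, the security-related keys in a kerberized broker's server.properties usually look something along these lines. All hostnames, ports, and values below are placeholders for illustration, not taken from the poster's cluster:

```shell
# Placeholder values for the server.properties keys asked about above
# (adapt every value to your own cluster before use).
cat > /tmp/server.properties.example <<'EOF'
listeners=SASL_PLAINTEXT://broker1.example.com:6667
advertised.listeners=SASL_PLAINTEXT://broker1.example.com:6667
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
sasl.enabled.mechanisms=GSSAPI
super.users=User:kafka
EOF

# Quick sanity check: every line is a key=value pair.
grep -c '=' /tmp/server.properties.example   # -> 5
```

When sharing yours, tokenize the hostnames and principals the same way before posting.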
08-25-2019
10:23 AM
@pritam_konar In reality, a user shouldn't be able to kinit with the hdfs keytab. Instead, a keytab should be created for the specific user and deleted when the user is disabled on the cluster. Typically this user setup happens on the edge node where the Hadoop client software is installed, and that is the recommended setup for giving users access to the cluster. Below is a demo of the user konar attempting to access services in a kerberized cluster.

```
# su - konar
[konar@simba ~]$ id
uid=1024(konar) gid=1024(konar) groups=1024(konar)
```

Now try to list the directories in HDFS:

```
[konar@simba ~]$ hdfs dfs -ls /
19/08/24 23:59:25 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "simba.kenya.ke/192.168.0.87"; destination host is: "simba.kenya.ke":8020;
```

Below is the desired output when user konar attempts to use the hdfs headless keytab:

```
[konar@simba ~]$ kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-jair@KENYA.KE
kinit: Permission denied while getting initial credentials
```

To enable a user to access the cluster, perform the following steps on the Kerberos server as root (the Kerberos admin). Assumption: the realm is KENYA.KE, the KDC host is simba, and you have root access on the KDC.

Create the principal for user konar:

```
[root@simba ~]# kadmin.local
Authenticating as principal root/admin@KENYA.KE with password.
kadmin.local: addprinc konar@KENYA.KE
WARNING: no policy specified for konar@KENYA.KE; defaulting to no policy
Enter password for principal "konar@KENYA.KE":
Re-enter password for principal "konar@KENYA.KE":
Principal "konar@KENYA.KE" created.
kadmin.local: q
```

Validate that the principal was created using the subcommand listprincs [list principals], limiting the output to konar — classic Unix stuff:

```
[root@simba ~]# kadmin.local
Authenticating as principal root/admin@KENYA.KE with password.
kadmin.local: listprincs *konar
konar@KENYA.KE
```

Type q [quit] to exit the kadmin utility.

Generate the keytab

Generate a keytab for user konar using ktutil. It's good to change to /tmp (or wherever you choose) so you know the location of the generated keytab. Your encryption algorithm could be different, but this should work:

```
[root@simba tmp]# ktutil
ktutil: addent -password -p konar@KENYA.KE -k 1 -e RC4-HMAC
Password for konar@KENYA.KE:
ktutil: wkt konar.keytab
ktutil: q
```

Validate the keytab creation

Check that the keytab was generated in the current directory. Notice the file permissions!

```
[root@simba tmp]# ls -lrt
-rw------- 1 root root 58 Aug 25 18:22 konar.keytab
```

As root, copy the generated keytab to the home directory of user konar, typically on the edge node:

```
[root@simba tmp]# cp konar.keytab /home/konar/
```

Change to konar's home directory and validate that the copy was successful:

```
[root@simba tmp]# cd /home/konar/
[root@simba konar]# ll
total 4
-rw------- 1 root root 58 Aug 25 18:28 konar.keytab
```

Change file ownership

Change the ownership of konar.keytab so that user konar has the appropriate permissions:

```
[root@simba konar]# chown konar:konar konar.keytab
[root@simba konar]# ll
total 4
-rw------- 1 konar konar 58 Aug 25 18:28 konar.keytab
```

Switch to user konar and confirm that the user still can't access HDFS:

```
[konar@simba ~]$ hdfs dfs -ls /
19/08/25 18:36:44 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "simba.kenya.ke/192.168.0.87"; destination host is: "simba.kenya.ke":8020;
```

The Kerberos klist also confirms that:

```
[konar@simba ~]$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_1024)
```

As user konar, now try to kinit with the correct principal. The first step is to identify the correct principal:

```
[konar@simba ~]$ klist -kt konar.keytab
Keytab name: FILE:konar.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/25/2019 18:22:34 konar@KENYA.KE
```

The above shows the konar keytab is valid, with the principal in the output. Now user konar can grab a valid ticket using the snippet below, combining the keytab and the principal:

```
[konar@simba ~]$ kinit -kt konar.keytab konar@KENYA.KE
```

The above should not throw any error. Now validate that the user has a valid ticket:

```
[konar@simba ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1024
Default principal: konar@KENYA.KE

Valid starting       Expires              Service principal
08/25/2019 18:53:40  08/26/2019 18:53:40  krbtgt/KENYA.KE@KENYA.KE
```

Bravo, you have a valid ticket and hence access to the cluster. Let's validate that — the HDFS directory listing below should succeed:

```
[konar@simba ~]$ hdfs dfs -ls /
Found 10 items
drwxrwxrwx   - yarn   hadoop  0 2018-12-17 21:53 /app-logs
drwxr-xr-x   - hdfs   hdfs    0 2018-09-24 00:22 /apps
drwxr-xr-x   - yarn   hadoop  0 2018-09-24 00:12 /ats
drwxr-xr-x   - hdfs   hdfs    0 2018-09-24 00:12 /hdp
drwxr-xr-x   - mapred hdfs    0 2018-09-24 00:12 /mapred
drwxrwxrwx   - mapred hadoop  0 2018-09-24 00:12 /mr-history
drwxr-xr-x   - hdfs   hdfs    0 2018-12-17 19:16 /ranger
drwxrwxrwx   - spark  hadoop  0 2019-08-25 18:59 /spark2-history
drwxrwxrwx   - hdfs   hdfs    0 2018-10-11 11:16 /tmp
drwxr-xr-x   - hdfs   hdfs    0 2018-09-24 00:23 /user
```

User konar can now list and execute jobs on the cluster! As reiterated, in the recommended architecture the konar user should be set up on the edge node.
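The copy/chown steps above boil down to one invariant: the keytab must be readable and writable only by its owner. A scratch-file sketch of that invariant (the real file would be /home/konar/konar.keytab, and the chown target would be konar:konar):

```shell
# Stand-in for the keytab: create a scratch file and lock it down the
# same way the walkthrough does (owner read/write only, i.e. mode 600).
kt=$(mktemp /tmp/keytab.XXXXXX)
chmod 600 "$kt"

# Show the octal mode; the ls listings above show the same thing as
# -rw-------. (stat -c is GNU coreutils syntax.)
stat -c '%a' "$kt"   # -> 600
```

If a keytab ever ends up world-readable, any local user can kinit as that principal, which defeats the whole point of per-user keytabs.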
08-18-2019
01:27 AM
@ray_teruya If you found this answer addressed your question, please take a moment to log in and click the "kudos" link on the answer. That would be a great help to Community users to find the solution quickly for these kinds of errors.
08-14-2019
09:04 AM
@Malthe Borch The NiFi Cluster Coordinator relies on ZooKeeper for the election. NiFi employs a Zero-Master Clustering paradigm: each node in the cluster performs the same tasks on the data, but each operates on a different set of data. One of the nodes is automatically elected (via Apache ZooKeeper) as the Cluster Coordinator. All nodes in the cluster then send heartbeat/status information to this node, and this node is responsible for disconnecting nodes that do not report any heartbeat status for some amount of time. Additionally, when a new node elects to join the cluster, it must first connect to the currently elected Cluster Coordinator in order to obtain the most up-to-date flow. If the Cluster Coordinator determines that the node is allowed to join (based on its configured firewall file), the current flow is provided to that node, and the node is able to join the cluster, assuming that its copy of the flow matches the copy provided by the Cluster Coordinator. If the node's version of the flow configuration differs from the Cluster Coordinator's, the node will not join the cluster.

Checklist
- Ensure you have 3 ZooKeepers (an ensemble) to manage your NiFi cluster, and all should be up and running.

Walkthrough
1. Stop all the NiFi instances.
2. Ensure the 3 ZooKeepers are up and running.
3. Start the NiFi instances one at a time.
4. Validate that one NiFi node has been elected Coordinator and one is PRIMARY.

See screenshots: Coordinator elected.

HTH
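On the configuration side, the election depends on each node's nifi.properties pointing at the full ZooKeeper ensemble. The keys below are standard NiFi properties, but the hostnames are placeholders for illustration:

```shell
# Example cluster/ZooKeeper keys from nifi.properties (placeholder hosts;
# every node in the cluster should carry the same connect string).
cat > /tmp/nifi.properties.example <<'EOF'
nifi.cluster.is.node=true
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
nifi.cluster.flow.election.max.candidates=3
EOF

# Sanity check: the connect string should list all 3 ensemble members.
awk -F= '/^nifi.zookeeper.connect.string/ {n=split($2,a,","); print n}' \
  /tmp/nifi.properties.example   # -> 3
```

A connect string that lists only one ZooKeeper is a common reason a coordinator never gets elected when that one host is down.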
08-11-2019
06:16 PM
@Ray Teruya The error below is what I stated in Question/Answer 2 in my former post. To avoid a split-brain decision you MUST install 3 ZooKeepers.

```
2019-07-31 07:57:58,191 - WARN [main:QuorumPeerConfig@291] - No server failure will be tolerated. You need at least 3 servers.
```

Solution
1. Delete/remove the failed installation.
2. Add 2 new ZooKeepers to your cluster using the Ambari UI's ADD SERVICE, and start the new ZooKeepers if they aren't started. This should form a quorum where only one is the leader and the rest are followers.

To identify a ZooKeeper leader/follower there are a few possible options; mentioning 2 to keep this document simple.

1. Check the ZooKeeper log file on each node and grep as below:

```
# grep LEAD /var/log/zookeeper/zookeeper-zookeeper-server-xyz.out
```

Desired output:

```
2019-08-10 22:33:47,113 - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@829] - LEADING
2019-08-10 22:33:47,114 - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@358] - LEADING - LEADER ELECTION TOOK - 9066
```

2. Use the "nc" command against TCP port 2181 to ask each ZooKeeper server whether it is a leader or a follower.

After the above procedure you should be good to go. HTH
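For option 2, `echo stat | nc <host> 2181` prints a `Mode:` line naming the role. A tiny helper to classify that line; the sample strings below mimic the `stat` output rather than querying a live ensemble:

```shell
# Classify a ZooKeeper node from the "Mode:" line of its `stat` output.
# In practice you'd feed it: "$(echo stat | nc zk1.example.com 2181)"
zk_mode() { printf '%s\n' "$1" | awk -F': *' '/^Mode:/ {print $2}'; }

zk_mode "Mode: leader"      # -> leader
zk_mode "Mode: follower"    # -> follower
```

Note that on recent ZooKeeper releases the `stat` four-letter word must be whitelisted (`4lw.commands.whitelist`) before the command returns anything.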
08-04-2019
08:54 AM
@Thomas Poetter it seems your DB isn't running; can you check as below. First, ensure the MySQL connector is installed (you can re-run this without any issues):

```
yum install -y mysql-connector-java
```

Check MySQL status:

```
# ps aux | grep mysql
```

or

```
# mysqladmin -u root -p status
Enter password:
Uptime: 1697 Threads: 15 Questions: 17283 Slow queries: 0 Opens: 73 Flush tables: 2 Open tables: 99 Queries per second avg: 10.184
```

If the database isn't set up, create it manually as the root user (please re-adapt the names and passwords):

```
create database rangerkms;
grant all privileges on rangerkms.* to rangerkms@'localhost' identified by 'rangerkms';
grant all privileges on rangerkms.* to rangerkms@'%.[your_domain]' identified by 'rangerkms';
```

Then retry.
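If the same grants are needed for several hosts, the statements can be generated rather than typed. The database, user, and password below are the ones from the snippet above; re-adapt them before piping the output into `mysql -u root -p`:

```shell
# Emit one GRANT statement per host for the rangerkms user
# (review the output before feeding it to mysql).
rangerkms_grants() {
  for host in "$@"; do
    echo "grant all privileges on rangerkms.* to rangerkms@'${host}' identified by 'rangerkms';"
  done
}

rangerkms_grants localhost '%.example.com'
```

Usage: `rangerkms_grants localhost '%.example.com' | mysql -u root -p` would apply both grants in one shot.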
08-03-2019
11:07 AM
@Ray Teruya OutOfMemoryError is a subclass of java.lang.VirtualMachineError; it's thrown by the JVM when it encounters a problem related to utilizing resources. More specifically, this error occurs when the JVM has spent too much time performing garbage collection while reclaiming very little heap space. According to the Java docs, by default the JVM is configured to throw this error if the Java process spends more than 98% of its time doing GC and less than 2% of the heap is recovered in each run. In other words, the application has exhausted nearly all the available memory, and the garbage collector has spent too much time trying to clean it up and failed repeatedly. In this situation users experience extreme slowness: operations that usually complete in milliseconds take much longer, because the CPU is spending its entire capacity on garbage collection and cannot perform any other tasks. Solution: On HDP 3.x & 2.6.x, depending on the memory available to the cluster, check and increase the heap setting of the affected component; you could raise it to 2048 MB, for example. HTH
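For reference, the 98%/2% thresholds quoted above correspond to the HotSpot flags -XX:GCTimeLimit (default 98) and -XX:GCHeapFreeLimit (default 2). A hedged sketch of what the remedy usually looks like; the exact property name and the place you set it depend on which service is failing, and on HDP it should be changed through Ambari rather than by editing files, with 2048 simply being the example figure from above:

```shell
# Illustrative only: a 2048 MB heap for a Hadoop daemon, e.g. via the
# hadoop-env template in Ambari. Adjust for the actual failing service.
export HADOOP_HEAPSIZE=2048   # in MB, picked up by HDFS/YARN daemons
# For a raw JVM launch the equivalent would be:
#   java -Xmx2048m ...
```

Raising the heap buys headroom, but if the error recurs at the larger size the workload itself (or a leak) is the thing to investigate.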