Member since
04-20-2020
246
Posts
6
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1255 | 05-17-2022 02:39 PM
 | 4043 | 03-30-2022 12:27 PM
 | 2646 | 03-28-2022 11:48 AM
07-21-2025
11:35 AM
Can you share any of the datanode logs here? We can try to find out why they can't reach the secondary namenode.
07-20-2025
10:02 PM
The issue seems to be in your secondary namenode:

2025-07-07 11:56:37,798 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Log not rolled. Name node is in safe mode.
The reported blocks 0 has reached the threshold 0.9990 of total blocks 0. The number of live datanodes 0 needs an additional 1 live datanodes to reach the minimum number 1. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:dmidlkprdls01.svr.luc.edu

It looks like the namenode can't communicate with your datanodes, so it can't leave safe mode and crashes. Maybe a network problem is preventing communication between those two roles? Can you ping the secondary namenode from your datanodes and vice versa? Are the required ports open on the secondary namenode and the datanodes?
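A minimal sketch of those connectivity checks, run from a datanode (repeat in the reverse direction from the secondary namenode host). The hostname comes from the log above; port 9868 is the Hadoop 3 default secondary namenode HTTP port, which is an assumption here, so confirm the actual ports in your hdfs-site.xml:

```shell
# Run from each datanode; then the reverse direction from the SNN host.
SNN_HOST=dmidlkprdls01.svr.luc.edu   # secondary namenode from the log above
SNN_PORT=9868                        # assumed Hadoop 3 default; check hdfs-site.xml

ping -c 3 "$SNN_HOST"                # basic reachability
nc -zv "$SNN_HOST" "$SNN_PORT"       # is the port open and listening?
```

If ping works but `nc` fails, look at firewall rules between the hosts rather than DNS.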
07-18-2025
08:35 AM
I meant the namenode process logs. If you didn't customize the location, they should be under /var/log/hadoop-hdfs, where you will see a bunch of logs. Get the latest one that says NAMENODE (it's in caps) and, if possible, share it here. Please get them from both namenode servers.
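A quick sketch of that lookup, assuming the default log directory (adjust `LOG_DIR` if you customized it):

```shell
# Default HDFS log location when not customized in CM.
LOG_DIR=/var/log/hadoop-hdfs
# Newest log file whose name contains NAMENODE.
ls -t "$LOG_DIR" | grep -i 'NAMENODE' | head -n 1
```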
07-17-2025
09:49 AM
The logs from both namenode servers, so we can investigate why the checkpointing process is failing.
07-16-2025
10:13 AM
Hello @jkoral The log snippet you posted is not enough for us to identify the problem. Those "Not enough replicas was chosen" messages are mostly harmless and, although annoying, don't pose a threat to the process. Is it possible for you to share the logs from both namenodes so we can check?
07-02-2025
12:58 PM
Hello @rizalt Thanks for posting your question on the Cloudera Community forum! If I understood correctly, your space usage is high even though it seems you don't have any data in HDFS at all. For confirmation, could you please run this command against HDFS (make sure to have a Kerberos ticket if the cluster is kerberized):

hdfs dfs -du -h /

Or show us the root folder from the Browse Directory page. Finally, to answer your immediate question, please don't remove any data from /hadoop/hdfs/data directly; it is preferable to remove things using the proper tools. We will walk you through it once we have the above information. Regards, JR
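A short sketch of that check; the keytab path and principal are placeholders for whatever account you normally use:

```shell
# On a kerberized cluster, authenticate first (placeholder keytab/principal).
kinit -kt /path/to/hdfs.keytab hdfs@EXAMPLE.COM

hdfs dfs -du -h /        # per-directory usage under the HDFS root
hdfs dfsadmin -report    # datanode capacity vs. DFS-used vs. non-DFS-used
```

Comparing "DFS Used" against "Non DFS Used" in the report often explains cases where the disks are full but HDFS itself is nearly empty.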
11-18-2024
08:41 AM
1 Kudo
Hello @sayebogbon , Based on the error in the log you shared:

opened_url = urlopen_with_retry_on_authentication_errors

and the klist output showing this:

Valid starting       Expires
10/11/24 23:43:47    11/11/24 09:43:47

it looks like you need to regenerate the Kerberos credentials for this host. To do so, please stop all services on this host, then go to CM > Administration > Security > Kerberos Credentials. In the search bar, type the hostname, select all the principals that appear, then click the "Regenerate Selected" button. If there are no problems, new credentials will be generated. Restart your services and let us know if that helps.
11-30-2023
09:25 AM
Hello Olivier, The error you got, "org.apache.kudu.client.NonRecoverableException: Unauthorized action", happens when the account you use to take the backup doesn't have the proper permissions to access the Kudu tables. Do you happen to use Ranger to authorize access to Kudu? If so, maybe the account you are using is not authorized in Ranger. It could also be that you are running the backup with the wrong Kerberos principal. Regards, Jason R
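One quick way to rule out the wrong-principal case before re-running the backup; the keytab path and principal below are placeholders for the account that Ranger actually authorizes on the Kudu tables:

```shell
# Authenticate as the account Ranger knows about (placeholder values).
kinit -kt /path/to/backupuser.keytab backupuser@EXAMPLE.COM
klist   # confirm the "Default principal" matches the authorized account
```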
05-17-2022
02:39 PM
1 Kudo
Hello yagoaparecidoti When the tablet servers start, Kudu needs to read all of the block containers located on that tablet server, so the startup time depends on how much data is stored there. Several other variables are also involved, such as disk performance, how busy the host is with other work, whether the disks are shared, etc. If this happens on only one host, chances are that host is either facing hardware issues or has more data to process than the others; otherwise, it simply has a large number of block containers to work through. Please let us know if there are further questions we can clarify for you. Jason
03-30-2022
12:27 PM
1 Kudo
Hi @yagoaparecidoti Thanks. In that case further investigation will be needed; we would need to check what is happening on those 2 tablet servers. If you are able to share the logs from those tablet servers, that would be great; if not, it will be quite hard to tell, and your best bet would be to open a support case to have it checked. Are you able to check the charts under Cloudera Manager > Kudu > Instances > Tablet Server > Chart Library > Replicas? Can you compare those with a non-affected tablet server?