Member since
04-20-2020
246
Posts
6
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1255 | 05-17-2022 02:39 PM
 | 4043 | 03-30-2022 12:27 PM
 | 2646 | 03-28-2022 11:48 AM
07-21-2025
11:35 AM
Can you share any of the datanode logs here? We can try to find out why they can't reach the secondary namenode.
07-20-2025
10:02 PM
The issue seems to be in your secondary namenode:

2025-07-07 11:56:37,798 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Log not rolled. Name node is in safe mode.
The reported blocks 0 has reached the threshold 0.9990 of total blocks 0. The number of live datanodes 0 needs an additional 1 live datanodes to reach the minimum number 1. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:dmidlkprdls01.svr.luc.edu

It looks like the namenode can't communicate with your datanodes, so it can't leave safe mode and crashes. Maybe a network problem is preventing communication between those two roles? Can you ping the secondary namenode from your datanodes and vice versa? Are the required ports open on the secondary namenode and the datanodes?
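A minimal sketch of those connectivity checks, run from a datanode (repeat in the reverse direction from the secondary namenode host). The hostname comes from the log above; port 9868 is the Hadoop 3 default secondary namenode HTTP port, which is an assumption here, so confirm the actual ports in your hdfs-site.xml:

```shell
# Run from each datanode; then the reverse direction from the SNN host.
SNN_HOST=dmidlkprdls01.svr.luc.edu   # secondary namenode from the log above
SNN_PORT=9868                        # assumed Hadoop 3 default; check hdfs-site.xml

ping -c 3 "$SNN_HOST"                # basic reachability
nc -zv "$SNN_HOST" "$SNN_PORT"       # is the port open and listening?
```

If ping works but `nc` fails, look at firewall rules between the hosts rather than DNS.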
07-18-2025
08:35 AM
I meant the namenode process logs. If you didn't customize the location, they should be under /var/log/hadoop-hdfs, where you will see a bunch of logs. Get the latest one that says NAMENODE (it's in caps) and, if possible, share it here. Please get them from both namenode servers.
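A quick sketch of that lookup, assuming the default log directory (adjust `LOG_DIR` if you customized it):

```shell
# Default HDFS log location when not customized in CM.
LOG_DIR=/var/log/hadoop-hdfs
# Newest log file whose name contains NAMENODE.
ls -t "$LOG_DIR" | grep -i 'NAMENODE' | head -n 1
```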
07-17-2025
09:49 AM
The logs from both namenode servers, so we can investigate why the checkpointing process is failing.
07-16-2025
10:13 AM
Hello @jkoral The log snippet you posted is not enough for us to identify the problem. Those "Not enough replicas was chosen" messages are mostly harmless and, although annoying, don't pose a threat to the process. Is it possible for you to share the logs from both namenodes so we can check?
07-02-2025
12:58 PM
Hello @rizalt Thanks for posting your question on the Cloudera Community forum! If I understood correctly, your space usage is high even though it seems you don't have any data in HDFS at all. For confirmation, could you please run this command against HDFS (make sure to have a Kerberos ticket if the cluster is kerberized):

hdfs dfs -du -h /

Or show us the root folder from the Browse Directory page. Finally, to answer your immediate question, please don't remove any data from /hadoop/hdfs/data directly; it is preferable to remove things using the proper tools. We will walk you through it once we have the above information. Regards, JR
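A short sketch of that check; the keytab path and principal are placeholders for whatever account you normally use:

```shell
# On a kerberized cluster, authenticate first (placeholder keytab/principal).
kinit -kt /path/to/hdfs.keytab hdfs@EXAMPLE.COM

hdfs dfs -du -h /        # per-directory usage under the HDFS root
hdfs dfsadmin -report    # datanode capacity vs. DFS-used vs. non-DFS-used
```

Comparing "DFS Used" against "Non DFS Used" in the report often explains cases where the disks are full but HDFS itself is nearly empty.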
11-18-2024
08:41 AM
1 Kudo
Hello @sayebogbon , Based on the error in the log you shared:

opened_url = urlopen_with_retry_on_authentication_errors

and the klist output showing this:

Valid starting       Expires
10/11/24 23:43:47    11/11/24 09:43:47

it looks like you need to regenerate the Kerberos credentials for this host. To do so, please stop all services on this host, then go to CM > Administration > Security > Kerberos Credentials. In the search bar, type the hostname, select all the principals that appear, then click the "Regenerate Selected" button. If there are no problems, new credentials will be generated. Restart your services and let us know if that helps.
11-30-2023
09:25 AM
Hello Olivier, The error you got, "org.apache.kudu.client.NonRecoverableException: Unauthorized action", happens when the account you use to take the backup doesn't have the proper permissions to access the Kudu tables. Do you happen to use Ranger to authorize access to Kudu? If so, maybe the account you are using is not authorized in Ranger. It could also be that you are running the backup with the wrong Kerberos principal. Regards, Jason R
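One quick way to rule out the wrong-principal case before re-running the backup; the keytab path and principal below are placeholders for the account that Ranger actually authorizes on the Kudu tables:

```shell
# Authenticate as the account Ranger knows about (placeholder values).
kinit -kt /path/to/backupuser.keytab backupuser@EXAMPLE.COM
klist   # confirm the "Default principal" matches the authorized account
```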
05-17-2022
02:39 PM
1 Kudo
Hello yagoaparecidoti When the tablet servers start, Kudu needs to read all of the block containers located on that tablet server, so the startup time depends on how much data is stored there. Several other variables are also involved, such as disk performance, how busy the host is with other work, whether the disks are shared, etc. If this happens on only one host, chances are that host is either facing hardware issues or has more data to process than the others; otherwise, it simply has a large number of block containers to work through. Please let us know if there are further questions we can clarify for you. Jason
03-30-2022
12:27 PM
1 Kudo
Hi @yagoaparecidoti Thanks. In that case further investigation will be needed; we would need to check what is happening on those 2 tablet servers. If you are able to share the logs from those tablet servers, that would be great; if not, it will be quite hard to tell, and your best bet would be to open a support case to have it checked. Are you able to check the charts under Cloudera Manager > Kudu > Instances > Tablet Server > Chart Library > Replicas? Can you compare those with a non-affected tablet server?