Created on 04-11-2022 03:02 AM - edited 09-16-2022 07:45 AM
Hello everyone,
I'd like to ask for your support with a problem we are having today. We also have an HDFS delegation token problem in our RA cluster: if some of our controllers (you can think of them as applications) are not restarted within 7 days, the tasks they run fail. The current situation is similar, but it looks a little different from what we usually see. Although the controller was restarted yesterday, we observed tasks failing with the same error today. After the controller is restarted again, the tasks complete when they are rerun. I am sending the klist outputs from the .4 server: the first is from between 12:45 and 1:00 pm, the next from around 1:45 pm. Why did this happen here? Any help would be appreciated.
Cloudera Version 6.3.3
Created 04-22-2022 02:55 AM
Hi @reca.,
May I ask whether you have specified the Kerberos principal and keytab in your Flume configuration:
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_sg_use_subs_vars_s11.html
If you have many long-running jobs, we recommend increasing the default HDFS delegation token max lifetime and renew interval:
dfs.namenode.delegation.token.max-lifetime 604800000 (7 days) -> increase to 30 days
dfs.namenode.delegation.token.renew-interval 86400000 (1 day) -> increase to 30 days
You can even set max-lifetime to 1 year; the renew interval just needs to be equal to or smaller than max-lifetime.
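Both properties take their values in milliseconds, so it may help to work out the numbers before entering them. The following is only a minimal sketch of that arithmetic; the helper names `days_to_ms` and `token_settings` are made up for illustration and are not part of any Hadoop or Cloudera tooling. It also checks the constraint mentioned above, that the renew interval does not exceed max-lifetime.

```python
# Milliseconds in one day: 86,400,000, which matches the 1-day default renew interval.
MS_PER_DAY = 24 * 60 * 60 * 1000


def days_to_ms(days: int) -> int:
    """Convert a number of days to milliseconds."""
    return days * MS_PER_DAY


def token_settings(max_lifetime_days: int, renew_interval_days: int) -> dict:
    """Return the two property values in milliseconds (hypothetical helper).

    Raises ValueError if the renew interval is larger than max-lifetime.
    """
    if renew_interval_days > max_lifetime_days:
        raise ValueError("renew-interval must be equal to or smaller than max-lifetime")
    return {
        "dfs.namenode.delegation.token.max-lifetime": days_to_ms(max_lifetime_days),
        "dfs.namenode.delegation.token.renew-interval": days_to_ms(renew_interval_days),
    }


# Example: the 30-day values suggested in this reply.
settings = token_settings(30, 30)
for name, value in settings.items():
    print(f"{name} = {value}")
```

For 30 days this gives 2592000000 for both properties. The resulting values would then be entered in the HDFS configuration in the usual way for your cluster.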
Created 04-28-2022 01:23 AM
@reca Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,