Nifi PutHDFS Processor stuck and not able to release item from queue.

We have been facing an intermittent issue in our QA environment where the PutHDFS processor gets stuck and cannot release items from the upstream queue. This does not look load-related, as neither the number of messages in the queue nor the queue size was high. We tried to stop and start the processor, but that does not work: after stopping the processor, the start button is no longer shown on PutHDFS. The only fix we currently have is restarting NiFi, which we would like to avoid in the Prod environment.

While the processor is stuck, the UI shows 2 threads on it, even though the processor is configured with the default of 1 concurrent task. To drill down further, we took multiple thread dumps and noticed that one particular thread was always blocked with the same stack trace, linked to the PutHDFS processor.

Stack trace is as below.

"Timer-Driven Process Thread-2" Id=106 BLOCKED on java.io.BufferedInputStream@b7146b3
    at java.io.BufferedInputStream.read(BufferedInputStream.java:336)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)

-------

    at org.apache.nifi.processors.hadoop.PutHDFS$1.run(PutHDFS.java:255)
    at java.security.AccessController.doPrivileged(Native Method)

Attachments: puthdfs-stuck.jpg, thread-dump.txt, thread-dump-2.txt, thread-dump-3.txt

I am attaching a UI screenshot taken while the processor was stuck, along with the thread dumps, for reference.
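For anyone comparing several thread dumps in the same way, the check can be sketched in Java. This is only a rough sketch: the header-line format is assumed from the single line quoted above, and the class name `BlockedThreadDiff` and the dump file paths passed on the command line are hypothetical. It reports the threads that are BLOCKED on the same lock in every dump supplied.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BlockedThreadDiff {

    // Assumed header format, modelled on the line quoted in the post:
    // "Timer-Driven Process Thread-2" Id=106 BLOCKED on java.io.BufferedInputStream@b7146b3
    private static final Pattern BLOCKED_HEADER =
            Pattern.compile("^\"(.+?)\" Id=(\\d+) BLOCKED on (\\S+)");

    /** Returns one "name Id=N on lock" key per BLOCKED thread header in a single dump. */
    static Set<String> blockedIn(String dumpText) {
        Set<String> keys = new LinkedHashSet<>();
        for (String line : dumpText.split("\\R")) {
            Matcher m = BLOCKED_HEADER.matcher(line.trim());
            if (m.find()) {
                keys.add(m.group(1) + " Id=" + m.group(2) + " on " + m.group(3));
            }
        }
        return keys;
    }

    /** Returns the threads that are BLOCKED on the same lock in every dump. */
    static Set<String> blockedInAll(List<String> dumps) {
        Set<String> common = null;
        for (String dump : dumps) {
            Set<String> current = blockedIn(dump);
            if (common == null) {
                common = current;          // first dump seeds the result
            } else {
                common.retainAll(current); // keep only threads seen again
            }
        }
        return common == null ? new LinkedHashSet<>() : common;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one thread dump file (hypothetical paths).
        List<String> dumps = new ArrayList<>();
        for (String path : args) {
            dumps.add(new String(Files.readAllBytes(Paths.get(path))));
        }
        blockedInAll(dumps).forEach(System.out::println);
    }
}
```

A thread that shows up in the output for dumps taken minutes apart is a good candidate for the stuck one, since the lock identity stays the same for as long as the JVM is not restarted.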

Can someone please help us find the root cause of this?

7 REPLIES

Re: Nifi PutHDFS Processor stuck and not able to release item from queue.

@Tarun Kumar

I have been investigating this issue for more than a few days, but so far I have not been able to reproduce it in any environment to which I have access. What you're seeing is the end result of a set of conditions that cause the JAAS configuration and the previously authenticated principal to be contextually lost, which leads Krb5LoginModule.promptForName to prompt interactively for the principal name.

If you wouldn't mind sharing details about your configuration, we can work together to diagnose what's causing PutHDFS to fail. Could you please answer the following questions:

  • What values are set for the ticket lifetime and ticket renewal lifetime in the KDC for the principal you have set in PutHDFS?
  • What values are set for the ticket lifetime and ticket renewal lifetime in the krb5.conf that you have set for NiFi in Ambari?
  • Does this issue occur consistently? Would it seem to happen around the time that the principal's kerberos ticket would be getting renewed, some time between 80% and 100% of the ticket lifetime?
  • How often are files sent to PutHDFS incoming queue? On a regular interval, or is it sporadic?
  • What is the "Relogin Period" property set to in PutHDFS' configuration?
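To make the "between 80% and 100% of the ticket lifetime" window concrete, here is a small Java sketch of the arithmetic. The 24-hour lifetime and the 20-hour observation are example values assumed for illustration only, and the class and method names are hypothetical.

```java
import java.time.Duration;

final class RenewalWindow {

    /** Lower bound of the window: 80% of the ticket lifetime. */
    static Duration windowStart(Duration ticketLifetime) {
        return ticketLifetime.multipliedBy(80).dividedBy(100);
    }

    /** True when the elapsed time since login lies between 80% and 100% of the lifetime. */
    static boolean inWindow(Duration sinceLogin, Duration ticketLifetime) {
        return sinceLogin.compareTo(windowStart(ticketLifetime)) >= 0
                && sinceLogin.compareTo(ticketLifetime) <= 0;
    }

    public static void main(String[] args) {
        Duration lifetime = Duration.ofHours(24); // assumed example lifetime
        Duration observed = Duration.ofHours(20); // hypothetical time since login when the hang was seen
        System.out.println("window start = " + windowStart(lifetime));
        System.out.println("in window    = " + inWindow(observed, lifetime));
    }
}
```

With a 24-hour lifetime the window opens 19 hours 12 minutes after login, so a hang that repeatedly shows up around that point would hint at a renewal-related cause.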

There are a few settings you can add or change to help provide more information to debug the issue.

  • Add the following line to NiFi's Advanced nifi-env config site in Ambari to allow Hadoop to log debug info regarding JAAS
export HADOOP_JAAS_DEBUG=true
  • Add the following line to NiFi's logback config in Ambari to allow Hadoop to log debug info regarding Hadoop privileged operations
<logger name="org.apache.hadoop.security" level="DEBUG"/>
  • Add the following line to NiFi's Advanced nifi-bootstrap config site to enable krb5 debug on the cluster (you can change java.arg.100 to java.arg.somenumber, as long as there's no other entry with that number elsewhere in the bootstrap config)
java.arg.100=-Dsun.security.krb5.debug=true
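Since the java.arg number must not collide with an existing entry, a quick way to check is to scan the bootstrap config for the numbers already in use. The Java sketch below is illustrative only: the class name `BootstrapArgIndex` and the sample config text are made up, and it simply treats every uncommented `java.arg.N=` line as taken.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BootstrapArgIndex {

    // Matches uncommented entries such as "java.arg.100=..."; lines starting with '#' do not match.
    private static final Pattern JAVA_ARG = Pattern.compile("^\\s*java\\.arg\\.(\\d+)\\s*=");

    /** Collects the numeric suffixes of all uncommented java.arg.N entries. */
    static Set<Integer> usedIndices(String bootstrapConf) {
        Set<Integer> used = new TreeSet<>();
        for (String line : bootstrapConf.split("\\R")) {
            Matcher m = JAVA_ARG.matcher(line);
            if (m.find()) {
                used.add(Integer.parseInt(m.group(1)));
            }
        }
        return used;
    }

    /** Returns the preferred index if it is free, otherwise the next unused one above it. */
    static int firstFreeIndex(Set<Integer> used, int preferred) {
        int candidate = preferred;
        while (used.contains(candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Made-up sample content; in practice, read the real bootstrap config instead.
        String sample = String.join("\n",
                "java.arg.2=-Xms512m",
                "#java.arg.100=-Dexample.commented.out=true",
                "java.arg.100=-Dsun.security.krb5.debug=true");
        Set<Integer> used = usedIndices(sample);
        System.out.println("used indices: " + used);
        System.out.println("next free at or above 100: " + firstFreeIndex(used, 100));
    }
}
```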

Please provide the nifi-app and nifi-bootstrap logs after restarting NiFi and observing the stuck threads, and I'll take a look at them.

Re: Nifi PutHDFS Processor stuck and not able to release item from queue.

Thank you, @Jeff Storck, for the reply. Please see my responses inline below.

What values are set for the ticket lifetime and ticket renewal lifetime in the KDC for the principal you have set in PutHDFS?

  • TK: I am not sure of the KDC settings. Where should I check these?

What values are set for the ticket lifetime and ticket renewal lifetime in the krb5.conf that you have set for NiFi in Ambari?

  • TK: ticket_lifetime: 24h, renew_lifetime: 7d

Does this issue occur consistently? Would it seem to happen around the time that the principal's kerberos ticket would be getting renewed, some time between 80% and 100% of the ticket lifetime?

  • TK: The issue occurs intermittently, and we have not been able to relate it to any pattern.

How often are files sent to PutHDFS incoming queue? On a regular interval, or is it sporadic?

  • TK: I don’t think files are sent at a regular interval.

What is the "Relogin Period" property set to in PutHDFS' configuration?

  • TK: Default 4 hours

Unfortunately, we don't have the nifi-app and nifi-bootstrap logs from the last time the issue occurred in the QA env. However, we also noticed this in the Dev env with the FetchHDFS processor; the thread dumps and logs from that time are attached: dev-fetchhdfs-stuck.jpg, thread-dump-1.txt, thread-dump-2.txt

Re: Nifi PutHDFS Processor stuck and not able to release item from queue.

Hello @Tarun Kumar. Set this system property 'javax.security.auth.useSubjectCredsOnly' to true.

To configure it this way in NiFi you can add this line, for example, to your nifi/conf/bootstrap.conf file.

java.arg.101=-Djavax.security.auth.useSubjectCredsOnly=true
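If you want to confirm that a JVM actually picked up the flag, a tiny Java check along these lines can help. It is only a sketch: the class name `SubjectCredsCheck` is hypothetical, and it reports the property of the JVM it runs in, so it says nothing about an already running NiFi process unless the same argument is passed to it.

```java
final class SubjectCredsCheck {

    static final String PROP = "javax.security.auth.useSubjectCredsOnly";

    /** Describes the property given its raw value (null when it was never set explicitly). */
    static String describe(String rawValue) {
        if (rawValue == null) {
            return PROP + " is not set explicitly";
        }
        String verdict = Boolean.parseBoolean(rawValue)
                ? " (matches the suggestion)"
                : " (differs from the suggestion)";
        return PROP + "=" + rawValue + verdict;
    }

    public static void main(String[] args) {
        // Run with: java -Djavax.security.auth.useSubjectCredsOnly=true SubjectCredsCheck.java
        System.out.println(describe(System.getProperty(PROP)));
    }
}
```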

Re: Nifi PutHDFS Processor stuck and not able to release item from queue.

Thank you, @jwitt. Could you please confirm whether this suggestion is the recommended fix for the stuck HDFS processors in NiFi discussed in this thread?

Please also share some relevant information or a link that relates this setting to the issue.

Re: Nifi PutHDFS Processor stuck and not able to release item from queue.

Yes, this solves the original issue of this thread (promptForName). What is happening is that the JDK/JRE security code allows a search for other methods of obtaining the principal in a condition where a failure has occurred and a retry is being blocked, most likely due to insufficient time. We've spent a considerable amount of time debugging this condition.

The link to the system property that explains its meaning/role is here https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/single-signon.html. Specifically read the 'Exceptions to the Model' case where this property is described.

Doing this ensures the JDK/JRE does not attempt any methods or mechanisms other than the ones we have specified. Specifically, it avoids the scenario where it tries to prompt the user to supply a name at the command prompt, which would obviously never work. Worse yet, when that happens our thread is stuck until a restart.

So, yes, add this system property and you should be in far better shape with regard to the prompt for name issue.

Re: Nifi PutHDFS Processor stuck and not able to release item from queue.

Thank you very much, @jwitt, for the useful insights above.

Re: Nifi PutHDFS Processor stuck and not able to release item from queue.

@Tarun Kumar

To follow up on your question, with the release of HDF 3.1, the issues with promptForName/stuck threads in Hadoop components should be resolved. In addition to the property that @jwitt mentioned (javax.security.auth.useSubjectCredsOnly=true), several code changes were made to how HDFS/HBase/Hive components in NiFi acquire a UGI.