HDF 2.0 Hanging on Restart

avatar
Master Guru

I have NiFi installed on CentOS 7, and it was working the other day.

Today I tried it and could not access anything. The logs showed nothing, so I restarted it.

Now it never finishes starting, yet the log shows no errors:

2016-09-26 16:55:34,366 INFO [main] /nifi-api Initializing Spring root WebApplicationContext
2016-09-26 16:55:36,773 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/./conf/nifi.properties'
2016-09-26 16:55:36,778 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/./conf/nifi.properties'
2016-09-26 16:55:36,779 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 117 properties from /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/./conf/nifi.properties

java -version
openjdk version "1.8.0_101"
OpenJDK Runtime Environment (build 1.8.0_101-b13)
OpenJDK 64-Bit Server VM (build 25.101-b13, mixed mode)

Nothing else is running on the machine.

1 ACCEPTED SOLUTION

avatar

I have traced the root cause to low entropy on new VM instances, especially headless ones (a typical cloud server today). To test whether a server is affected:

head -1 /dev/random

If the command hangs instead of returning immediately with some garbage output, the server is affected. Note that /dev/random is the device that blocks when the kernel's entropy pool runs low; /dev/urandom does not block.

By default, Java's SecureRandom can draw its seed material from /dev/random, so it stalls along with it.

Some solutions online suggest modifying the JCE settings to use /dev/urandom instead, but this is less desirable:

  1. It's not guaranteed to always work
  2. There can be multiple JVMs on a host, and an admin may not always know which install a specific process uses; JAVA_HOME might not even be set, leaving them guessing
  3. It requires manual intervention, which hinders automation such as Ambari blueprints (fully automated Ambari install)

One solution which worked great for me and didn't require any JDK or code changes was to install the Haveged entropy daemon, which was designed for this problem specifically: http://www.issihosts.com/haveged/

Here's the process for CentOS 7. Similar steps are available for Ubuntu, etc.

The haveged packages are in the EPEL repo, so first install the EPEL release package that matches your CentOS version:

rpm -Uvh http://download.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
yum install -y haveged
chkconfig haveged on
service haveged start 

After this, my secure NiFi cluster reliably restarts within the expected time window.
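
One way to check whether the daemon is making a difference is to read the kernel's entropy estimate before and after starting it. The sketch below assumes a Linux kernel that exposes /proc/sys/kernel/random/entropy_avail. The 200-bit threshold and the function name classify_entropy are illustrative choices, not values taken from haveged or NiFi.

```shell
#!/bin/sh
# Classify an entropy estimate (in bits) as "low" or "ok".
# The default threshold of 200 bits is an illustrative assumption.
classify_entropy() {
    value="$1"
    threshold="${2:-200}"
    if [ "$value" -lt "$threshold" ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# The kernel's current entropy estimate on Linux.
ENTROPY_FILE=/proc/sys/kernel/random/entropy_avail

if [ -r "$ENTROPY_FILE" ]; then
    bits="$(cat "$ENTROPY_FILE")"
    echo "entropy_avail: $bits bits ($(classify_entropy "$bits"))"
else
    echo "entropy_avail is not readable on this system"
fi
```

Running it once before and once after starting haveged gives a rough before/after comparison. The number is only the kernel's own estimate, so it says nothing about the quality of the randomness.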

6 REPLIES 6

avatar
Master Mentor
@Timothy Spann

Do you see anything in the nifi-bootstrap.log?

avatar
Master Guru

It started up normally after a long delay.

avatar

I would not recommend using haveged without fully understanding the issue of getting sufficiently unpredictable random input for security purposes. Multiple well-credentialed security experts have weighed in with concerned, if not dismissive, responses.

Michael Kerrisk:

Having read a number of papers about HAVEGE, Peter [Anvin] said he had been unable to work out whether this was a "real thing". Most of the papers that he has read run along the lines, "we took the output from HAVEGE, and ran some tests on it and all of the tests passed". The problem with this sort of reasoning is the point that Peter made earlier: there are no tests for randomness, only for non-randomness.

One of Peter's colleagues replaced the random input source employed by HAVEGE with a constant stream of ones. All of the same tests passed. In other words, all that the test results are guaranteeing is that the HAVEGE developers have built a very good PRNG. It is possible that HAVEGE does generate some amount of randomness, Peter said. But the problem is that the proposed source of randomness is simply too complex to analyze; thus it is not possible to make a definitive statement about whether it is truly producing randomness. (By contrast, the HWRNGs that Peter described earlier have been analyzed to produce a quantum theoretical justification that they are producing true randomness.) "So, while I can't really recommend it, I can't not recommend it either." If you are going to run HAVEGE, Peter strongly recommended running it together with rngd, rather than as a replacement for it.

Tom Leek:

Of course, the whole premise of HAVEGE is questionable. For any practical security, you need a few "real random" bits, no more than 200, which you use as seed in a cryptographically secure PRNG. The PRNG will produce gigabytes of pseudo-[data] indistinguishable from true randomness, and that's good enough for all practical purposes.

Insisting on going back to the hardware for every bit looks like yet another outbreak of that flawed idea which sees entropy as a kind of gasoline, which you burn up when you look at it.

I would recommend directing the JVM to read from /dev/urandom. In response to the concerns above, I'm not sure what "It's not guaranteed to always work" means, but the other issues are mitigated by providing a Java parameter in conf/bootstrap.conf.
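
A minimal sketch of setting such a parameter, assuming the standard java.security.egd system property is the one intended here. The NIFI_HOME default and the argument index 15 are assumptions: any java.arg.N index not already used in the file should work, and NiFi has to be restarted for the change to take effect.

```shell
#!/bin/sh
# Append a JVM argument to a NiFi bootstrap.conf so that SecureRandom
# reads from /dev/urandom. Does nothing if an egd argument is already set.
add_egd_arg() {
    conf_file="$1"
    # Index 15 is an assumption; pick any java.arg.N not already in use.
    arg_line='java.arg.15=-Djava.security.egd=file:/dev/urandom'
    if grep -q '^java\.arg\.[0-9]*=-Djava\.security\.egd=' "$conf_file"; then
        echo "egd argument already present in $conf_file"
    else
        printf '%s\n' "$arg_line" >> "$conf_file"
        echo "added: $arg_line"
    fi
}

# NIFI_HOME is a hypothetical default; point it at your own install.
NIFI_HOME="${NIFI_HOME:-/opt/nifi}"
if [ -f "$NIFI_HOME/conf/bootstrap.conf" ]; then
    add_egd_arg "$NIFI_HOME/conf/bootstrap.conf"
fi
```

Because the setting lives in NiFi's own bootstrap.conf, it applies to whichever JVM ends up launching NiFi and can be laid down by an automated install, which speaks to the second and third concerns above.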

avatar

Thanks, Andy. I clearly understand the concern around security confidence levels, and I'm not putting this forward as a solution, but rather as a workaround to let the devs move forward. This isn't an official solution by any means, and everyone reading this thread should understand that.

avatar
Master Guru

That worked for me right away.