Hadoop services can be slow to start on an OS X development laptop. To check whether you are affected, open your service logs and look for large (5-10 second) gaps between successive log entries at startup.

Diagnosis

The problem often shows up as failures in MiniDFSCluster-based tests that use short timeouts (under 10 seconds).

Here is an example from a NameNode log file with a 5-second stall at startup.

2016-07-25 14:57:37,982 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-07-25 14:57:43,060 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

Here is another 5-second stall during NameNode startup.

2016-07-25 14:57:48,790 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2016-07-25 14:57:53,914 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap
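Scanning a long log by eye for such gaps is tedious, so the check can be automated. The following is a minimal sketch, not part of Hadoop: the class name LogGaps and the method findGaps are hypothetical, the timestamp layout is assumed to match the sample lines above, and the 5000 ms threshold is simply taken from the stalls shown here. It relies on java.time, so it needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports large time gaps between consecutive log entries.
class LogGaps {
  // Assumed timestamp layout, as in the sample lines: 2016-07-25 14:57:37,982
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
  private static final int TIMESTAMP_LENGTH = 23;

  // Returns one description per pair of timestamped lines that are at least
  // thresholdMillis apart. Lines without a leading timestamp are skipped.
  static List<String> findGaps(List<String> lines, long thresholdMillis) {
    List<String> gaps = new ArrayList<>();
    LocalDateTime previous = null;
    for (String line : lines) {
      LocalDateTime current = parseTimestamp(line);
      if (current == null) {
        continue; // e.g. a stack trace line
      }
      if (previous != null) {
        long millis = Duration.between(previous, current).toMillis();
        if (millis >= thresholdMillis) {
          gaps.add(millis + " ms before: " + line);
        }
      }
      previous = current;
    }
    return gaps;
  }

  // Parses the leading timestamp of a log line, or returns null if there is none.
  static LocalDateTime parseTimestamp(String line) {
    if (line.length() < TIMESTAMP_LENGTH) {
      return null;
    }
    try {
      return LocalDateTime.parse(line.substring(0, TIMESTAMP_LENGTH), FORMAT);
    } catch (DateTimeParseException e) {
      return null;
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java LogGaps <logfile>");
      return;
    }
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    for (String gap : findGaps(lines, 5000)) {
      System.out.println(gap);
    }
  }
}
```

Run it against a service log, for example a NameNode log file; each reported line is the entry that followed a stall.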

Resolution

If you see this behavior, you are likely running into an OS X bug. The fix is to put all your entries for localhost on one line, as described in this StackOverflow answer.

That is, make sure your /etc/hosts file has something like this:

# Replace myhostname with the hostname of your laptop.
#
127.0.0.1       localhost myhostname myhostname.local myhostname.Home

Instead of this:

127.0.0.1       localhost myhostname.local
127.0.0.1       myhostname myhostname.Home
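To check whether a hosts file has the split layout shown above, a small program can count the lines that map names to 127.0.0.1. This is a rough sketch under stated assumptions: the class name HostsCheck and the method loopbackLines are hypothetical, and only the IPv4 loopback address from the examples above is considered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds hosts-file lines that map names to 127.0.0.1.
class HostsCheck {
  // Returns the non-comment lines whose first field is 127.0.0.1.
  static List<String> loopbackLines(List<String> hostsLines) {
    List<String> matches = new ArrayList<>();
    for (String raw : hostsLines) {
      // Drop anything after '#', which starts a comment in a hosts file.
      int hash = raw.indexOf('#');
      String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
      if (line.isEmpty()) {
        continue;
      }
      String[] fields = line.split("\\s+");
      if (fields.length > 1 && fields[0].equals("127.0.0.1")) {
        matches.add(line);
      }
    }
    return matches;
  }

  public static void main(String[] args) throws IOException {
    List<String> matches = loopbackLines(Files.readAllLines(Paths.get("/etc/hosts")));
    if (matches.size() > 1) {
      System.out.println("127.0.0.1 is spread over " + matches.size() + " lines:");
      for (String line : matches) {
        System.out.println("  " + line);
      }
    } else {
      System.out.println("127.0.0.1 entries are on a single line (or absent).");
    }
  }
}
```

If more than one line is reported, merge the names onto a single 127.0.0.1 line as described above.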

Root Cause

The root cause appears to be a long delay when looking up the local host name with InetAddress.getLocalHost. The following program is a minimal reproduction of the problem on affected systems.

import java.net.InetAddress;

class Lookup {
  public static void main(String[] args) throws Exception {
    // Looking up the local host name is what appears to stall on affected systems.
    System.out.println(InetAddress.getLocalHost().getCanonicalHostName());
  }
}

This program can take over 5 seconds to execute on an affected machine.

Verified on OS X 10.10.5 with Oracle JDK 1.8.0_91 and 1.7.0_79.
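To put a number on the delay, the same lookup can be wrapped in a timer. This is a minimal sketch and was not part of the verification above; the class name LookupTimer and the helper elapsedMillis are hypothetical. It uses a lambda, so it needs Java 8 or later.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: times the local host name lookup from the repro above.
class LookupTimer {
  // Runs the task once and returns the wall-clock time it took, in milliseconds.
  static long elapsedMillis(Runnable task) {
    long start = System.nanoTime();
    task.run();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    long millis = elapsedMillis(() -> {
      try {
        System.out.println(InetAddress.getLocalHost().getCanonicalHostName());
      } catch (UnknownHostException e) {
        System.out.println("lookup failed: " + e);
      }
    });
    System.out.println("lookup took " + millis + " ms");
  }
}
```

Running it before and after editing /etc/hosts gives a quick way to see whether the change had any effect on the lookup time.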
