Always get NullPointerException when attempting to launch cluster

[2014-11-21 20:04:48] INFO  [pipeline-thread-1] - c.c.l.d.DeploymentRepositoryService: Deployment 'manager': BOOTSTRAPPING -> BOOTSTRAPPING
[2014-11-21 20:04:48] INFO  [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: >> BootstrapDeployment/1 [CreateDeploymentContext{environment=Environment{name='env', provider=InstanceProviderConfig{type='a ...
[2014-11-21 20:04:49] INFO  [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=15, pipeline=b371b10d-da64-4586-a720-353e17681188, c ...
[2014-11-21 20:04:49] INFO  [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Requesting an instance for Cloudera Manager]
[2014-11-21 20:04:49] INFO  [pipeline-thread-1] - com.cloudera.launchpad.pipeline.Job: Requesting an instance for Cloudera Manager
[2014-11-21 20:04:49] INFO  [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-21 20:04:50] INFO  [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: >> AllocateInstances/2 [VirtualInstanceGroup{name='CM', virtualInstances=[VirtualInstance{id='935935c9-fd56-4076-a5c7-c3e27 ...
[2014-11-21 20:04:50] INFO  [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=20, pipeline=b371b10d-da64-4586-a720-353e17681188, c ...
[2014-11-21 20:04:50] INFO  [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: >> AllocateAndWaitForInstancesToRun/2 [VirtualInstanceGroup{name='CM', virtualInstances=[VirtualInstance{id='935935c9-fd56-4076-a5c7-c3e27 ...
[2014-11-21 20:04:50] INFO  [pipeline-thread-1] - c.c.l.bootstrap.AllocateInstances: Allocating 1 instances (min count 1) in group CM
[2014-11-21 20:04:50] INFO  [pipeline-thread-1] - c.cloudera.launchpad.ec2.EC2Provider: >> Requesting 1 instances for InstanceTemplate{name='template', type='c3.2xlarge', image='ami-52a3153a', bootstrapScriptIsPresent=true, config={rootVolumeSizeGB=60, securityGroupsIds=sg-9e8b94fb, instanceNamePrefix=cd, subnetId=subnet-87c074f0}, tags={cd=true}}
[2014-11-21 20:04:50] INFO  [pipeline-thread-1] - c.cloudera.launchpad.ec2.EC2Provider: >> Network interface specification: {DeviceIndex: 0,SubnetId: subnet-87c074f0,Groups: [sg-9e8b94fb],DeleteOnTermination: true,PrivateIpAddresses: [],AssociatePublicIpAddress: true}
[2014-11-21 20:04:50] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Attempt to execute job failed
java.lang.NullPointerException: null
        at com.cloudera.launchpad.ec2.EC2Provider.newRunInstancesRequest(EC2Provider.java:551) ~[launchpad-aws-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.ec2.EC2Provider.allocateInstancesForTemplate(EC2Provider.java:427) ~[launchpad-aws-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.ec2.EC2Provider.allocate(EC2Provider.java:244) ~[launchpad-aws-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.ec2.EC2Provider.allocate(EC2Provider.java:206) ~[launchpad-aws-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.bootstrap.AllocateInstances$AllocateAndWaitForInstancesToRun.run(AllocateInstances.java:120) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.bootstrap.AllocateInstances$AllocateAndWaitForInstancesToRun.run(AllocateInstances.java:99) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.pipeline.job.Job2.runUnchecked(Job2.java:31) ~[launchpad-pipeline-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.pipeline.DatabasePipelineRunner$1.call(DatabasePipelineRunner.java:229) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
        at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [guava-retrying-1.0.6.jar!/:na]
        at com.github.rholder.retry.Retryer.call(Retryer.java:110) [guava-retrying-1.0.6.jar!/:na]
        at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.attemptMultipleJobExecutionsWithRetries(DatabasePipelineRunner.java:213) [launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
        at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:132) [launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_65]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]

 

In the UI:

 

Status: 500, Reason: /api/v1/environments/env/deployments/manager/clusters: Internal Server Error, Deployment manager is in stage BOOTSTRAP_FAILED. See deployment status and server logs for details.

 

I'm using ami-52a3153a (Ubuntu Trusty) in us-east. Any thoughts?

8 REPLIES

Re: Always get NullPointerException when attempting to launch cluster

Expert Contributor

Hello!

 

The problem appears to be that the AMI you selected is using an instance store for its root device. Director expects the root device to be hosted on EBS. (Nevertheless, Director should respond more nicely here, so we'll work on that.)

 

You should try a different AMI that uses EBS for its root device. Also, at this time, Director only supports AMIs for RHEL 6.4 and CentOS 6.4 / 6.5 for cluster nodes, so you might encounter other issues if you stick with Ubuntu.
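
For anyone who wants to check an AMI before pointing Director at it, here is a minimal shell sketch based on the point above about the root device. The function name `root_device_supported` is made up for illustration, and the commented `aws ec2 describe-images` call assumes the AWS CLI is installed and configured; it is not something Director itself runs.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an AMI's root device type is usable, given the
# statement above that Director expects an EBS-backed root device.

# Hypothetical helper: takes the RootDeviceType string that EC2 reports
# ("ebs" or "instance-store") and succeeds only for EBS-backed images.
root_device_supported() {
  case "$1" in
    ebs)
      echo "supported"
      return 0
      ;;
    instance-store)
      echo "unsupported: instance-store root device"
      return 1
      ;;
    *)
      echo "unknown root device type: $1"
      return 2
      ;;
  esac
}

# Example wiring with the AWS CLI (not run here; needs credentials and a region):
#   type=$(aws ec2 describe-images --image-ids ami-52a3153a \
#            --query 'Images[0].RootDeviceType' --output text)
#   root_device_supported "$type"
```

If the helper reports an instance-store root device, picking an EBS-backed AMI as suggested above is the simpler path.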

Re: Always get NullPointerException when attempting to launch cluster

Thanks, I switched to CentOS 6.4. (I had to launch the one from the marketplace, then make a private AMI out of it so I could launch it directly from Director.)

 

So I managed to get the manager to start (green and Ready in the UI) after hacking the hostname into the hosts file with this bootstrap script:

 

HOSTNAME=$(hostname)
cat<<EOF > /etc/hosts
127.0.0.1               localhost.localdomain localhost ${HOSTNAME} 
::1             localhost6.localdomain6 localhost6
EOF

 I then tried to launch a small cluster, and after an hour it's stuck here:

 

Waiting for Cloudera Manager to deploy agent on 10.0.4.206
Waiting for Cloudera Manager to deploy agent on 10.0.0.162
Waiting for Cloudera Manager to deploy agent on 10.0.0.163
Waiting for Cloudera Manager to deploy agent on 10.0.6.104

 From the logs:

[2014-11-24 20:56:45] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:56:45] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:56:46] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:56:46] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:00] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:01] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:01] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:01] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:15] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:16] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:16] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:16] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:30] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:31] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:31] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:31] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:45] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:46] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:46] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:57:46] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:00] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:01] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:01] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:01] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:15] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:16] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:16] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:16] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:30] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:31] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:31] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:31] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:45] INFO  [pipeline-thread-6] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=28, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:46] INFO  [pipeline-thread-11] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=31, name=GlobalHostInstall, startTime=Mon Nov 24 20:42:59 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:46] INFO  [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=36, name=GlobalHostInstall, startTime=Mon Nov 24 20:47:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-24 20:58:46] INFO  [pipeline-thread-13] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=39, name=GlobalHostInstall, startTime=Mon Nov 24 20:49:01 UTC 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}

Any thoughts? If I SSH to one of the instances, the process tree looks like this:

    1 ?        Ss     0:00 /sbin/init
  376 ?        S<s    0:00 /sbin/udevd -d
  881 ?        Ss     0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-eth0.leases -pf /var/run/dhclient-eth0.pid eth0
  925 ?        S<sl   0:00 auditd
  941 ?        Sl     0:00 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
 1056 ?        Ss     0:00 /usr/libexec/postfix/master
 1076 ?        S      0:00  \_ pickup -l -t fifo -u
 1077 ?        S      0:00  \_ qmgr -l -t fifo -u
 1064 ?        Ss     0:00 crond
 1080 hvc0     Ss+    0:00 /sbin/agetty /dev/hvc0 38400 vt100-nav
 1083 tty1     Ss+    0:00 /sbin/mingetty /dev/tty1
 1257 ?        Ss     0:00 /usr/sbin/anacron -s
 6533 ?        Ss     0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
 7280 ?        Ss     0:00 /usr/sbin/sshd
 7755 ?        Ss     0:00  \_ sshd: root@pts/0 
 8059 pts/0    Ss+    0:00  |   \_ bash -c /tmp/scm_prepare_node.VXZnuLTH/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.VXZnuLTH/packages.scm --always /tmp/scm_prepare_node.VXZnuLTH/always_install.scm --x86_64 /tmp/scm_prepare_node.VXZnuLTH/x86_64_packages.scm --skipImpa
 8066 pts/0    S+     0:00  |       \_ /bin/bash /tmp/scm_prepare_node.VXZnuLTH/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.VXZnuLTH/packages.scm --always /tmp/scm_prepare_node.VXZnuLTH/always_install.scm --x86_64 /tmp/scm_prepare_node.VXZnuLTH/x86_64_packages.scm --sk
 8074 pts/0    S+     0:00  |       |   \_ /bin/bash /tmp/scm_prepare_node.VXZnuLTH/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.VXZnuLTH/packages.scm --always /tmp/scm_prepare_node.VXZnuLTH/always_install.scm --x86_64 /tmp/scm_prepare_node.VXZnuLTH/x86_64_packages.scm
 8076 pts/0    S+     0:00  |       |   |   \_ flock 4
 8075 pts/0    S+     0:00  |       |   \_ tee /dev/fd/63
 8077 pts/0    S+     0:00  |       |       \_ /bin/bash /tmp/scm_prepare_node.VXZnuLTH/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.VXZnuLTH/packages.scm --always /tmp/scm_prepare_node.VXZnuLTH/always_install.scm --x86_64 /tmp/scm_prepare_node.VXZnuLTH/x86_64_packages.
 8078 pts/0    S+     0:00  |       |           \_ cat
 8067 pts/0    S+     0:00  |       \_ tee /tmp/scm_prepare_node.VXZnuLTH/scm_prepare_node.log
 8118 ?        Ss     0:00  \_ sshd: root@pts/1 
 8121 pts/1    Ss     0:00      \_ -bash
 8434 pts/1    R+     0:00          \_ ps afx
 7620 ?        S      0:00 su -s /bin/bash -c nohup /usr/sbin/cmf-agent 
 7622 ?        Ssl    0:03  \_ /usr/lib64/cmf/agent/build/env/bin/python /usr/lib64/cmf/agent/src/cmf/agent.py --package_dir /usr/lib64/cmf/service --agent_dir /var/run/cloudera-scm-agent --lib_dir /var/lib/cloudera-scm-agent --logfile /var/log/cloudera-scm-agent/cloudera-scm-agent.log
 7655 ?        Ss     0:00 /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/python /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord
 7656 ?        S      0:00  \_ /usr/lib64/cmf/agent/build/env/bin/python /usr/lib64/cmf/agent/src/cmf/supervisor_listener.py -l /var/log/cloudera-scm-agent/cmf_listener.log /var/run/cloudera-scm-agent/events

It seems all the ssh + bash processes have hung. If I start the agent manually it comes up, but I guess Director is waiting on a success status code from the SSH session...
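
The Director log above repeats the same four waits every 15 seconds, which makes it hard to see how long each command has been pending. Here is a small sketch, assuming the log format shown in this post (the function name `pending_commands` is made up for illustration), that collapses those lines to one per ApiCommand id:

```shell
#!/usr/bin/env bash
# Sketch: collapse repeated "Waiting for ApiCommand{...}" lines into one
# line per command id, so a stalled command and its start time stand out.

# Reads Director log lines on stdin and prints "id name startTime",
# once per id, sorted numerically by id.
pending_commands() {
  sed -n 's/.*Waiting for ApiCommand{id=\([0-9]*\), name=\([A-Za-z]*\), startTime=\([^,]*\),.*/\1 \2 \3/p' \
    | sort -u -k1,1n
}

# Example (the log file location depends on the install, so pass your own path):
#   pending_commands < "$DIRECTOR_LOG"
```

Comparing each printed startTime with the current time shows how long a given GlobalHostInstall has been waiting.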

Re: Always get NullPointerException when attempting to launch cluster

Thanks, I used CentOS 6.4 from the marketplace to create an AMI I could deploy from directly (I did nothing to the machine, just started it, stopped it, and made it into an AMI).

 

I had to put the hostname into hosts in bootstrap for the instances to semi-work:

 

HOSTNAME=$(hostname)
cat<<EOF > /etc/hosts
127.0.0.1               localhost.localdomain localhost ${HOSTNAME} 
::1             localhost6.localdomain6 localhost6
EOF

 But now when I try to launch a cluster it hangs waiting on the agents to start (on all boxes except the manager, which I assume doesn't have an agent?). If I ssh to one of them I can see the prepare_node script seems to have stalled:

 

 2129 ?        Ss     0:00 /usr/sbin/sshd
 2602 ?        Ss     0:00  \_ sshd: root@pts/0 
 2906 pts/0    Ss+    0:00  |   \_ bash -c /tmp/scm_prepare_node.SYYhT0vx/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.SYYhT0vx/packages.scm --always /tmp/scm_prepare_node.SYYhT0vx/always_install.scm --x86_64 /tmp/scm_prepare_node.SYYhT0vx/x86_64_packages.scm --skipImpa
 2913 pts/0    S+     0:00  |       \_ /bin/bash /tmp/scm_prepare_node.SYYhT0vx/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.SYYhT0vx/packages.scm --always /tmp/scm_prepare_node.SYYhT0vx/always_install.scm --x86_64 /tmp/scm_prepare_node.SYYhT0vx/x86_64_packages.scm --sk
 2921 pts/0    S+     0:00  |       |   \_ /bin/bash /tmp/scm_prepare_node.SYYhT0vx/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.SYYhT0vx/packages.scm --always /tmp/scm_prepare_node.SYYhT0vx/always_install.scm --x86_64 /tmp/scm_prepare_node.SYYhT0vx/x86_64_packages.scm
 2923 pts/0    S+     0:00  |       |   |   \_ flock 4
 2922 pts/0    S+     0:00  |       |   \_ tee /dev/fd/63
 2924 pts/0    S+     0:00  |       |       \_ /bin/bash /tmp/scm_prepare_node.SYYhT0vx/scm_prepare_node.sh --server_version 5.2.0 --server_build 60 --packages /tmp/scm_prepare_node.SYYhT0vx/packages.scm --always /tmp/scm_prepare_node.SYYhT0vx/always_install.scm --x86_64 /tmp/scm_prepare_node.SYYhT0vx/x86_64_packages.
 2925 pts/0    S+     0:00  |       |           \_ cat
 2914 pts/0    S+     0:00  |       \_ tee /tmp/scm_prepare_node.SYYhT0vx/scm_prepare_node.log
 3511 ?        Ss     0:00  \_ sshd: root@pts/1 
 3514 pts/1    Ss     0:00      \_ -bash
 3531 pts/1    R+     0:00          \_ ps afx

 For what it's worth I can manually start the agent and it works. But it seems Director is waiting on a success status code from the prepare_node script.

 

These are the logs from the Cloudera agent on one of those nodes:

 

[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     SCM Agent Version: 5.2.0
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Agent Protocol Version: 4
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Using Host ID: f1d0a303-4455-4f72-be22-78e314f665ab
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Using directory: /var/run/cloudera-scm-agent
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Using supervisor binary path: /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Neither verify_cert_file nor verify_cert_dir are configured. Not performing validation of server certificates in HTTPS communication. These options can be configured in this agent's config.ini file to enable certificate validation.
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     No command line vars
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Missing database jar: /usr/share/java/mysql-connector-java.jar (normal, if you're not using this database type)
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Agent starting as pid 2471 user root(0) group root(0).
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        WARNING  Expected mode 0751 for /var/run/cloudera-scm-agent but was 0755
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Re-using pre-existing directory: /var/run/cloudera-scm-agent
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Created /var/run/cloudera-scm-agent/cgroups
[25/Nov/2014 19:30:48 +0000] 2471 MainThread agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/cgroups to 0751
[25/Nov/2014 19:30:48 +0000] 2471 MainThread cgroups      INFO     Found cgroups subsystem: cpu
[25/Nov/2014 19:30:49 +0000] 2471 MainThread cgroups      INFO     Found cgroups subsystem: cpuacct
[25/Nov/2014 19:30:49 +0000] 2471 MainThread cgroups      INFO     Found cgroups subsystem: memory
[25/Nov/2014 19:30:49 +0000] 2471 MainThread cgroups      INFO     Found cgroups subsystem: blkio
[25/Nov/2014 19:30:49 +0000] 2471 MainThread cgroups      INFO     Created /var/run/cloudera-scm-agent/cgroups/memory
[25/Nov/2014 19:30:49 +0000] 2471 MainThread cgroups      INFO     Created /var/run/cloudera-scm-agent/cgroups/cpu
[25/Nov/2014 19:30:49 +0000] 2471 MainThread cgroups      INFO     Created /var/run/cloudera-scm-agent/cgroups/cpuacct
[25/Nov/2014 19:30:49 +0000] 2471 MainThread cgroups      INFO     Created /var/run/cloudera-scm-agent/cgroups/blkio
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Found cgroups capabilities: {'has_memory': True, 'default_memory_limit_in_bytes': -1, 'default_memory_soft_limit_in_bytes': -1, 'writable_cgroup_dot_procs': True, 'default_cpu_rt_runtime_us': 950000, 'has_cpu': True, 'default_blkio_weight': 1000, 'default_cpu_shares': 1024, 'has_cpuacct': True, 'has_blkio': True}
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Setting up supervisord event monitor.
[25/Nov/2014 19:30:49 +0000] 2471 MainThread filesystem_map INFO     Monitored nodev filesystem types: ['nfs', 'nfs4', 'tmpfs']
[25/Nov/2014 19:30:49 +0000] 2471 MainThread filesystem_map INFO     Using timeout of 2.000000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread filesystem_map INFO     Using join timeout of 0.100000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread filesystem_map INFO     Using tolerance of 60.000000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread filesystem_map INFO     Local filesystem types whitelist: ['ext2', 'ext3', 'ext4']
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Using metrics_url_timeout_seconds of 30.000000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Using task_metrics_timeout_seconds of 5.000000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Using max_collection_wait_seconds of 10.000000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread metrics      INFO     Importing tasktracker metric schema from file /usr/lib64/cmf/agent/src/cmf/monitor/tasktracker/schema.json
[25/Nov/2014 19:30:49 +0000] 2471 MainThread dns_names    INFO     Using timeout of 2.000000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread ntp_monitor  INFO     Using timeout of 2.000000
[25/Nov/2014 19:30:49 +0000] 2471 MainThread stacks_collection_manager INFO     Using max_uncompressed_file_size_bytes: 5242880
[25/Nov/2014 19:30:49 +0000] 2471 MainThread __init__     INFO     Importing metric schema from file /usr/lib64/cmf/agent/src/cmf/monitor/schema.json
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'CMF_PACKAGE_DIR': '/usr/lib64/cmf/service', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'PATH': '/sbin:/usr/sbin:/bin:/usr/bin', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'KEYTRUSTEE_KP_HOME': '/usr/share/keytrustee-keyprovider', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_KMS_HOME': '/usr/lib/hadoop-kms', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HUE_HOME': '/usr/lib/hue', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_LLAMA_HOME': '/usr/lib/llama/', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'CDH_HIVE_HOME': '/usr/lib/hive', 'CDH_HCAT_HOME': '/usr/lib/hive-hcatalog', 'CDH_SENTRY_HOME': '/usr/lib/sentry', 'CDH_SPARK_HOME': '/usr/lib/spark', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_FLUME_HOME': '/usr/lib/flume-ng'}
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Created /var/run/cloudera-scm-agent/process
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/process to 0751
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Created /var/run/cloudera-scm-agent/supervisor
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/supervisor to 0751
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Created /var/run/cloudera-scm-agent/supervisor/include
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/supervisor/include to 0751
[25/Nov/2014 19:30:49 +0000] 2471 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1295, in find_or_start_supervisor
    self.configure_supervisor_clients()
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1534, in configure_supervisor_clients
    supervisor_options.realize(args=["-c", os.path.join(self.supervisor_dir, "supervisord.conf")])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 1563, in realize
    Options.realize(self, *arg, **kw)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 310, in realize
    self.process_config()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 318, in process_config
    self.process_config_file(do_usage)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 353, in process_config_file
    self.usage(str(msg))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 141, in usage
    self.exit(2)
SystemExit: 2
[25/Nov/2014 19:30:49 +0000] 2471 MainThread tmpfs        INFO     Successfully mounted tmpfs at /var/run/cloudera-scm-agent/process
[25/Nov/2014 19:30:50 +0000] 2471 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 1)
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Supervisor version: 3.0
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Successfully connected to supervisor
[25/Nov/2014 19:30:51 +0000] 2471 MainThread status_server INFO     Using maximum impala profile bundle size of 1073741824 bytes.
[25/Nov/2014 19:30:51 +0000] 2471 MainThread status_server INFO     Using maximum stacks log bundle size of 1073741824 bytes.
[25/Nov/2014 19:30:51 +0000] 2471 MainThread _cplogging   INFO     [25/Nov/2014:19:30:51] ENGINE Bus STARTING
[25/Nov/2014 19:30:51 +0000] 2471 MainThread _cplogging   INFO     [25/Nov/2014:19:30:51] ENGINE Started monitor thread '_TimeoutMonitor'.
[25/Nov/2014 19:30:51 +0000] 2471 MainThread _cplogging   INFO     [25/Nov/2014:19:30:51] ENGINE Serving on localhost.localdomain:9000
[25/Nov/2014 19:30:51 +0000] 2471 MainThread _cplogging   INFO     [25/Nov/2014:19:30:51] ENGINE Bus STARTED
[25/Nov/2014 19:30:51 +0000] 2471 MainThread __init__     INFO     New monitor: (<cmf.monitor.host.HostMonitor object at 0x1fe7910>,)
[25/Nov/2014 19:30:51 +0000] 2471 MonitorDaemon-Scheduler __init__     INFO     Monitor ready to report: ('HostMonitor',)
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Setting default socket timeout to 30
[25/Nov/2014 19:30:51 +0000] 2471 Monitor-HostMonitor network_interfaces INFO     NIC iface eth0 doesn't support ETHTOOL (95)
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Using parcels directory from server provided value: /opt/cloudera/parcels
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Created /opt/cloudera/parcels
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Chowning /opt/cloudera/parcels to root (0) root (0)
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Chmod'ing /opt/cloudera/parcels to 0755
[25/Nov/2014 19:30:51 +0000] 2471 MainThread parcel       INFO     Agent does create users/groups and apply file permissions
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Created /opt/cloudera/parcel-cache
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Chowning /opt/cloudera/parcel-cache to root (0) root (0)
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Chmod'ing /opt/cloudera/parcel-cache to 0755
[25/Nov/2014 19:30:51 +0000] 2471 MainThread downloader   INFO     Downloader path: /opt/cloudera/parcel-cache
[25/Nov/2014 19:30:51 +0000] 2471 MainThread parcel_cache INFO     Using /opt/cloudera/parcel-cache for parcel cache
[25/Nov/2014 19:30:51 +0000] 2471 MainThread firehoses    INFO     Reporting interval updated: 5.0 -> 60
[25/Nov/2014 19:30:51 +0000] 2471 MainThread agent        INFO     Active parcel list updated; recalculating component info.
[25/Nov/2014 19:31:51 +0000] 2471 Monitor-HostMonitor throttling_logger INFO     Using java location: '/usr/java/jdk1.7.0_67-cloudera/bin/java'.
[25/Nov/2014 19:31:51 +0000] 2471 Monitor-HostMonitor throttling_logger WARNING  hostname ip-10-0-11-42 differs from the canonical name localhost.localdomain
[25/Nov/2014 19:31:51 +0000] 2471 MonitorDaemon-Reporter firehoses    INFO     Creating a connection to the SERVICEMONITOR.
[25/Nov/2014 19:31:51 +0000] 2471 MonitorDaemon-Reporter firehoses    INFO     Creating a connection to the HOSTMONITOR.
[25/Nov/2014 19:31:51 +0000] 2471 MonitorDaemon-Reporter throttling_logger ERROR    Error sending messages to firehose: MGMT-HOSTMONITOR-aa327921a42a1f9abfc0d4a872245845
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 71, in _send
    self._port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 464, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
[25/Nov/2014 19:42:51 +0000] 2471 MonitorDaemon-Reporter throttling_logger ERROR    (10 skipped) Error sending messages to firehose: MGMT-HOSTMONITOR-aa327921a42a1f9abfc0d4a872245845
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 71, in _send
    self._port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 464, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
[25/Nov/2014 19:52:51 +0000] 2471 MonitorDaemon-Reporter throttling_logger ERROR    (9 skipped) Error sending messages to firehose: MGMT-HOSTMONITOR-aa327921a42a1f9abfc0d4a872245845
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 71, in _send
    self._port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 464, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused

 

Any hints?

Re: Always get NullPointerException when attempting to launch cluster

bhavanki might have a more comprehensive answer, but as a quick check: is there a reason why you are generating the host name as in the snippet above? Are you not able to use the VPC's default name resolution?
Regards,
Gautam Gopalakrishnan

Re: Always get NullPointerException when attempting to launch cluster

If I don't tell the machine that its auto-generated hostname (e.g. "ip-1-2-3-4") is localhost, then the Cloudera processes fail to start with errors that they can't resolve themselves by name.

Re: Always get NullPointerException when attempting to launch cluster

Thanks for that. Like I mentioned earlier, are you not enabling name resolution in the VPC settings? That should help with resolving ip-1-2-3-4 to an IP address.
Regards,
Gautam Gopalakrishnan

Re: Always get NullPointerException when attempting to launch cluster

GautamG is on the right track. It's important for Director that name resolution and routing be fully set up within the VPC so that all the pieces of the system can reach each other reliably. You should not need to fiddle with /etc/hosts.
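
One rough way to see whether a node is affected is to check what its own hostname resolves to. The sketch below is only an illustration, not something Director runs: the function names `only_loopback` and `check_local_hostname` are made up for this example. If the hostname resolves only to a loopback address (as happens when `/etc/hosts` maps it to localhost), other nodes will probably not be able to reach services that advertise that name.

```python
import ipaddress
import socket


def only_loopback(addresses):
    """Return True if the list is non-empty and every address is a loopback address."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_loopback for addr in addresses
    )


def check_local_hostname():
    """Resolve this machine's hostname and report what it maps to.

    Returns (hostname, canonical_name, addresses, loopback_only).
    An unresolvable hostname yields an empty address list.
    """
    hostname = socket.gethostname()
    try:
        # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
        canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        canonical, addresses = hostname, []
    return hostname, canonical, addresses, only_loopback(addresses)


# Example use on a cluster node:
# print(check_local_hostname())
```

With VPC name resolution working, one would expect the address list to contain the instance's private IP rather than a 127.x.x.x address.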

 

Like Gautam suggested, please take a look at your VPC and ensure that DNS support is enabled. Instructions for checking are here:

 

http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html
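
If you prefer to check this from a script rather than the console, here is a minimal sketch. It assumes you have the boto3 library and AWS credentials set up, neither of which is part of this thread; the helper name `vpc_dns_settings` and the VPC id in the usage comment are hypothetical. It reads both DNS attributes described on that page (`enableDnsSupport` and `enableDnsHostnames`).

```python
def vpc_dns_settings(ec2_client, vpc_id):
    """Return the VPC's DNS attributes as a dict of booleans.

    `ec2_client` is expected to behave like boto3.client("ec2").
    """
    settings = {}
    # describe_vpc_attribute accepts a single attribute per call,
    # so each DNS attribute is queried separately.
    for request_name, response_key in (
        ("enableDnsSupport", "EnableDnsSupport"),
        ("enableDnsHostnames", "EnableDnsHostnames"),
    ):
        response = ec2_client.describe_vpc_attribute(
            VpcId=vpc_id, Attribute=request_name
        )
        settings[request_name] = response[response_key]["Value"]
    return settings


# Example use (assumes boto3 is installed; the VPC id is a placeholder):
# import boto3
# print(vpc_dns_settings(boto3.client("ec2"), "vpc-0123abcd"))
```

If `enableDnsSupport` comes back false, that would be the first thing to change before touching anything on the instances themselves.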

 

It is also worth checking your security groups to ensure that traffic between nodes in your cluster is not blocked. For example, you could run all of the instances within the same security group, and let the security group allow all traffic between nodes in the group. If the CM server and the other instances are in different security groups, they will need to be configured to allow traffic between them. Director, of course, needs access as well.
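
A quick way to test reachability between two nodes is a plain TCP connection attempt. This is just a sketch with a made-up helper name (`can_connect`); the host and port in the usage comment are placeholders, and 7182 is only my assumption for the agent-to-CM-server port in your deployment.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example use from a cluster node (placeholder host; port is an assumption):
# print(can_connect("ip-10-0-11-42", 7182))
```

As a rule of thumb, a connection that is refused right away (like the Errno 111 in your agent log) tends to mean the host was reached but nothing is listening on that address and port, while a connection that hangs until the timeout is more typical of traffic being dropped by a security group.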

 

Usually when Director is stalled waiting to hear from agents, it's because those network connections aren't all fully in place. I believe that in the phase where you're stalled, the agent on the CM server itself is already running; Director is trying to hear from the nodes out on the cluster.

 

Please let us know how your configurations look so we can make sure you get fully up and running.

 

Re: Always get NullPointerException when attempting to launch cluster

Hello!

 

Just checking back to see if you need any further help with the networking issues in your cluster. Please let us know.

 

Bill
