Can't deploy Cloudera Manager via Director on AWS


I am trying (desperately now, after 3 days) to provision a vanilla install of Director, Manager and a cluster on AWS. Director is up and running fine, but when I try to create a Manager (and my first Cluster), the Bootstrap fails at the end of configuring the Manager, reporting that the install failed after 5 attempts. I have exhaustively reviewed the application.log on the Director along with the Server and Agent logs on the Manager. The failure occurs when trying to deploy the agent to the Manager (FROM the Manager).

 

The logs show me VERY little about the cause of this.

 

Agent Log Errors:

 

[root@ip-192-168-58-68 cloudera-scm-agent]# tail -f cloudera-scm-agent.log | grep ERROR
[12/Dec/2017 16:15:42 +0000] 14032 MainThread downloader   ERROR    Failed rack peer update: [Errno 111] Connection refused
[12/Dec/2017 16:15:42 +0000] 14032 MainThread downloader   ERROR    Failed rack peer update: [Errno 111] Connection refused
[12/Dec/2017 16:15:49 +0000] 14032 Monitor-HostMonitor throttling_logger ERROR    Timeout with args ['chronyc', 'sources']
[12/Dec/2017 16:15:49 +0000] 14032 Monitor-HostMonitor throttling_logger ERROR    Failed to collect NTP metrics
[12/Dec/2017 16:18:53 +0000] 15347 Monitor-HostMonitor throttling_logger ERROR    Timeout with args ['chronyc', 'sources']
[12/Dec/2017 16:18:53 +0000] 15347 Monitor-HostMonitor throttling_logger ERROR    Failed to collect NTP metrics
[12/Dec/2017 16:19:46 +0000] 15347 DnsResolutionMonitor throttling_logger ERROR    Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.13.1.jar', 'com.cloudera.cmon.agent.DnsTest']
[12/Dec/2017 16:19:46 +0000] 15347 DnsResolutionMonitor throttling_logger ERROR    Failed to run DnsTest.

Cloudera Director Application Log

 

[2017-12-12 21:10:56.222 +0000] ERROR [port-forwarding-[38267:192.168.58.68:7180]] - - - - - net.schmizz.concurrent.Promise: <<chan#29 / open>> woke to: Opening `direct-tcpip` channel failed: Connection refused
[2017-12-12 21:18:46.531 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.b.c.BootstrapClouderaManagerAgent: Command GlobalHostInstall with ID 22 failed after 5 tries. Details: ApiCommand{id=22, name=GlobalHostInstall, startTime=Tue Dec 12 21:16:25 UTC 2017, endTime=Tue Dec 12 21:18:45 UTC 2017, active=false, success=false, resultMessage=Command completed with 0/1 successful subcommands, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2017-12-12 21:18:46.531 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
[2017-12-12 21:18:46.532 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.p.DatabasePipelineRunner: Encountered an unrecoverable error: JobError{jobClassName=com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure, jobArguments=[DeploymentContext{environment=Environment{name='cloudera1', provider=InstanceProviderConfig{type='aws'}, credentials=SshCredentials{username='lavastorm', hasPassword=false, hasPrivateKey=true, hasPassphrase=false, port=22, hostKeyFingerprint=Optional.absent(), bastionHost=Optional.absent()}}, deployment=Deployment{name='ClouderaManager', hostname='192.168.58.68', port=7180, username='admin', tlsEnabled=false, tlsConfigurationProperties={}, managerInstance=Optional.of(PluggableComputeInstance{ipAddress=192.168.58.68, delegate=null, hostEndpoints=[HostEndpoint{hostAddressString='192.168.58.68', hostAddress=Optional.of(/192.168.58.68)}, HostEndpoint{hostAddressString='ip-192-168-58-68.ec2.internal', hostAddress=Optional.absent()}, HostEndpoint{hostAddressString='52.90.79.90', hostAddress=Optional.of(/52.90.79.90)}, HostEndpoint{hostAddressString='ec2-52-90-79-90.compute-1.amazonaws.com', hostAddress=Optional.absent()}]} Instance{virtualInstance=VirtualInstance{id='1dcfc459-bb3e-4812-be95-e0e7eebc2e23', template=InstanceTemplate{name='cloudera-Non-Spot', type='m4.large', image='ami-185a260e', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, blockDurationMinutes=60, rootVolumeType=gp2, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7, spotBidUSDPerHr=0.1229}, tags={}, normalizeInstance=true, 
sshUsername=Optional.of(ec2-user), sshHostKeyRetrievalType=NONE}}, capabilities=Optional.of(Capabilities{operatingSystemType=REDHAT_COMPATIBLE, operatingSystemVersion=REDHAT_COMPATIBLE_7, virtualizationType=HARDWARE_ASSISTED, packageManager=Optional.of(YUM), javaVendor=Optional.absent(), javaVersion=Optional.absent(), pythonVersion=Optional.of(2.7.5), passwordlessSudoEnabled=true, selinuxEnabled=true, iptablesEnabled=false, dnsConfigured=true, fqdn=Optional.of(ip-192-168-58-68.lavastorm.com), clouderaManagerAgentInstalled=false, customScriptPaths={PREPARE_UNMOUNTED_VOLUMES=/var/lib/cloudera-director-plugins/aws-provider-1.4.4/etc/prepare_unmounted_volumes}}), cmHostId=Optional.absent(), cmHostUrl=Optional.absent(), hostKeyFingerprints=[], validationConditions=[], state=InstanceState{status=RUNNING, lastReported=2017-12-12T20:54:55.130Z, lastChecked=2017-12-12T20:54:55.130Z}}), createdExternalDatabases=[], repository='Optional.absent()', repositoryKeyUrl='Optional.absent()', enableEnterpriseTrial=Optional.of(false), unlimitedJce=Optional.absent(), krbAdminUsername='Optional.absent()', javaInstallationStrategy='AUTO', tunnelingRequired=false, cmVersion=Optional.absent()}}, PluggableComputeInstance{ipAddress=192.168.58.68, delegate=null, hostEndpoints=[HostEndpoint{hostAddressString='192.168.58.68', hostAddress=Optional.of(/192.168.58.68)}, HostEndpoint{hostAddressString='ip-192-168-58-68.ec2.internal', hostAddress=Optional.absent()}, HostEndpoint{hostAddressString='52.90.79.90', hostAddress=Optional.of(/52.90.79.90)}, HostEndpoint{hostAddressString='ec2-52-90-79-90.compute-1.amazonaws.com', hostAddress=Optional.absent()}]} Instance{virtualInstance=VirtualInstance{id='1dcfc459-bb3e-4812-be95-e0e7eebc2e23', template=InstanceTemplate{name='cloudera-Non-Spot', type='m4.large', image='ami-185a260e', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, 
enableEbsEncryption=false, blockDurationMinutes=60, rootVolumeType=gp2, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7, spotBidUSDPerHr=0.1229}, tags={}, normalizeInstance=true, sshUsername=Optional.of(ec2-user), sshHostKeyRetrievalType=NONE}}, capabilities=Optional.of(Capabilities{operatingSystemType=REDHAT_COMPATIBLE, operatingSystemVersion=REDHAT_COMPATIBLE_7, virtualizationType=HARDWARE_ASSISTED, packageManager=Optional.of(YUM), javaVendor=Optional.absent(), javaVersion=Optional.absent(), pythonVersion=Optional.of(2.7.5), passwordlessSudoEnabled=true, selinuxEnabled=true, iptablesEnabled=false, dnsConfigured=true, fqdn=Optional.of(ip-192-168-58-68.lavastorm.com), clouderaManagerAgentInstalled=false, customScriptPaths={PREPARE_UNMOUNTED_VOLUMES=/var/lib/cloudera-director-plugins/aws-provider-1.4.4/etc/prepare_unmounted_volumes}}), cmHostId=Optional.of(004e3678-a159-4358-bbab-1a389ed09e2a), cmHostUrl=Optional.absent(), hostKeyFingerprints=[], validationConditions=[], state=InstanceState{status=RUNNING, lastReported=2017-12-12T20:54:55.130Z, lastChecked=2017-12-12T20:54:55.130Z}}, Optional.of(true), true, 5, 22], jobContext=JobContext{callCountAtThisStackLevel=0, pipelineHandle='b393c4ec-2791-464f-a1bb-bd15744ec532', callStack=CallStack{items=[Item{className='com.cloudera.launchpad.api.jobs.DefaultBootstrapDeploymentJob', callCount=7}, Item{className='com.cloudera.launchpad.bootstrap.deployment.BootstrapClouderaManager', callCount=11}, Item{className='com.cloudera.launchpad.bootstrap.deployment.BootstrapClouderaManager.InstallManagementServices', callCount=2}, Item{className='com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent', callCount=0}], size=4, parent=Optional.absent()}, stackLevel=4}, errorInfo=ErrorInfo{code=CM_AGENT_INSTALLATION_FAIL, properties={instanceIpAddress=192.168.58.68, retryCount=5}, causes=[]}}
[2017-12-12 21:18:46.532 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.p.DatabasePipelineRunner: Pipeline 'b393c4ec-2791-464f-a1bb-bd15744ec532' failed
[2017-12-12 21:18:46.538 +0000] INFO  [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.p.s.PipelineRepositoryService: Pipeline 'b393c4ec-2791-464f-a1bb-bd15744ec532': RUNNING -> ERROR
[2017-12-12 21:18:48.692 +0000] ERROR [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
[2017-12-12 21:18:48.692 +0000] ERROR [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.p.DatabasePipelineRunner: Encountered an unrecoverable error: JobError{jobClassName=com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady, jobArguments=[Environment{name='cloudera1', provider=InstanceProviderConfig{type='aws'}, credentials=SshCredentials{username='lavastorm', hasPassword=false, hasPrivateKey=true, hasPassphrase=false, port=22, hostKeyFingerprint=Optional.absent(), bastionHost=Optional.absent()}}, {}, DeploymentTemplate{name='ClouderaManager', managerVirtualInstance=Optional.of(VirtualInstance{id='1dcfc459-bb3e-4812-be95-e0e7eebc2e23', template=InstanceTemplate{name='cloudera-Non-Spot', type='m4.large', image='ami-185a260e', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, blockDurationMinutes=60, rootVolumeType=gp2, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7, spotBidUSDPerHr=0.1229}, tags={}, normalizeInstance=true, sshUsername=Optional.of(ec2-user), sshHostKeyRetrievalType=NONE}}), externalDatabaseTemplates={}, externalDatabases={}, configs={}, externalAccounts={}, hostname='Optional.absent()', port=Optional.absent(), username='Optional.of(admin)', tlsEnabled=Optional.absent(), tlsConfigurationProperties={}, repository='Optional.absent()', repositoryKeyUrl='Optional.absent()', enableEnterpriseTrial=Optional.of(false), unlimitedJce=Optional.absent(), krbAdminUsername='Optional.absent()', javaInstallationStrategy='AUTO', licenseIsPresent=false, billingIdIsPresent=false, numberOfPostCreateScripts=0, csds=[]}, ClusterTemplate{name='Cluster1', 
productVersions={CDH=5}, services=[HDFS, HIVE, HUE, OOZIE, SPARK_ON_YARN, YARN, ZOOKEEPER], servicesConfigs={}, virtualInstanceGroups={masters=VirtualInstanceGroup{name='masters', virtualInstances=[VirtualInstance{id='b2e0fec6-6218-4399-9561-a3173e8b8371', template=InstanceTemplate{name='cloudera-template1', type='m4.large', image='ami-02e98f78', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, rootVolumeType=gp2, instanceNamePrefix=director, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7}, tags={}, normalizeInstance=true, sshUsername=Optional.of(centos), sshHostKeyRetrievalType=NONE}}], serviceTypeToRoleTypes={HIVE=[HIVEMETASTORE, HIVESERVER2], HDFS=[NAMENODE, SECONDARYNAMENODE, BALANCER], OOZIE=[OOZIE_SERVER], HUE=[HUE_SERVER], ZOOKEEPER=[SERVER], YARN=[RESOURCEMANAGER, JOBHISTORY], SPARK_ON_YARN=[SPARK_YARN_HISTORY_SERVER]}, roleTypesConfigs={}, minCount=1}, workers=VirtualInstanceGroup{name='workers', virtualInstances=[VirtualInstance{id='cba2dea6-9788-470d-acf5-1b32f53340c2', template=InstanceTemplate{name='cloudera-template1', type='m4.large', image='ami-02e98f78', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, rootVolumeType=gp2, instanceNamePrefix=director, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7}, tags={}, normalizeInstance=true, sshUsername=Optional.of(centos), sshHostKeyRetrievalType=NONE}}], serviceTypeToRoleTypes={HDFS=[DATANODE], YARN=[NODEMANAGER]}, roleTypesConfigs={}, minCount=1}, gateway=VirtualInstanceGroup{name='gateway', virtualInstances=[VirtualInstance{id='9780aea8-589a-448e-857f-a0b0b22dd18d', template=InstanceTemplate{name='cloudera-template1', type='m4.large', 
image='ami-02e98f78', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, rootVolumeType=gp2, instanceNamePrefix=director, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7}, tags={}, normalizeInstance=true, sshUsername=Optional.of(centos), sshHostKeyRetrievalType=NONE}}], serviceTypeToRoleTypes={HDFS=[GATEWAY], HIVE=[GATEWAY], YARN=[GATEWAY], SPARK_ON_YARN=[GATEWAY]}, roleTypesConfigs={}, minCount=1}}, externalDatabaseTemplates={}, externalDatabases={}, parcelRepositories=[http://archive.cloudera.com/cdh5/parcels/5.13/, http://archive.cloudera.com/kafka/parcels/3.0/], restartClusterOnUpdate=false, redeployClientConfigsOnUpdate=false, numberOfInstancePostCreateScripts=0, numberOfPostCreateScripts=0, numberOfPreTerminateScripts=0, migrations=0}], jobContext=JobContext{callCountAtThisStackLevel=0, pipelineHandle='3d0e4664-41ab-47b5-9119-e84bb5055363', callStack=CallStack{items=[Item{className='com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob', callCount=8}], size=1, parent=Optional.absent()}, stackLevel=1}, errorInfo=ErrorInfo{code=CLUSTER_DEPLOYMENT_IN_WRONG_STAGE, properties={currentStage=BOOTSTRAP_FAILED, deploymentName=ClouderaManager, environmentName=cloudera1}, causes=[]}}
[2017-12-12 21:18:48.693 +0000] ERROR [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.p.DatabasePipelineRunner: Pipeline '3d0e4664-41ab-47b5-9119-e84bb5055363' failed
[2017-12-12 21:18:48.698 +0000] INFO  [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.p.s.PipelineRepositoryService: Pipeline '3d0e4664-41ab-47b5-9119-e84bb5055363': RUNNING -> ERROR

It seems the Manager installs, and I can even see the Director log attempting to connect to the Agent and failing (whilst the Agent is installing), then connecting on port 7180. I really have no idea why this is failing and have tried EVERY solution I have found, including checking that the output of hostname and hostname -f match.

 

This is in AWS running Director 2.6.1. Deploying into an existing VPC and Subnet. No SELinux/IPTables on the Manager Box. The issues seem to be between the Agent and the Manager on the same server.

 

Any advice would be greatly appreciated.

 

Cheers

 

Andy

1 ACCEPTED SOLUTION


This issue was down to a missing Reverse DNS Lookup Zone for the subnet that the Cloudera environment was being deployed to. Once the Reverse Lookup Zone was created and the correct entries were added, everything succeeded.
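For anyone hitting the same thing, here is a rough sketch of how the reverse lookup could be checked from the Manager host before retrying the bootstrap. It is not part of Director or Cloudera Manager; the function names `ptr_record_name` and `check_reverse_dns` are made up for illustration, and it only uses the Python standard library. It prints the PTR record name the reverse zone would need and checks that the IP resolves to a name and back to the same IP.

```python
import ipaddress
import socket


def ptr_record_name(ip):
    """Return the PTR record name that a reverse lookup zone needs for ip."""
    return ipaddress.ip_address(ip).reverse_pointer


def check_reverse_dns(ip, reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname):
    """Check that ip -> hostname -> ip round-trips through DNS.

    Returns (ok, detail). The lookup functions are parameters (an
    illustrative choice) so the check can be tried without a real resolver.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        name = reverse_lookup(ip)[0]
    except OSError as exc:  # socket.herror and socket.gaierror are OSError
        return False, "no reverse entry for %s (expected PTR %s): %s" % (
            ip, ptr_record_name(ip), exc)
    try:
        resolved = forward_lookup(name)
    except OSError as exc:
        return False, "%s has PTR %s but that name does not resolve: %s" % (
            ip, name, exc)
    if resolved != ip:
        return False, "%s -> %s -> %s (forward and reverse disagree)" % (
            ip, name, resolved)
    return True, "%s <-> %s" % (ip, name)


if __name__ == "__main__":
    # IP taken from the logs in this thread; substitute your own.
    print(ptr_record_name("192.168.58.68"))
    print(check_reverse_dns("192.168.58.68"))
```

If the first element of the result is False with a "no reverse entry" message, that would be consistent with the missing reverse lookup zone described above, though other DNS problems can produce the same result.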


REPLIES


Andy,

 

The best place to look is the agent install logs and the agent logs.

/tmp/scm_prepare_node.<Unique ID>

/var/log/cloudera-scm-agent
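When going through the agent log, it can help to group repeated ERROR lines rather than scroll through them. Below is a minimal sketch that does this. The line layout (timestamp in brackets, PID, thread, logger, level, message) is an assumption inferred from the sample lines posted in the question, and `summarize_errors` is a hypothetical helper name, not a Cloudera tool.

```python
import re
from collections import Counter

# Assumed layout, based on the sample lines in this thread:
# [12/Dec/2017 16:15:42 +0000] 14032 MainThread downloader   ERROR    message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<thread>\S+)\s+"
    r"(?P<logger>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def summarize_errors(lines):
    """Count ERROR lines, keyed by (logger, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            counts[(match.group("logger"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Path assumed from the log directory and file name mentioned in this thread.
    path = "/var/log/cloudera-scm-agent/cloudera-scm-agent.log"
    with open(path, errors="replace") as handle:
        for (logger, msg), n in summarize_errors(handle).most_common():
            print("%4d  %-20s %s" % (n, logger, msg))
```

Lines that do not match the assumed layout (stack traces, for example) are skipped, so this is only a first-pass summary.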

 

You should also check that your security group allows full access from other cluster instances (e.g., from other instances in the same security group).
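A quick way to see whether a port is reachable between instances is a plain TCP connect test. This is only a sketch using the Python standard library; `port_is_open` is a hypothetical helper, and the host and port in the example are the ones that appear in the logs above (192.168.58.68, port 7180). A failed connect does not tell you whether the security group or the service itself is the cause.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Values taken from the logs in this thread; substitute your own.
    print(port_is_open("192.168.58.68", 7180))
```

Run it from another instance in the same security group to test the path between instances, and from the Manager host itself to test the local case.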

 

It also looks like you are using custom DNS, but I still see the .ec2.internal addresses in the HostEndpoint list. If you've set up your DHCP Option Set to point to your own DNS server, then you should disable DNS Hostnames and/or DNS Resolution on your VPC.

 

David
