Member since
12-12-2017
01-02-2018
02:40 AM
This issue came down to a missing reverse DNS lookup zone for the subnet the Cloudera environment was being deployed to. Once the reverse lookup zone was created and the correct entries were added, everything succeeded.
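For anyone hitting the same thing, here is a rough sketch of how the forward and reverse lookups could be compared from the Manager host before bootstrapping. It is not what Director or the agent's DnsTest actually runs; the function name `check_forward_reverse` and its injectable `forward`/`reverse` parameters are my own, and only the standard `socket.gethostbyname` and `socket.gethostbyaddr` calls are real.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Hypothetical helper: check that hostname -> IP -> hostname round-trips.

    `forward` and `reverse` default to the standard library resolvers and
    are parameters only so the check can be exercised without real DNS.
    Returns a (ok, detail) tuple.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup of {hostname} failed: {exc}"

    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        ptr_name, _aliases, _addrs = reverse(ip)
    except OSError as exc:  # socket.herror when no PTR record exists
        return False, f"reverse lookup of {ip} failed: {exc}"

    # Compare case-insensitively and ignore a trailing dot on either name.
    if ptr_name.rstrip(".").lower() != hostname.rstrip(".").lower():
        return False, f"{ip} resolves back to {ptr_name}, expected {hostname}"

    return True, f"{hostname} <-> {ip}"


# Example (run on the Manager host):
#   print(check_forward_reverse(socket.getfqdn()))
```

With a missing reverse lookup zone, I would expect the second lookup to fail or return a name that differs from the host's FQDN, which the helper reports as a failed check.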
12-12-2017
01:34 PM
I have been trying for three days now to provision a vanilla install of Director, Manager and a cluster on AWS. Director is up and running fine, but when I try to create a Manager (and my first cluster), the bootstrap fails at the end of configuring the Manager and errors out, saying it failed to install after 5 attempts. I have exhaustively reviewed the application.log on the Director along with the Server and Agent logs on the Manager. The failure occurs when trying to deploy the agent to the Manager (from the Manager itself). The logs show me very little as to the cause.

Agent Log Errors:
[root@ip-192-168-58-68 cloudera-scm-agent]# tail -f cloudera-scm-agent.log | grep ERROR
[12/Dec/2017 16:15:42 +0000] 14032 MainThread downloader ERROR Failed rack peer update: [Errno 111] Connection refused
[12/Dec/2017 16:15:42 +0000] 14032 MainThread downloader ERROR Failed rack peer update: [Errno 111] Connection refused
[12/Dec/2017 16:15:49 +0000] 14032 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['chronyc', 'sources']
[12/Dec/2017 16:15:49 +0000] 14032 Monitor-HostMonitor throttling_logger ERROR Failed to collect NTP metrics
[12/Dec/2017 16:18:53 +0000] 15347 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['chronyc', 'sources']
[12/Dec/2017 16:18:53 +0000] 15347 Monitor-HostMonitor throttling_logger ERROR Failed to collect NTP metrics
[12/Dec/2017 16:19:46 +0000] 15347 DnsResolutionMonitor throttling_logger ERROR Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.13.1.jar', 'com.cloudera.cmon.agent.DnsTest']
[12/Dec/2017 16:19:46 +0000] 15347 DnsResolutionMonitor throttling_logger ERROR Failed to run DnsTest.

Cloudera Director Application Log:
[2017-12-12 21:10:56.222 +0000] ERROR [port-forwarding-[38267:192.168.58.68:7180]] - - - - - net.schmizz.concurrent.Promise: <<chan#29 / open>> woke to: Opening `direct-tcpip` channel failed: Connection refused
[2017-12-12 21:18:46.531 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.b.c.BootstrapClouderaManagerAgent: Command GlobalHostInstall with ID 22 failed after 5 tries. Details: ApiCommand{id=22, name=GlobalHostInstall, startTime=Tue Dec 12 21:16:25 UTC 2017, endTime=Tue Dec 12 21:18:45 UTC 2017, active=false, success=false, resultMessage=Command completed with 0/1 successful subcommands, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2017-12-12 21:18:46.531 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
[2017-12-12 21:18:46.532 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.p.DatabasePipelineRunner: Encountered an unrecoverable error: JobError{jobClassName=com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure, jobArguments=[DeploymentContext{environment=Environment{name='cloudera1', provider=InstanceProviderConfig{type='aws'}, credentials=SshCredentials{username='lavastorm', hasPassword=false, hasPrivateKey=true, hasPassphrase=false, port=22, hostKeyFingerprint=Optional.absent(), bastionHost=Optional.absent()}}, deployment=Deployment{name='ClouderaManager', hostname='192.168.58.68', port=7180, username='admin', tlsEnabled=false, tlsConfigurationProperties={}, managerInstance=Optional.of(PluggableComputeInstance{ipAddress=192.168.58.68, delegate=null, hostEndpoints=[HostEndpoint{hostAddressString='192.168.58.68', hostAddress=Optional.of(/192.168.58.68)}, HostEndpoint{hostAddressString='ip-192-168-58-68.ec2.internal', hostAddress=Optional.absent()}, HostEndpoint{hostAddressString='52.90.79.90', hostAddress=Optional.of(/52.90.79.90)}, HostEndpoint{hostAddressString='ec2-52-90-79-90.compute-1.amazonaws.com', hostAddress=Optional.absent()}]} Instance{virtualInstance=VirtualInstance{id='1dcfc459-bb3e-4812-be95-e0e7eebc2e23', template=InstanceTemplate{name='cloudera-Non-Spot', type='m4.large', image='ami-185a260e', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, blockDurationMinutes=60, rootVolumeType=gp2, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7, spotBidUSDPerHr=0.1229}, tags={}, normalizeInstance=true, 
sshUsername=Optional.of(ec2-user), sshHostKeyRetrievalType=NONE}}, capabilities=Optional.of(Capabilities{operatingSystemType=REDHAT_COMPATIBLE, operatingSystemVersion=REDHAT_COMPATIBLE_7, virtualizationType=HARDWARE_ASSISTED, packageManager=Optional.of(YUM), javaVendor=Optional.absent(), javaVersion=Optional.absent(), pythonVersion=Optional.of(2.7.5), passwordlessSudoEnabled=true, selinuxEnabled=true, iptablesEnabled=false, dnsConfigured=true, fqdn=Optional.of(ip-192-168-58-68.lavastorm.com), clouderaManagerAgentInstalled=false, customScriptPaths={PREPARE_UNMOUNTED_VOLUMES=/var/lib/cloudera-director-plugins/aws-provider-1.4.4/etc/prepare_unmounted_volumes}}), cmHostId=Optional.absent(), cmHostUrl=Optional.absent(), hostKeyFingerprints=[], validationConditions=[], state=InstanceState{status=RUNNING, lastReported=2017-12-12T20:54:55.130Z, lastChecked=2017-12-12T20:54:55.130Z}}), createdExternalDatabases=[], repository='Optional.absent()', repositoryKeyUrl='Optional.absent()', enableEnterpriseTrial=Optional.of(false), unlimitedJce=Optional.absent(), krbAdminUsername='Optional.absent()', javaInstallationStrategy='AUTO', tunnelingRequired=false, cmVersion=Optional.absent()}}, PluggableComputeInstance{ipAddress=192.168.58.68, delegate=null, hostEndpoints=[HostEndpoint{hostAddressString='192.168.58.68', hostAddress=Optional.of(/192.168.58.68)}, HostEndpoint{hostAddressString='ip-192-168-58-68.ec2.internal', hostAddress=Optional.absent()}, HostEndpoint{hostAddressString='52.90.79.90', hostAddress=Optional.of(/52.90.79.90)}, HostEndpoint{hostAddressString='ec2-52-90-79-90.compute-1.amazonaws.com', hostAddress=Optional.absent()}]} Instance{virtualInstance=VirtualInstance{id='1dcfc459-bb3e-4812-be95-e0e7eebc2e23', template=InstanceTemplate{name='cloudera-Non-Spot', type='m4.large', image='ami-185a260e', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, 
enableEbsEncryption=false, blockDurationMinutes=60, rootVolumeType=gp2, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7, spotBidUSDPerHr=0.1229}, tags={}, normalizeInstance=true, sshUsername=Optional.of(ec2-user), sshHostKeyRetrievalType=NONE}}, capabilities=Optional.of(Capabilities{operatingSystemType=REDHAT_COMPATIBLE, operatingSystemVersion=REDHAT_COMPATIBLE_7, virtualizationType=HARDWARE_ASSISTED, packageManager=Optional.of(YUM), javaVendor=Optional.absent(), javaVersion=Optional.absent(), pythonVersion=Optional.of(2.7.5), passwordlessSudoEnabled=true, selinuxEnabled=true, iptablesEnabled=false, dnsConfigured=true, fqdn=Optional.of(ip-192-168-58-68.lavastorm.com), clouderaManagerAgentInstalled=false, customScriptPaths={PREPARE_UNMOUNTED_VOLUMES=/var/lib/cloudera-director-plugins/aws-provider-1.4.4/etc/prepare_unmounted_volumes}}), cmHostId=Optional.of(004e3678-a159-4358-bbab-1a389ed09e2a), cmHostUrl=Optional.absent(), hostKeyFingerprints=[], validationConditions=[], state=InstanceState{status=RUNNING, lastReported=2017-12-12T20:54:55.130Z, lastChecked=2017-12-12T20:54:55.130Z}}, Optional.of(true), true, 5, 22], jobContext=JobContext{callCountAtThisStackLevel=0, pipelineHandle='b393c4ec-2791-464f-a1bb-bd15744ec532', callStack=CallStack{items=[Item{className='com.cloudera.launchpad.api.jobs.DefaultBootstrapDeploymentJob', callCount=7}, Item{className='com.cloudera.launchpad.bootstrap.deployment.BootstrapClouderaManager', callCount=11}, Item{className='com.cloudera.launchpad.bootstrap.deployment.BootstrapClouderaManager.InstallManagementServices', callCount=2}, Item{className='com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent', callCount=0}], size=4, parent=Optional.absent()}, stackLevel=4}, errorInfo=ErrorInfo{code=CM_AGENT_INSTALLATION_FAIL, properties={instanceIpAddress=192.168.58.68, retryCount=5}, causes=[]}}
[2017-12-12 21:18:46.532 +0000] ERROR [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.p.DatabasePipelineRunner: Pipeline 'b393c4ec-2791-464f-a1bb-bd15744ec532' failed
[2017-12-12 21:18:46.538 +0000] INFO [p-bd15744ec532-DefaultBootstrapDeploymentJob] c52f762e-479e-42c8-be1c-1ee4c0e1aa54 POST /api/v10/environments/cloudera1/deployments com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$WaitForSuccessOrRetryOnFailure - c.c.l.p.s.PipelineRepositoryService: Pipeline 'b393c4ec-2791-464f-a1bb-bd15744ec532': RUNNING -> ERROR
[2017-12-12 21:18:48.692 +0000] ERROR [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
[2017-12-12 21:18:48.692 +0000] ERROR [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.p.DatabasePipelineRunner: Encountered an unrecoverable error: JobError{jobClassName=com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady, jobArguments=[Environment{name='cloudera1', provider=InstanceProviderConfig{type='aws'}, credentials=SshCredentials{username='lavastorm', hasPassword=false, hasPrivateKey=true, hasPassphrase=false, port=22, hostKeyFingerprint=Optional.absent(), bastionHost=Optional.absent()}}, {}, DeploymentTemplate{name='ClouderaManager', managerVirtualInstance=Optional.of(VirtualInstance{id='1dcfc459-bb3e-4812-be95-e0e7eebc2e23', template=InstanceTemplate{name='cloudera-Non-Spot', type='m4.large', image='ami-185a260e', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, blockDurationMinutes=60, rootVolumeType=gp2, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7, spotBidUSDPerHr=0.1229}, tags={}, normalizeInstance=true, sshUsername=Optional.of(ec2-user), sshHostKeyRetrievalType=NONE}}), externalDatabaseTemplates={}, externalDatabases={}, configs={}, externalAccounts={}, hostname='Optional.absent()', port=Optional.absent(), username='Optional.of(admin)', tlsEnabled=Optional.absent(), tlsConfigurationProperties={}, repository='Optional.absent()', repositoryKeyUrl='Optional.absent()', enableEnterpriseTrial=Optional.of(false), unlimitedJce=Optional.absent(), krbAdminUsername='Optional.absent()', javaInstallationStrategy='AUTO', licenseIsPresent=false, billingIdIsPresent=false, numberOfPostCreateScripts=0, csds=[]}, ClusterTemplate{name='Cluster1', 
productVersions={CDH=5}, services=[HDFS, HIVE, HUE, OOZIE, SPARK_ON_YARN, YARN, ZOOKEEPER], servicesConfigs={}, virtualInstanceGroups={masters=VirtualInstanceGroup{name='masters', virtualInstances=[VirtualInstance{id='b2e0fec6-6218-4399-9561-a3173e8b8371', template=InstanceTemplate{name='cloudera-template1', type='m4.large', image='ami-02e98f78', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, rootVolumeType=gp2, instanceNamePrefix=director, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7}, tags={}, normalizeInstance=true, sshUsername=Optional.of(centos), sshHostKeyRetrievalType=NONE}}], serviceTypeToRoleTypes={HIVE=[HIVEMETASTORE, HIVESERVER2], HDFS=[NAMENODE, SECONDARYNAMENODE, BALANCER], OOZIE=[OOZIE_SERVER], HUE=[HUE_SERVER], ZOOKEEPER=[SERVER], YARN=[RESOURCEMANAGER, JOBHISTORY], SPARK_ON_YARN=[SPARK_YARN_HISTORY_SERVER]}, roleTypesConfigs={}, minCount=1}, workers=VirtualInstanceGroup{name='workers', virtualInstances=[VirtualInstance{id='cba2dea6-9788-470d-acf5-1b32f53340c2', template=InstanceTemplate{name='cloudera-template1', type='m4.large', image='ami-02e98f78', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, rootVolumeType=gp2, instanceNamePrefix=director, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7}, tags={}, normalizeInstance=true, sshUsername=Optional.of(centos), sshHostKeyRetrievalType=NONE}}], serviceTypeToRoleTypes={HDFS=[DATANODE], YARN=[NODEMANAGER]}, roleTypesConfigs={}, minCount=1}, gateway=VirtualInstanceGroup{name='gateway', virtualInstances=[VirtualInstance{id='9780aea8-589a-448e-857f-a0b0b22dd18d', template=InstanceTemplate{name='cloudera-template1', type='m4.large', 
image='ami-02e98f78', rackId='/default', bootstrapScriptsArePresent=false, config={subnetId=subnet-b0e8de9d, ebsOptimized=false, tenancy=default, rootVolumeSizeGB=25, ebsVolumeCount=1, enableEbsEncryption=false, rootVolumeType=gp2, instanceNamePrefix=director, ebsVolumeSizeGiB=25, useSpotInstances=false, ebsVolumeType=gp2, securityGroupsIds=sg-d37be5b7}, tags={}, normalizeInstance=true, sshUsername=Optional.of(centos), sshHostKeyRetrievalType=NONE}}], serviceTypeToRoleTypes={HDFS=[GATEWAY], HIVE=[GATEWAY], YARN=[GATEWAY], SPARK_ON_YARN=[GATEWAY]}, roleTypesConfigs={}, minCount=1}}, externalDatabaseTemplates={}, externalDatabases={}, parcelRepositories=[http://archive.cloudera.com/cdh5/parcels/5.13/, http://archive.cloudera.com/kafka/parcels/3.0/], restartClusterOnUpdate=false, redeployClientConfigsOnUpdate=false, numberOfInstancePostCreateScripts=0, numberOfPostCreateScripts=0, numberOfPreTerminateScripts=0, migrations=0}], jobContext=JobContext{callCountAtThisStackLevel=0, pipelineHandle='3d0e4664-41ab-47b5-9119-e84bb5055363', callStack=CallStack{items=[Item{className='com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob', callCount=8}], size=1, parent=Optional.absent()}, stackLevel=1}, errorInfo=ErrorInfo{code=CLUSTER_DEPLOYMENT_IN_WRONG_STAGE, properties={currentStage=BOOTSTRAP_FAILED, deploymentName=ClouderaManager, environmentName=cloudera1}, causes=[]}}
[2017-12-12 21:18:48.693 +0000] ERROR [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.p.DatabasePipelineRunner: Pipeline '3d0e4664-41ab-47b5-9119-e84bb5055363' failed
[2017-12-12 21:18:48.698 +0000] INFO [p-e84bb5055363-DefaultBootstrapClusterJob] db1ee82c-f036-40c7-87aa-51e51f50c18b POST /api/v10/environments/cloudera1/deployments/ClouderaManager/clusters com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob$WaitUntilDeploymentIsReady - c.c.l.p.s.PipelineRepositoryService: Pipeline '3d0e4664-41ab-47b5-9119-e84bb5055363': RUNNING -> ERROR

It seems the Manager installs, and I can even see the Director log attempting to connect to the Agent and failing (whilst the Agent is installing), then connecting on port 7180. I really have no idea why this is failing and have tried every solution I have found, including checking that hostname and hostname -f match. This is in AWS running Director 2.6.1, deploying into an existing VPC and subnet. No SELinux/IPTables on the Manager box. The issue seems to be between the Agent and the Manager on the same server. Any advice would be greatly appreciated.

Cheers
Andy
Labels:
- Cloudera Manager