UPDATE_FAILED and fixing inconsistency

I'm trying to scale a cluster up and down using the Cloudera Director Python SDK, and I've managed to get Director into a bad state.

 
I was able to create and remove instances using the SDK, but after trying to use my own AMI I got this error:
java.lang.IllegalArgumentException: Multiple entries with same key: PluggableComputeInstance{ipAddress=Optional.of(10.12.16.123), delegate=null, hostEndpoints=[HostEndpoint{hostAddressString='10.12.16.123', hostAddress=Optional.of(/10.12.16.123)}, HostEndpoint{hostAddressString='ip-10-12-16-123.ec2.internal', hostAddress=Optional.absent()}, HostEndpoint{hostAddressString='54.175.219.24', hostAddress=Optional.of(/54.175.219.24)}, HostEndpoint{hostAddressString='ec2-54-175-219-24.compute-1.amazonaws.com', hostAddress=Optional.absent()}]} Instance{virtualInstance=VirtualInstance{id='0b360665-d081-4ad3-9fa1-c707a6a1cc70', template=InstanceTemplate{name='computec4', type='c3.8xlarge', image='ami-f31651e4', bootstrapScriptIsPresent=false, config={subnetId=subnet-41554d19, instanceNamePrefix=drona, securityGroupsIds=sg-fbe35c81}, tags={}, normalizeInstance=true, sshUsername=Optional.absent()}}, capabilities=Optional.of(Capabilities{operatingSystemType=REDHAT_COMPATIBLE, operatingSystemVersion=REDHAT_COMPATIBLE_7, virtualizationType=HARDWARE_ASSISTED, packageManager=Optional.of(YUM), javaVendor=Optional.of(ORACLE), javaVersion=Optional.of(1.6.0_31), pythonVersion=Optional.of(2.7.5), passwordlessSudoEnabled=true, selinuxEnabled=false, iptablesEnabled=false, dnsConfigured=true, fqdn=Optional.of(ip-10-12-16-123.ec2.internal), clouderaManagerAgentInstalled=true, customScriptPaths={}})}=ApiHostRef{hostId=93f68064-bc41-4ba1-91e9-476c679821f9} and PluggableComputeInstance{ipAddress=Optional.of(10.12.16.123), delegate=null, hostEndpoints=[HostEndpoint{hostAddressString='10.12.16.123', hostAddress=Optional.of(/10.12.16.123)}, HostEndpoint{hostAddressString='ip-10-12-16-123.ec2.internal', hostAddress=Optional.absent()}, HostEndpoint{hostAddressString='54.175.219.24', hostAddress=Optional.of(/54.175.219.24)}, HostEndpoint{hostAddressString='ec2-54-175-219-24.compute-1.amazonaws.com', hostAddress=Optional.absent()}]} Instance{virtualInstance=VirtualInstance{id='0b360665-d081-4ad3-9fa1-c707a6a1cc70', template=InstanceTemplate{name='computec4', type='c3.8xlarge', image='ami-f31651e4', bootstrapScriptIsPresent=false, config={subnetId=subnet-41554d19, instanceNamePrefix=drona, securityGroupsIds=sg-fbe35c81}, tags={}, normalizeInstance=true, sshUsername=Optional.absent()}}, capabilities=Optional.of(Capabilities{operatingSystemType=REDHAT_COMPATIBLE, operatingSystemVersion=REDHAT_COMPATIBLE_7, virtualizationType=HARDWARE_ASSISTED, packageManager=Optional.of(YUM), javaVendor=Optional.of(ORACLE), javaVersion=Optional.of(1.6.0_31), pythonVersion=Optional.of(2.7.5), passwordlessSudoEnabled=true, selinuxEnabled=false, iptablesEnabled=false, dnsConfigured=true, fqdn=Optional.of(ip-10-12-16-123.ec2.internal), clouderaManagerAgentInstalled=true, customScriptPaths={}})}=ApiHostRef{hostId=c04b3bab-d0fe-42c4-bf77-40540d8d8298}
        at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150) ~[guava-15.0.jar!/:na]
        at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104) ~[guava-15.0.jar!/:na]
        at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:70) ~[guava-15.0.jar!/:na]
        at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254) ~[guava-15.0.jar!/:na]
        at com.cloudera.launchpad.bootstrap.cluster.util.BootstrapClusterUtils.getInstanceToApiHostRef(BootstrapClusterUtils.java:100) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.util.BootstrapClusterUtils.findHostRefsForInstances(BootstrapClusterUtils.java:64) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddInstancesToCluster.run(AddInstancesToCluster.java:45) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddInstancesToCluster.run(AddInstancesToCluster.java:28) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job2.runUnchecked(Job2.java:31) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job2$$FastClassBySpringCGLIB$$54178502.invoke(<generated>) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.2.jar!/:3.1.2]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at sun.reflect.GeneratedMethodAccessor148.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_60]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_60]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
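
In the message above, the same instance (IP 10.12.16.123) appears as a map key twice, paired with two different host IDs (93f68064-... and c04b3bab-...). One way to look for that kind of duplicate is to group host records by IP address. The following is only a minimal sketch: it assumes the host records have already been exported as (hostId, ipAddress) pairs by some other means, and the function name, the input shape, and the third sample entry are hypothetical rather than part of any Cloudera API.

```python
from collections import defaultdict


def find_duplicate_ips(hosts):
    """Return {ip: [host_id, ...]} for every IP claimed by more than one host."""
    by_ip = defaultdict(list)
    for host_id, ip in hosts:
        by_ip[ip].append(host_id)
    return {ip: ids for ip, ids in by_ip.items() if len(ids) > 1}


# The first two pairs come from the exception message; the third is made up.
hosts = [
    ("93f68064-bc41-4ba1-91e9-476c679821f9", "10.12.16.123"),
    ("c04b3bab-d0fe-42c4-bf77-40540d8d8298", "10.12.16.123"),
    ("hypothetical-other-host", "10.12.16.200"),
]

duplicates = find_duplicate_ips(hosts)
print(duplicates)
```

Any IP that shows up in the result maps to more than one host ID, which would be consistent with the "Multiple entries with same key" failure in the trace.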
 
 
I think this was caused by creating an instance group with the same name as an existing one, but that's just a guess (and this is not my question). Since then, Cloudera Director shows the cluster status as "update failed", and when I run my Python script the status of the update command comes back as "UPDATE_FAILED".
 
 
I logged into the CRaSH shell, but when I try to repair the cluster it tells me I have to (somehow) import changes that have been made in Cloudera Manager.
 
> clusters reconcile "the env" "the deployment" "the cluster" true
Attempting to reconcile the cluster
Verifying Director cluster is consistent with Cloudera Manager

Inconsistency warnings detected with Cloudera Manager and Director:
        Role Type HBASETHRIFTSERVER found in Cloudera Manager but not Cloudera Director for Service Type HBASE
        Role Type GATEWAY found in Cloudera Manager but not Cloudera Director for Service Type SPARK_ON_YARN
        Role types not specified in Cloudera Director for service: STREAMSETS
        Service found in Cloudera Manager but not in Cloudera Director: STREAMSETS

Inconsistency errors detected with Cloudera Manager and Director:
        Host found in Cloudera Director but not in Cloudera Manager with IP address: 127.0.0.1
        Role Type GATEWAY found in Cloudera Director but not Cloudera Manager for Service Type HDFS
        Role Type GATEWAY found in Cloudera Director but not Cloudera Manager for Service Type HBASE

 Can not set Cluster to READY until the errors are resolved
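
When reconcile is run repeatedly, it can help to separate the warnings from the errors, since only the errors are reported as blocking the READY state. The sketch below parses the text shown above into two lists. It assumes the output keeps exactly this layout (a header line starting with "Inconsistency warnings detected" or "Inconsistency errors detected", followed by indented entries and a blank line); the function name is hypothetical and the format may differ in other Director versions.

```python
def parse_reconcile_output(text):
    """Split reconcile output into 'warnings' and 'errors' lists."""
    findings = {"warnings": [], "errors": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Inconsistency warnings detected"):
            section = "warnings"
        elif line.startswith("Inconsistency errors detected"):
            section = "errors"
        elif not line:
            section = None  # a blank line ends the current section
        elif section:
            findings[section].append(line)
    return findings


sample = """Attempting to reconcile the cluster
Verifying Director cluster is consistent with Cloudera Manager

Inconsistency warnings detected with Cloudera Manager and Director:
        Role Type HBASETHRIFTSERVER found in Cloudera Manager but not Cloudera Director for Service Type HBASE
        Role Type GATEWAY found in Cloudera Manager but not Cloudera Director for Service Type SPARK_ON_YARN
        Role types not specified in Cloudera Director for service: STREAMSETS
        Service found in Cloudera Manager but not in Cloudera Director: STREAMSETS

Inconsistency errors detected with Cloudera Manager and Director:
        Host found in Cloudera Director but not in Cloudera Manager with IP address: 127.0.0.1
        Role Type GATEWAY found in Cloudera Director but not Cloudera Manager for Service Type HDFS
        Role Type GATEWAY found in Cloudera Director but not Cloudera Manager for Service Type HBASE

 Can not set Cluster to READY until the errors are resolved
"""

findings = parse_reconcile_output(sample)
for error in findings["errors"]:
    print("blocking:", error)
```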

Also, after adding the node failed, I terminated the instance (since CM didn't know anything about it, I figured Director wouldn't care). I'm assuming that's what is causing the Gateway inconsistency errors. If that were true, though, I'd also expect to see YARN listed, since the node I was trying to add had NodeManager and HDFS/HBase gateways.
 
What do I need to do to fix these inconsistencies? 
 
Thanks,
Tony
1 ACCEPTED SOLUTION

Roles don't sync back to Cloudera Director. I was able to fix this by removing the Gateway roles I had added in CM and then running the reconcile.
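
For anyone who would rather script the role removal than click through CM, here is a rough sketch of the idea. It is written against any object that offers get_roles_by_type(role_type) and delete_role(name), and roles that expose name and hostRef.hostId. Those names mirror the Cloudera Manager Python client (cm_api) as I understand it, but treat that as an assumption and check it against your client version. FakeService, make_role, and the role and host names are made-up stand-ins so the example runs without a CM server.

```python
from types import SimpleNamespace


def remove_roles(service, role_type, host_ids, dry_run=True):
    """Delete roles of role_type on the given hosts; return the affected role names."""
    affected = []
    for role in service.get_roles_by_type(role_type):
        if role.hostRef.hostId in host_ids:
            affected.append(role.name)
            if not dry_run:
                service.delete_role(role.name)
    return affected


class FakeService:
    """Stand-in for a CM service object, used only to exercise the function."""

    def __init__(self, roles):
        self.roles = list(roles)

    def get_roles_by_type(self, role_type):
        return [r for r in self.roles if r.type == role_type]

    def delete_role(self, name):
        self.roles = [r for r in self.roles if r.name != name]


def make_role(name, role_type, host_id):
    # Hypothetical helper that builds a role-like object for the demo.
    return SimpleNamespace(
        name=name, type=role_type, hostRef=SimpleNamespace(hostId=host_id)
    )


hdfs = FakeService([
    make_role("hdfs-GATEWAY-1", "GATEWAY", "host-a"),
    make_role("hdfs-DATANODE-1", "DATANODE", "host-a"),
    make_role("hdfs-GATEWAY-2", "GATEWAY", "host-b"),
])

planned = remove_roles(hdfs, "GATEWAY", {"host-a"})  # dry run: nothing is deleted
removed = remove_roles(hdfs, "GATEWAY", {"host-a"}, dry_run=False)
print(planned, removed)
```

The dry-run default is deliberate: it lists which roles would be removed before anything is deleted. After the roles are gone, run the reconcile command again from the CRaSH shell.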
