Member since: 11-12-2014
Posts: 10
Kudos Received: 3
Solutions: 2

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 3453 | 12-09-2014 03:15 PM
 | 8063 | 11-12-2014 02:55 PM
12-09-2014
03:15 PM
1 Kudo
So it turns out this works. As usual, it was operator error. What I should have done was put the placementGroup keyword inside the instance container. Once I moved the setting into the worker instance container, rerunning the bootstrap created 4 workers with the correct placementGroup setting. Here is what I did (compare it to the original pasted config):

```
instance: ${instances.cc2xl} {
    tags {
        group: worker
        Name: "Hadoop Worker-XX"
    }
    placementGroup: "AWS-PLACEMENT-GROUP-us-west-2-11-13-2014"
}
```

In hindsight, it makes perfect sense. I didn't think this through logically when first creating the configuration.
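For completeness, the whole workers section after the fix would look roughly like this (a sketch reassembled from the two snippets posted in this thread, not a verbatim copy of the actual conf file):

```
workers {
    count: 48

    # Fail and quit if fewer than minCount instances are available;
    # else continue setting up the cluster.
    minCount: 24

    instance: ${instances.cc2xl} {
        tags {
            group: worker
            Name: "Hadoop Worker-XX"
        }
        # The key change: placementGroup lives inside the instance container,
        # not at the workers group level.
        placementGroup: "AWS-PLACEMENT-GROUP-us-west-2-11-13-2014"
    }

    roles {
        HDFS: ${roles.HDFS_WORKERS}
        YARN: ${roles.YARN_WORKERS}
        HBASE: ${roles.HBASE_WORKERS}
    }
}
```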
12-08-2014
06:49 PM
Thanks for the feedback. Yes, I visited that link when I started researching the use of the placementGroup tag. My instance is one of the compute-type instances the link says is supported, as I mentioned in my rambling opening post.
12-06-2014
11:13 PM
I have created a multi-node cluster using CD v1.0.1 and now find that the size of the cluster is causing data timeouts during Hadoop benchmarking runs. One of our AWS-experienced engineers pointed out that the lack of a placementGroup on the workers is probably what is causing us problems. I thought, when I went through the CD installation instructions, that the placement group it created would be applied to my cluster, but it wasn't. I am modifying my aws.conf file to explicitly call out the AWS-PLACEMENT-GROUP CD created and attempting to create a new cluster. However, the workers still come up with no placement group associated with them. They are cc2.8xlarge, and the AWS online docs say this instance type should work with placement groups. I tried setting my conf file like so, per the doc spelling:

```
workers {
    count: 48

    #
    # Minimum number of instances required to set up the cluster.
    # Fail and quit if minCount number of instances is not available in this cloud
    # environment. Else, continue setting up the cluster.
    #
    minCount: 24

    instance: ${instances.cc2xl} {
        tags {
            group: worker
            name: "Hadoop Worker"
        }
    }

    roles {
        HDFS: ${roles.HDFS_WORKERS}
        YARN: ${roles.YARN_WORKERS}
        HBASE: ${roles.HBASE_WORKERS}
    }

    placementGroup: "AWS-PLACEMENT-GROUP-us-west-2-11-13-2014"
}
```
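As an aside, the minCount comment in the conf above describes a simple gate on partial allocations. A toy sketch of that semantics (my own illustration, not Director's actual code):

```python
def bootstrap_can_proceed(available: int, min_count: int) -> bool:
    """Sketch of the minCount semantics from the conf comment:
    fail and quit if fewer than min_count instances are available
    in the cloud environment; otherwise continue setting up the
    cluster, possibly with fewer than the requested count."""
    return available >= min_count

# With count: 48 and minCount: 24, an allocation of only 24 instances
# still lets the bootstrap proceed; 23 would abort it.
```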
Labels:
- Apache Hadoop
- Apache HBase
- Apache YARN
- HDFS
11-14-2014
07:57 PM
1 Kudo
I don't have much to offer as far as improvements. I would like to see a way to reset or clean up from failed updates or other staging failures. Case in point: when I did my final build, I used a smaller number of workers in the bootstrap stage to make sure a working cluster would come up, banking on the update working to bring in the second half of the required workers. During the distributing phase, my final 24 nodes got stuck at 60% for a few hours. I went into Cloudera Manager and tweaked the setting for the number of parallel instances being updated with the CDH packages, and in CM the package state went to 100%, activated. However, the CLI UI on the Cloudera Director node doing the update never recovered, so I Ctrl-C'd it. I was able to verify the cluster had all my instances, and I was able to manually update the roles on the new workers. However, now when I go to the Cloudera Director node and run cloudera-director status aws.reference.conf to get that nice printout, the Director remembers that the last state was terminated mid-update and doesn't give me status; rather, it seems to want to continue the update through to success. It would be nice to be able to clear that state so I could make the status call if I wanted to.

An enhancement would be if the update could assign the roles designated in the conf file, but I gather there are reasons it was decided not to attempt to automate this. Having to go through CM to add 3 roles to a large cluster can be tricky.

That's all I've got; minor nitpicks. I am not sure how folks plan to use this in production, since I used it for a single cluster. It will be something I keep in mind if I have to build DIY clusters for myself or my company. For those who want to script building CM clusters on AWS for dev teams, this is a great way to get repeatable configs out to the various groups within their company, for whatever projects their developers are on. I doubt the nits I pointed out above would be of much value once you have the process down, but you did ask.
11-13-2014
09:53 AM
1 Kudo
As a final follow-up, I ended up coming back to Cloudera Director for my cluster build. It is just a better, more polished way to create a cluster vs. my DIY stitching together of instances. I was able to use the CLI to build a 48-worker-node cluster (55 instances in all, since I added an extra instance to run the third ZooKeeper service). I used cc2.8xlarge as the worker instances. This cluster will not be a 24/7 config; more like 4 hours max to run performance suites, stopping and starting on demand, then retiring after next week. Thanks for the assistance, and it is a really useful product for a novice like me. An enlightening experience.
11-12-2014
02:55 PM
The Web UI page didn't seem to have any history other than a canceled/failed status due to a suspended task, with no trace to see. Management wants me to punt on Director and go the build-it-on-your-own-without-wizards route. Thanks anyway for the help. Time constraints will not let me follow up.
11-12-2014
02:17 PM
I was running the CLI on this smaller case. I will try re-creating this in the Web UI to get more info on what is failing. Yeah, that BEESWAX error is new. I do have the dump file, but I guess without support there is no way to create a case to attach it to. Thanks, I will try your suggestion in the Web UI.
11-12-2014
01:16 PM
So going back to the RHEL AMI I selected originally for the m2.4xlarge type (ami-b8a63b88), here is how the install goes:

```
Cloudera Director 1.0.1 initializing ...
Installing Cloudera Manager ...
* Starting ...... done
* Requesting an instance for Cloudera Manager ................... done
* Running custom bootstrap script on xxxxxxx ...... done
* Inspecting capabilities of xxxxxxx ............... done
* Normalizing xxxxxxx .... done
* Installing ntp (1/2) .... done
* Installing curl (2/2) ..................... done
* Mounting all instance disk drives ........ done
* Resizing instance root partition ..... done
* Rebooting 10.0.1.178 ..... done
* Waiting for 10.0.1.178 to boot ....... done
* Installing repositories for Cloudera Manager ....... done
* Installing jdk (1/4) ..... done
* Installing cloudera-manager-daemons (2/4) .... done
* Installing cloudera-manager-server (3/4) .... done
* Installing cloudera-manager-agent (4/4) ....... done
* Installing cloudera-manager-server-db-2 (1/1) .... done
* Starting embedded PostgreSQL database ....... done
* Starting Cloudera Manager server .... done
* Waiting for Cloudera Manager server to start .... done
* Configuring Cloudera Manager .... done
* Starting Cloudera Management Services ..... done
* Inspecting capabilities of 10.0.1.178 ........ done
* Done ...
Cloudera Manager ready.
Creating cluster Product-Team-AWS ...
* Starting ...... done
* Requesting 9 instance(s) in 4 group(s) ......................... done
* Preparing instances in parallel (20 at a time) ............................................................. done
* Installing Cloudera Manager agents on all instances in parallel (20 at a time) .......... done
* Creating CDH5 cluster using the new instances ... done
* Creating cluster: Product-Team-AWS ... done
* Downloading parcels: CDH-5.2.0-1.cdh5.2.0.p0.36 ... done
* Distributing parcels: CDH-5.2.0-1.cdh5.2.0.p0.36 .... done
* Activating parcels: CDH-5.2.0-1.cdh5.2.0.p0.36 .... done
* Invalid role type(s) specified. Ignored during role creation: HUE: BEESWAX_SERVER ... done
* Creating Hive Metastore Database ... done
* Waiting on First Run command .... done
* Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services. ...
```

Here are the final errors:

```
[2014-11-12 21:10:44] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services.
    at com.cloudera.launchpad.bootstrap.deployment.UnboundedWaitForApiCommand.run(UnboundedWaitForApiCommand.java:86) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.bootstrap.deployment.UnboundedWaitForApiCommand.run(UnboundedWaitForApiCommand.java:46) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner$1.call(DatabasePipelineRunner.java:229) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [guava-retrying-1.0.6.jar!/:na]
    at com.github.rholder.retry.Retryer.call(Retryer.java:110) [guava-retrying-1.0.6.jar!/:na]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.attemptMultipleJobExecutionsWithRetries(DatabasePipelineRunner.java:213) [launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:132) [launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
[2014-11-12 21:10:44] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Encountered an unrecoverable error
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services.
    at com.cloudera.launchpad.bootstrap.deployment.UnboundedWaitForApiCommand.run(UnboundedWaitForApiCommand.java:86) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.bootstrap.deployment.UnboundedWaitForApiCommand.run(UnboundedWaitForApiCommand.java:46) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner$1.call(DatabasePipelineRunner.java:229) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
    at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.attemptMultipleJobExecutionsWithRetries(DatabasePipelineRunner.java:213) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:132) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
[2014-11-12 21:10:44] INFO [pipeline-thread-1] - c.c.l.p.s.PipelineRepositoryService: Pipeline '58692a48-c684-4b4e-8d24-1c0103501d67': RUNNING -> ERROR
[2014-11-12 21:10:45] INFO [pipeline-thread-1] - c.c.l.d.ClusterRepositoryService: Cluster 'Product-Team-AWS': BOOTSTRAP_FAILED -> BOOTSTRAP_FAILED
[2014-11-12 21:10:45] INFO [Thread-2] - o.s.c.a.AnnotationConfigApplicationContext: Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@3274ae84: startup date [Wed Nov 12 20:40:47 EST 2014]; root of context hierarchy
[2014-11-12 21:10:45] INFO [Thread-2] - o.s.o.j.LocalContainerEntityManagerFactoryBean: Closing JPA EntityManagerFactory for persistence unit 'default'
```
11-12-2014
11:40 AM
So I opted to use the Cloudera Director Web UI with a smaller problem size. The older-generation instance just loops forever waiting for completion; it took 15 minutes when using c3.8xlarge. You can see here a snippet of the log file:

```
[2014-11-12 19:35:38] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:38] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:39] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.165]
[2014-11-12 19:35:39] INFO [pipeline-thread-18] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.165
[2014-11-12 19:35:39] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:39] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:40] INFO [pipeline-thread-18] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1083, name=GlobalHostInstall, startTime=Wed Nov 12 19:35:37 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:35:40] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:40] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:41] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:41] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.166]
[2014-11-12 19:35:42] INFO [pipeline-thread-15] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.166
[2014-11-12 19:35:42] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:42] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:42] INFO [pipeline-thread-15] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1085, name=GlobalHostInstall, startTime=Wed Nov 12 19:35:40 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:35:43] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:43] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:43] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:44] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:44] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:44] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.167]
[2014-11-12 19:35:44] INFO [pipeline-thread-14] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.167
[2014-11-12 19:35:44] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:44] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:44] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:44] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.168]
[2014-11-12 19:35:45] INFO [pipeline-thread-17] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.168
[2014-11-12 19:35:45] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:45] INFO [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1087, name=GlobalHostInstall, startTime=Wed Nov 12 19:35:43 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:35:45] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:47] INFO [pipeline-thread-17] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1089, name=GlobalHostInstall, startTime=Wed Nov 12 19:35:43 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:35:52] INFO [pipeline-thread-16] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:53] INFO [pipeline-thread-16] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:53] INFO [pipeline-thread-16] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:54] INFO [pipeline-thread-16] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.169]
[2014-11-12 19:35:54] INFO [pipeline-thread-16] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.169
[2014-11-12 19:35:54] INFO [pipeline-thread-16] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:54] INFO [pipeline-thread-16] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:54] INFO [pipeline-thread-16] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1091, name=GlobalHostInstall, startTime=Wed Nov 12 19:35:52 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:35:55] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:55] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:56] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:56] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.165]
[2014-11-12 19:35:57] INFO [pipeline-thread-18] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.165
[2014-11-12 19:35:57] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:57] INFO [pipeline-thread-18] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:57] INFO [pipeline-thread-18] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1093, name=GlobalHostInstall, startTime=Wed Nov 12 19:35:55 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:35:58] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:58] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:35:59] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:59] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.166]
[2014-11-12 19:35:59] INFO [pipeline-thread-15] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.166
[2014-11-12 19:35:59] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:35:59] INFO [pipeline-thread-15] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:36:00] INFO [pipeline-thread-15] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1095, name=GlobalHostInstall, startTime=Wed Nov 12 19:35:58 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:36:00] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:36:00] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:36:01] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:36:02] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: >> SetStatusJob/1 [Waiting for Cloudera Manager to deploy agent on 10.0.1.167]
[2014-11-12 19:36:02] INFO [pipeline-thread-14] - com.cloudera.launchpad.pipeline.Job: Waiting for Cloudera Manager to deploy agent on 10.0.1.167
[2014-11-12 19:36:02] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:36:02] INFO [pipeline-thread-14] - c.c.l.p.DatabasePipelineRunner: >> WaitForSuccessOrRetryOnFailure/4 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
[2014-11-12 19:36:02] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: << None{}
[2014-11-12 19:36:02] INFO [pipeline-thread-14] - c.c.l.b.d.UnboundedWaitForApiCommand: Waiting for ApiCommand{id=1097, name=GlobalHostInstall, startTime=Wed Nov 12 19:36:00 EST 2014, endTime=null, active=true, success=null, resultMessage=null, serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2014-11-12 19:36:03] INFO [pipeline-thread-17] - c.c.l.p.DatabasePipelineRunner: >> HostInstall/3 [CreateClusterContext{environment=Environment{name='Cray-Product Team 48 Node AWS Cluster', provider ...
^C
```

The Web UI is sitting here:

```
Status
Performance-Cluster
Bootstrapping 2173 / 2184
Installing Cloudera Manager agents on all instances in parallel (20 at a time)
Waiting for Cloudera Manager to deploy agent on 10.0.1.165
Waiting for Cloudera Manager to deploy agent on 10.0.1.168
Waiting for Cloudera Manager to deploy agent on 10.0.1.169
Waiting for Cloudera Manager to deploy agent on 10.0.1.166
Waiting for Cloudera Manager to deploy agent on 10.0.1.167
```
11-12-2014
07:46 AM
I have been attempting to create a cluster using Cloudera Director v1.0.1. I was able to take the distributed aws.reference.conf generated by the Cloud Formation Launch Cloudera EDH template and build a 48-worker cluster using type: c3.8xlarge, image: ami-18a23f28. Our performance team absolutely will not allow the use of SSD in this cluster, so after looking around and speaking with AWS support, I chose this for my worker nodes: type: m2.4xlarge, image: ami-f032acc0. I first tried the RHEL 6.4 PVM image ami-b8a63b88, but it didn't work, so I took the AWS-suggested image ami-f032acc0, but it didn't work either. "Didn't work" means the host installs worked, but the preparing of instances and deploying of CM agents was stuck for 3 hours before I gave up. For the current-generation c3.8xlarge, I can get the bootstrap to complete in an hour and a half tops; I have gone through the build/terminate cycle several times. So my question, after this long-winded data dump, is: are the older instance types that support magnetic storage supported when using the Director? Hope my description is clear. Thanks for any help.
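One quick sanity check worth running when a PVM vs. HVM mismatch is suspected (a sketch; it assumes a configured AWS CLI, and these 2014-era AMIs may no longer resolve):

```shell
# Hypothetical check: confirm the virtualization and root-device type
# of the candidate worker AMI before handing it to Director.
aws ec2 describe-images \
  --region us-west-2 \
  --image-ids ami-f032acc0 \
  --query "Images[].[ImageId, VirtualizationType, RootDeviceType]" \
  --output table
```

If the AMI reports paravirtual while the instance type expects HVM (or vice versa), the instances can launch but behave oddly during provisioning, which is consistent with the stuck agent-deploy phase described above.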
Labels:
- Cloudera Manager