Member since: 06-25-2015
Posts: 8
Kudos Received: 1
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 2934 | 07-09-2015 05:30 AM |
09-15-2015
07:14 PM
Our security standards do not allow the OS type and version to be displayed when a user logs on to the system. That means our AMI images do not have the OS and version listed in the /etc/issue file (since that file is displayed at logon). Cloudera Director fails with the following error because the OS type is not listed in the /etc/issue file:

```
java.lang.UnsupportedOperationException: Operating system type not supported: UNKNOWN
```

The standard RHEL 6.5 AMI's /etc/issue file contains the following (cat /etc/issue):

```
Red Hat Enterprise Linux Server release 6.5 (Santiago)
```

So I'm trying to find a workaround so that Cloudera Director can get the OS type another way, since it is not listed in our image's /etc/issue file. My questions are:

- When does Cloudera Director's script read the /etc/issue file? Does it read the contents as displayed when connecting over SSH, or does it read the /etc/issue file after it connects?
- Is there an option to skip the OS check, or something we can set in our bootstrap script (which depends on when the file is read)?

Thanks!
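For context, this is the kind of bootstrap-script workaround I have in mind. It is only a sketch, and it assumes Director reads /etc/issue after connecting rather than from the SSH login banner, which is exactly the open question above; on our RHEL images the release string is still available in /etc/redhat-release:

```bash
#!/bin/sh
# Sketch of a possible bootstrap workaround (assumes the OS check reads
# /etc/issue after SSH connects, not the login banner itself).
# On RHEL, /etc/redhat-release still holds the full release string.
cp -p /etc/issue /etc/issue.bak
cat /etc/redhat-release > /etc/issue
# e.g. "Red Hat Enterprise Linux Server release 6.5 (Santiago)"

# After bootstrap completes, the sanitized banner could be restored to stay
# compliant with our security standards:
# mv /etc/issue.bak /etc/issue
```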
Labels: Security
08-07-2015
03:27 AM
Changing the disk type to gp2 appears to have solved the issue. I have recreated the cluster twice and it has not failed. Thanks!
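For anyone hitting the same thing, the fix amounts to a one-line change in the instance template of the Director configuration file (the same field shown in my 07-30 post below):

```
rootVolumeType: gp2   # was: standard (EBS magnetic); gp2 is SSD
```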
08-03-2015
01:33 PM
Yes, I used gp2 instead of "standard" and it completed ok. I ran the script Saturday morning, so I don't know if it was the change that I made or the fact that I ran it on the weekend (less activity). I will need to run the script several more times over the next week or two. If I encounter issues then I'll be sure to report back. But for now I'm hoping this change addressed my issue. Thanks!
07-30-2015
06:15 PM
The root volumes are 200 GB:

```
rootVolumeSizeGB: 200 # defaults to 50 GB if not specified
rootVolumeType: standard # gp2 for SSD OR standard (for EBS magnetic)
```

For the manager/name/edge nodes I am using this type (4 nodes):

```
c32x {
    type: c3.2xlarge # requires an HVM AMI
    image: ami-00a11e68
    tags {
    ...
```

And for the data/yarn nodes I am using this type (2 nodes):

```
d2x { # data node (3 drives of 2 TB each)
    type: d2.xlarge # requires an HVM AMI
    image: ami-00a11e68
```

Yes, we have started up hundreds of instances before (lots of EMR jobs), so capacity is not an issue. The nodes are running and I can ssh into all instances :-). The launcher seems to fail when starting up ZooKeeper. I can go to Cloudera Manager (CM) and start ZooKeeper and it will start, but by that time the steps to configure the cluster have to be executed manually. In CM I can go to each service/role and run their tasks, and the cluster starts OK. I follow all the steps listed in the run list with no other errors. I do get the warning that I need to change the replication factor from 3, since I'm only starting up 2 HDFS nodes; I had that in my service config for HDFS but removed it, thinking maybe it was causing an issue.

Do I need to list roles in a specific order? It looks like ZooKeeper is always the first service to start, so I assume Director configures CM using the API and then runs the tasks in the correct order.
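For reference, the replication-factor override I removed looked like this (dfs_replication is the CM configuration key I had used; shown here only to illustrate what was taken out):

```
configs {
    HDFS {
        dfs_replication: 2   # avoid the under-replication warning with only 2 DataNodes
    }
}
```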
07-30-2015
01:32 PM
I am using cloudera-director-client-1.1.3 to start up a small cluster and am having issues with ZooKeeper. I tried using both a single ZooKeeper node and 3 ZooKeeper nodes. Sometimes the script runs, but most of the time it fails. If I try to access the logs I get:

```
[Errno 2] No such file or directory: '/var/log/zookeeper/zookeeper-cmf-CD-ZOOKEEPER-bsejIGrE-SERVER-ip-10-0-1-41.ec2.internal.log'
```

If I look at the node, the folder is empty. Here is the error that I think is causing the issue:

```
Command ZkInit with ID 56 failed. resultMessage=Command aborted because of exception: Command timed-out after 90 seconds
```

Is there a way to increase the timeout? Do I need to turn off the diagnostic data collection (could that be causing an issue)?

Here is a snippet of the log file:

```
[2015-07-30 14:40:09] INFO [pipeline-thread-1] - c.c.launchpad.pipeline.AbstractJob: Creating cluster services
[2015-07-30 14:40:09] INFO [pipeline-thread-1] - c.c.launchpad.pipeline.AbstractJob: Assigning roles to instances
[2015-07-30 14:40:09] INFO [pipeline-thread-1] - c.c.l.bootstrap.cluster.AddServices: Creating 11 roles for service CD-HDFS-sxHsiZxW
[2015-07-30 14:40:09] INFO [pipeline-thread-1] - c.c.l.bootstrap.cluster.AddServices: Creating 6 roles for service CD-YARN-GqpLWLyB
[2015-07-30 14:40:10] INFO [pipeline-thread-1] - c.c.l.bootstrap.cluster.AddServices: Creating 1 roles for service CD-ZOOKEEPER-bsejIGrE
[2015-07-30 14:40:10] INFO [pipeline-thread-1] - c.c.l.bootstrap.cluster.AddServices: Creating 5 roles for service CD-HIVE-PJiYBqej
[2015-07-30 14:40:10] INFO [pipeline-thread-1] - c.c.l.bootstrap.cluster.AddServices: Creating 1 roles for service CD-HUE-GatHLWtp
[2015-07-30 14:40:10] INFO [pipeline-thread-1] - c.c.l.bootstrap.cluster.AddServices: Creating 1 roles for service CD-OOZIE-nQaBSyxZ
[2015-07-30 14:40:11] INFO [pipeline-thread-1] - c.c.launchpad.pipeline.AbstractJob: Automatically configuring services and roles
[2015-07-30 14:40:11] INFO [pipeline-thread-1] - c.c.launchpad.pipeline.AbstractJob: Applying custom configurations of services
[2015-07-30 14:40:12] INFO [pipeline-thread-1] - c.c.launchpad.pipeline.AbstractJob: Configuring Hive Metastore database
....
[2015-07-30 14:40:25] INFO [pipeline-thread-1] - c.c.launchpad.pipeline.AbstractJob: Creating Hive Metastore Database
[2015-07-30 14:40:25] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: << None{}
[2015-07-30 14:40:28] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: >> UnboundedWaitForApiCommand/3 [53, Deployment{name='dev-3 Deployment', hostname='10.0.1.90', port=7180, username='admin', man ...
[2015-07-30 14:40:28] INFO [pipeline-thread-1] - c.c.l.b.UnboundedWaitForApiCommand: Command CreateHiveDatabase with ID 53 completed successfully. Details: ApiCommand{id=53, name=CreateHiveDatabase, startTime=Thu Jul 30 14:40:19 EDT 2015, endTime=Thu Jul 30 14:40:25 EDT 2015, active=false, success=true, resultMessage=Created Hive Metastore Database., serviceRef=ApiServiceRef{peerName=null, clusterName=dev-3, serviceName=CD-HIVE-PJiYBqej}, roleRef=null, hostRef=null, parent=null}
[2015-07-30 14:40:28] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: << None{}
[2015-07-30 14:40:30] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: >> SetStatusJob/1 [Waiting on First Run command]
[2015-07-30 14:40:31] INFO [pipeline-thread-1] - c.c.launchpad.pipeline.AbstractJob: Waiting on First Run command
```

I see an error about collecting diagnostic data:

```
....
[2015-07-30 14:43:20] INFO [pipeline-thread-1] - c.c.l.b.UnboundedWaitForApiCommand: Collecting and downloading diagnostic data
[2015-07-30 14:43:21] ERROR [pipeline-thread-1] - c.c.l.b.ClouderaManagerLogRetriever: Got exception while collecting diagnostic data
javax.ws.rs.ServiceUnavailableException: null
```

And then the "first run" of the install fails:

```
[2015-07-30 14:43:21] WARN [pipeline-thread-1] - c.c.l.b.UnboundedWaitForApiCommand: Failed to collect diagnostic data
[2015-07-30 14:43:21] ERROR [pipeline-thread-1] - c.c.l.b.UnboundedWaitForApiCommand: Command First Run with ID 54 failed. Details: ApiCommand{id=54, name=First Run, startTime=Thu Jul 30 14:40:20 EDT 2015, endTime=Thu Jul 30 14:43:20 EDT 2015, active=false, success=false, resultMessage=Failed to perform First Run of services., serviceRef=null, roleRef=null, hostRef=null, parent=null}
[2015-07-30 14:43:21] ERROR [pipeline-thread-1] - c.c.l.b.UnboundedWaitForApiCommand: Command ZkInit with ID 56 failed. Details: ApiCommand{id=56, name=ZkInit, startTime=Thu Jul 30 14:41:50 EDT 2015, endTime=Thu Jul 30 14:43:20 EDT 2015, active=false, success=false, resultMessage=Command aborted because of exception: Command timed-out after 90 seconds, serviceRef=ApiServiceRef{peerName=null, clusterName=dssh-dev-3, serviceName=CD-ZOOKEEPER-bsejIGrE}, roleRef=ApiRoleRef{clusterName=dssh-dev-3, serviceName=CD-ZOOKEEPER-bsejIGrE, roleName=CD-ZOOKEEPER-bsejIGrE-SERVER-95a522458bc9844f970bdffc8e1a5c6f}, hostRef=null, parent=null}
[2015-07-30 14:43:21] ERROR [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services.
```

Here is the cluster description:

```
# Cluster description
cluster {
    products {
        CDH: 5
    }
    parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.3.3/"]
    services: [HDFS, YARN, ZOOKEEPER, HIVE, HUE, OOZIE]
    configs {
        ZOOKEEPER {
            zookeeper_datadir_autocreate: true
        }
    }
    masters-1 {
        count: 1
        instance: ${instances.c32x} {
            tags {
                group: master
            }
        }
        roles {
            HDFS: [NAMENODE, GATEWAY]
        }
    }
    masters-2 {
        count: 1
        instance: ${instances.c32x} {
            tags {
                group: master
            }
        }
        roles {
            ZOOKEEPER: [SERVER]
            HDFS: [SECONDARYNAMENODE, GATEWAY]
            YARN: [RESOURCEMANAGER, JOBHISTORY]
        }
    }
    workers {
        count: 2
        minCount: 2
        instance: ${instances.d2x} {
            tags {
                group: worker
            }
        }
        roles {
            HDFS: [GATEWAY, DATANODE]
            HIVE: [GATEWAY]
            YARN: [NODEMANAGER, GATEWAY]
        }
    }
    gateways {
        count: 1
        instance: ${instances.c32x} {
            tags {
                group: gateway
            }
        }
        roles {
            HIVE: [GATEWAY, HIVEMETASTORE, HIVESERVER2]
            HDFS: [GATEWAY, BALANCER, HTTPFS]
            HUE: [HUE_SERVER]
            OOZIE: [OOZIE_SERVER]
        }
    }
}
```
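For what it's worth, this is how I have been checking by hand whether the ZooKeeper server actually comes up on the node before ZkInit's 90-second timeout. A rough sketch only; ruok is a standard ZooKeeper four-letter command and 2181 is the default client port:

```bash
#!/bin/sh
# Run on the ZooKeeper node (ip-10-0-1-41 in my logs).

# The log directory Director tried to read; it was empty on my failed runs.
ls -l /var/log/zookeeper/

# Ask the server whether it is alive; a healthy server replies "imok".
echo ruok | nc localhost 2181; echo
```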
07-09-2015
05:30 AM
1 Kudo
Have you considered using the Cloudera Manager API tools?

http://cloudera.github.io/cm_api/docs/quick-start/
http://cloudera.github.io/cm_api/apidocs/v10/index.html

We are using Chef to update CM so that our configurations stay in sync. We can spin up a cluster and change settings, but we can still use CM to make other changes later on.
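As a quick illustration, the REST API can be exercised with curl before committing to the Python bindings or Chef. A minimal sketch; CM_HOST and the admin credentials are placeholders, and v10 matches the apidocs linked above:

```bash
#!/bin/sh
CM_HOST=cm-host.example.com   # placeholder; use your CM server

# List the clusters this Cloudera Manager knows about.
curl -u admin:admin "http://${CM_HOST}:7180/api/v10/clusters"

# List the services in one cluster ("dev-3" is an example name).
curl -u admin:admin "http://${CM_HOST}:7180/api/v10/clusters/dev-3/services"
```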