Reply
Explorer
Posts: 7
Registered: ‎06-01-2018

cloudera director bootstrap failure: Cloudera Manager 'First Run' command execution faile

Hi,

I am new to Cloudera and am trying to use the Cloudera director to spin up an AWS cluster to test Apache Spot in accordance with the instructions given here -

 

https://blog.cloudera.com/blog/2018/02/apache-spot-incubating-and-cloudera-on-aws-in-60-minutes/

 

Having installed the Cloudera director according to the instructions mentioned above, the bootstrap command

"cloudera-director bootstrap spot-director.conf" fails to go through

 

Snapshot of the logs - 

 

* Requesting 5 instance(s) in 3 group(s) .................................... done
* Preparing instances in parallel (20 at a time) ......................................................................................................
....................... done
* Waiting for Cloudera Manager installation to complete .... done
* Installing Cloudera Manager agents on all instances in parallel (20 at a time) .............. done
* Waiting for new external database servers to start running ... done
* Creating CDH5 cluster using the new instances .... done
* Creating cluster: apache-spot ..... done
* Downloading parcels: SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957,KAFKA-2.2.0-1.2.2.0.p0.68,CDH-5.12.2-1.cdh5.12.2.p0.4 .... done
* Distributing parcels: KAFKA-2.2.0-1.2.2.0.p0.68,SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957,CDH-5.12.2-1.cdh5.12.2.p0.4 .... done
* Switching parcel distribution rate limits back to defaults: 51200KB/s with 25 concurrent uploads ... done
* Activating parcels: KAFKA-2.2.0-1.2.2.0.p0.68,SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957,CDH-5.12.2-1.cdh5.12.2.p0.4 ......... done
* Creating cluster services ... done
* Assigning roles to instances ... done
* Automatically configuring services and roles ... done
* Configuring HIVE database ... done
* Configuring OOZIE database .... done
* Creating Hive Metastore Database ... done
* Calling firstRun on cluster apache-spot ... done
* Waiting for firstRun on cluster apache-spot .... done
* Starting ... done
* Collecting diagnostic data .... done
* Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services. ...

 

There are no relevant error messages in the director.log (except the one that says that first-run failed). This are some of the configurations in the environment file used by the cloudera director - 

 

 

------
cloudera-manager {

instance: ${instances.m44x} {
tags {
application: "CM5"
group: cm
}
}

csds: [
    "http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar",
   ]

javaInstallationStrategy: NONE

#
# Automatically activate 60-Day Cloudera Enterprise Trial
#

enableEnterpriseTrial: true

}

#
# Cluster description
#
cluster {

# List the products and their versions that need to be installed.
# These products must have a corresponding parcel in the parcelRepositories
# configured above. The specified version will be used to find a suitable
# parcel. Specifying a version that points to more than one parcel among
# those available will result in a configuration error. Specify more granular
# versions to avoid conflicts.

products {
CDH: 5.12,
SPARK2: 2.2,
KAFKA: 2
}

parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.12/",
"http://archive.cloudera.com/kafka/parcels/2.2/",
"http://archive.cloudera.com/spark2/parcels/2.2.0/"]

services: [HDFS, YARN, ZOOKEEPER, HIVE, IMPALA, KAFKA, SPARK2_ON_YARN, HUE, OOZIE]

masters {
count: 1
instance: ${instances.m44x} {
tags {
group: master
}
}

roles {
HDFS: [NAMENODE, SECONDARYNAMENODE]
YARN: [RESOURCEMANAGER, JOBHISTORY]
ZOOKEEPER: [SERVER]
HIVE: [HIVESERVER2, HIVEMETASTORE]
IMPALA: [CATALOGSERVER, STATESTORE]
SPARK2_ON_YARN: [SPARK2_YARN_HISTORY_SERVER]
HUE: [HUE_SERVER]
OOZIE: [OOZIE_SERVER]
KAFKA: [KAFKA_BROKER]
}
}
--------

 

Would someone be able to tell me how to debug this issue further? Are there any logs in the cloudera manager server node that I should look at to figure what went wrong?

 

Thanks

 

 

 

Explorer
Posts: 7
Registered: ‎06-01-2018

Re: cloudera director bootstrap failure: Cloudera Manager 'First Run' command execution faile

Found this message in the scm-server-log -

 

2018-06-10 15:26:45,324 INFO CommandPusher:com.cloudera.cmf.model.DbCommand: Command 66(First Run) has completed. finalstate:FINISHED, success:false, msg:Failed to perform First Run of services

 

In the scm-agent-log - 

 

[10/Jun/2018 15:22:36 +0000] 12790 MainThread agent ERROR Failed to connect to previous supervisor.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.3-py2.7.egg/cmf/agent.py", line 2136, in find_or_start_supervisor
self.configure_supervisor_clients()
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.14.3-py2.7.egg/cmf/agent.py", line 2317, in configure_supervisor_clients
supervisor_options.realize(args=["-c", os.path.join(self.supervisor_dir, "supervisord.conf")])
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/options.py", line 1599, in realize
Options.realize(self, *arg, **kw)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/options.py", line 333, in realize
self.process_config()
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/options.py", line 341, in process_config
self.process_config_file(do_usage)
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/options.py", line 376, in process_config_file
self.usage(str(msg))
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/options.py", line 164, in usage
self.exit(2)
SystemExit: 2

 

Also - 

 

[10/Jun/2018 15:22:37 +0000] 12790 MainThread downloader   ERROR    Failed rack peer update: [Errno 111] Connection refused

Cloudera Employee
Posts: 66
Registered: ‎10-28-2014

Re: cloudera director bootstrap failure: Cloudera Manager 'First Run' command execution faile

sg321,

 

You do need to look in CM to find the root cause of the First Run failure.  The CM UI has a tab "All Recent Commands".

https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cm_dg_view_running_recent_commands.h...

 

First Run is composed of a series of steps. You can determine which step failed through the UI.

 

You can also find this information in the scm-server-log. Information about the step that failed should appear somewhere above the scm-server-log message that you posted.

 

David

Explorer
Posts: 7
Registered: ‎06-01-2018

Re: cloudera director bootstrap failure: Cloudera Manager 'First Run' command execution faile

Thanks David for the info!

 

I logged into the CM UI and tried to start the cluster again (to diagnose what was wrong). It appears that the job history (of service YARN-1) failed to start. Attaching a snapshot. 

 

In the role file, I see this message - 

 

Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://ip-172-31-30-54.us-west-1.compute.internal:8020/user/history/done]
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://ip-172-31-30-54.us-west-1.compute.internal:8020/user/history/done]
	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.tryCreatingHistoryDirs(HistoryFileManager.java:682)
	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.createHistoryDirs(HistoryFileManager.java:618)
	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:579)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:95)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:154)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:229)
	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:239)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:279)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:260)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:240)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:162)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3634)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3617)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:3599)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6659)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4424)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4394)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4367)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:873)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:323)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:618)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3115)
	at org.apache.hadoop.fs.Hdfs.mkdir(Hdfs.java:311)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743)
	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743)
	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:737)
	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.tryCreatingHistoryDirs(HistoryFileManager.java:665)
	... 10 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:279)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:260)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:240)
	at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:162)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3634)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3617)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:3599)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6659)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4424)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4394)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4367)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:873)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:323)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:618)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

Any recommendations on how to debug further?

 

Thanks

 


Screen Shot 2018-06-14 at 12.45.59 PM.png
Announcements