Member since 11-16-2014
41 Posts
4 Kudos Received
1 Solution

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 1969 | 11-17-2014 02:34 PM |
11-17-2014
04:42 PM
I wasn't aware that Director would only use ephemeral storage, and I don't believe that was ever mentioned in the documentation. This will be a long-lived cluster for ongoing jobs that require several terabytes of storage.
11-17-2014
04:40 PM
Do I need to restart the Director service after changing this? What would happen if I tried adding the nodes back in via Cloudera Manager's 'Add new hosts to cluster' option?
11-17-2014
04:31 PM
Is there any way to recover instances that were 'suspended due to failure'?
11-17-2014
04:28 PM
I'm getting timeouts and a 'suspended due to failure' message while resizing the root filesystem on large (1 TB) volumes. This step probably deserves a much higher timeout, especially considering that 16 TB volumes will be available soon.

[2014-11-18 00:22:20] INFO [io-thread-9] - ssh:172.31.12.253: resize2fs 1.41.12 (17-May-2010)
[2014-11-18 00:22:20] INFO [io-thread-9] - ssh:172.31.12.253: Filesystem at /dev/xvde is mounted on /; on-line resizing required
[2014-11-18 00:22:20] INFO [io-thread-9] - ssh:172.31.12.253: old desc_blocks = 59, new_desc_blocks = 64
[2014-11-18 00:22:20] INFO [io-thread-9] - ssh:172.31.12.253: Performing an on-line resize of /dev/xvde to 268435456 (4k) blocks.
[2014-11-18 00:22:20] INFO [io-thread-9] - ssh:172.31.12.253: resize2fs: Device or resource busy While trying to extend the last group
[2014-11-18 00:22:20] ERROR [pipeline-thread-23] - c.c.l.p.DatabasePipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.common.ssh.SshException: Script execution failed with code 1. Script: sudo resize2fs $(sudo mount | grep "on / type" | awk '{ print $1 }')
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:47) ~[launchpad-pipeline-common-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:27) ~[launchpad-pipeline-common-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner$1.call(DatabasePipelineRunner.java:229) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
    at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.attemptMultipleJobExecutionsWithRetries(DatabasePipelineRunner.java:213) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:132) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.6.0_33]
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.6.0_33]
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.6.0_33]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) ~[na:1.6.0_33]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.6.0_33]
    at java.lang.Thread.run(Thread.java:701) ~[na:1.6.0_33]
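The failing script locates the root device by piping `mount` through `grep` and `awk`. Below is a minimal Python sketch of the same lookup, plus a retry wrapper that gives each attempt its own timeout. The names `root_device_from_mount` and `run_with_retries`, and any timeout values, are illustrative assumptions, not Director settings or its actual implementation.

```python
import subprocess
import time
from typing import Optional, Sequence


def root_device_from_mount(mount_output: str) -> Optional[str]:
    """Return the device mounted on "/", like `mount | grep "on / type" | awk '{ print $1 }'`."""
    for line in mount_output.splitlines():
        fields = line.split()
        # Expected shape: "<device> on <mountpoint> type <fstype> (<options>)"
        if len(fields) >= 5 and fields[1:4] == ["on", "/", "type"]:
            return fields[0]
    return None


def run_with_retries(cmd: Sequence[str], timeout_s: float, attempts: int, delay_s: float) -> int:
    """Run cmd up to `attempts` times, allowing each run `timeout_s` seconds.

    Returns 0 on the first success, otherwise the last exit code
    (a timed-out attempt is reported as -1).
    """
    code = -1
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child if it exceeds the timeout.
            code = subprocess.run(cmd, timeout=timeout_s).returncode
        except subprocess.TimeoutExpired:
            code = -1
        if code == 0:
            return 0
        if attempt < attempts:
            time.sleep(delay_s)  # pause before the next attempt
    return code
```

For instance, `run_with_retries(["sudo", "resize2fs", device], timeout_s=3600, attempts=3, delay_s=30)` would allow an hour per attempt (arbitrary values). Note that the log above shows resize2fs itself exiting with code 1 ("Device or resource busy"), so a longer timeout alone may not resolve that particular failure.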
11-17-2014
02:34 PM
1 Kudo
Fixed by picking a different AMI. The original AMI had multiple partitions on the root disk, which the Director install did not handle. Switching to an AMI with a single partition on the root volume appears to have resolved this issue entirely. FWIW, picking the official CentOS AMI should ensure you don't have a bad time.
11-17-2014
01:57 PM
After launching a new manager node, I discovered the root volume was not resized to the provisioned limit:

# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  250G  0 disk
├─xvda1 202:1    0  200M  0 part /boot
├─xvda2 202:2    0    4G  0 part [SWAP]
└─xvda3 202:3    0 10.8G  0 part /
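One way to catch this before installing anything is to compare the root partition's size with its parent disk. The sketch below parses default `lsblk` output in the format shown above. The helper names are hypothetical, and it assumes 1024-based size suffixes and partition names that extend their disk's name (xvda3 on xvda).

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: lsblk's human-readable suffixes are 1024-based.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert an lsblk size such as '10.8G' to bytes (approximate for fractional values)."""
    unit = text[-1].upper()
    if unit in _UNITS:
        return int(float(text[:-1]) * _UNITS[unit])
    return int(text)  # already a plain byte count


@dataclass
class BlockDevice:
    name: str
    size: int
    kind: str  # "disk" or "part"
    mountpoint: Optional[str]


def parse_lsblk(output: str) -> List[BlockDevice]:
    """Parse default `lsblk` output: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINT]."""
    devices = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        name = fields[0].lstrip("├└─│ ")  # drop tree-drawing prefixes
        mount = fields[6] if len(fields) > 6 else None
        devices.append(BlockDevice(name, parse_size(fields[3]), fields[5], mount))
    return devices


def unused_root_disk_bytes(devices: List[BlockDevice]) -> int:
    """Estimate how many bytes of the root disk lie outside the root filesystem's partition."""
    root = next(d for d in devices if d.mountpoint == "/")
    if root.kind == "disk":
        return 0  # root filesystem sits directly on the disk; no partition to grow
    disk = next(d for d in devices if d.kind == "disk" and root.name.startswith(d.name))
    return disk.size - root.size
```

On the output above this reports roughly 239.2 GiB outside the root partition, while a root filesystem sitting directly on a disk (as with /dev/xvde in the resize log) reports 0.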
Labels: Cloudera Manager
11-17-2014
11:33 AM
Director actually doesn't perform badly on a t2.micro, and I don't recall getting any exceptions or OOM errors; the manager and cluster nodes were the issue. We're already running a CDH 5 cluster with Spark. I'm interested in using Director as a replacement for our current methods of creating, deploying, scaling, and managing clusters. I'll try creating a more realistic cluster now.
11-17-2014
08:11 AM
As I understand it, 7180 is the Cloudera Manager port, and the IP I'm seeing in the logs corresponds to the manager IP. All of my security groups are set up so that all cluster instances can talk to the Director and manager on any port.

After creating another cluster as a test (I didn't want to leave my instance running overnight), I've run into the same issue. Logging in to the manager and checking netstat (`netstat -tulpn`) for open connections, I do not see anything listening on port 7180. Here is the relevant netstat output for the Director and manager nodes.

Director:
# sudo netstat -tulpn | grep java
tcp 0 0 0.0.0.0:7189 0.0.0.0:* LISTEN 13081/java

Manager:
# sudo netstat -tulpn | grep java
tcp 0 0 0.0.0.0:7186 0.0.0.0:* LISTEN 11280/java
tcp 0 0 0.0.0.0:7187 0.0.0.0:* LISTEN 4119/java
tcp 0 0 0.0.0.0:10101 0.0.0.0:* LISTEN 4097/java
tcp 0 0 0.0.0.0:8089 0.0.0.0:* LISTEN 11280/java

On the manager node I've noticed that the cloudera-scm-server process does not appear to start correctly:
# /etc/init.d/cloudera-scm-server status
cloudera-scm-server dead but pid file exists

After restarting this service and waiting several minutes, the port finally appears open in netstat. I will attempt to launch another cluster now that I have verified the manager service is listening. If I can provide any further logs or details, please let me know. For what it's worth, I am launching these instances as t2.micro since I'm evaluating the product and I'm hesitant to spend more money on test clusters until I know it works.
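The netstat check above can be scripted. This is a minimal sketch that assumes the `netstat -tulpn` column layout shown (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program); `listening_ports` is a hypothetical helper name.

```python
from typing import Dict


def listening_ports(netstat_output: str) -> Dict[int, str]:
    """Map each listening TCP port to its PID/Program column from `netstat -tulpn` output."""
    ports = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(fields) >= 7 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # rsplit handles both "0.0.0.0:7180" and IPv6 ":::7180".
            port = int(fields[3].rsplit(":", 1)[1])
            ports[port] = fields[6]
    return ports
```

Run against the manager output above, it finds 7186, 7187, 10101, and 8089 but not 7180, which matches the observation that nothing was listening on the Cloudera Manager port.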
11-16-2014
09:53 PM
I guess I spoke too soon. The manager is deployed according to the UI, but there is no process listening on port 7180, so connecting to the manager fails. Furthermore, bootstrapping a new cluster also fails with the following error:

[2014-11-17 05:47:31] ERROR [pipeline-thread-38] - c.c.l.p.DatabasePipelineRunner: Attempt to execute job failed
javax.ws.rs.client.ClientException: org.apache.cxf.interceptor.Fault: Could not send Message.
    at org.apache.cxf.jaxrs.client.AbstractClient.checkClientException(AbstractClient.java:548) ~[cxf-rt-frontend-jaxrs-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.jaxrs.client.AbstractClient.preProcessResult(AbstractClient.java:534) ~[cxf-rt-frontend-jaxrs-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:545) ~[cxf-rt-frontend-jaxrs-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.jaxrs.client.ClientProxyImpl.invoke(ClientProxyImpl.java:206) ~[cxf-rt-frontend-jaxrs-2.7.5.jar!/:2.7.5]
    at com.sun.proxy.$Proxy151.hostInstallCommand(Unknown Source) ~[na:na]
    at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:91) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:81) ~[launchpad-bootstrap-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner$1.call(DatabasePipelineRunner.java:229) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
    at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.attemptMultipleJobExecutionsWithRetries(DatabasePipelineRunner.java:213) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:132) ~[launchpad-pipeline-database-1.0.1.jar!/:1.0.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.6.0_33]
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.6.0_33]
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.6.0_33]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) ~[na:1.6.0_33]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.6.0_33]
    at java.lang.Thread.run(Thread.java:701) ~[na:1.6.0_33]
Caused by: org.apache.cxf.interceptor.Fault: Could not send Message.
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:64) ~[cxf-api-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:271) ~[cxf-api-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.jaxrs.client.AbstractClient.doRunInterceptorChain(AbstractClient.java:607) ~[cxf-rt-frontend-jaxrs-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.jaxrs.client.ClientProxyImpl.doChainedInvocation(ClientProxyImpl.java:543) ~[cxf-rt-frontend-jaxrs-2.7.5.jar!/:2.7.5]
    ... 16 common frames omitted
Caused by: java.net.ConnectException: ConnectException invoking http://172.31.9.89:7180/api/v6/cm/commands/hostInstall: Connection refused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.6.0_33]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) ~[na:1.6.0_33]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.6.0_33]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:534) ~[na:1.6.0_33]
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.mapException(HTTPConduit.java:1338) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1322) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56) ~[cxf-api-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.transport.http.HTTPConduit.close(HTTPConduit.java:622) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62) ~[cxf-api-2.7.5.jar!/:2.7.5]
    ... 19 common frames omitted
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.6.0_33]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) ~[na:1.6.0_33]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) ~[na:1.6.0_33]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) ~[na:1.6.0_33]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385) ~[na:1.6.0_33]
    at java.net.Socket.connect(Socket.java:546) ~[na:1.6.0_33]
    at sun.net.NetworkClient.doConnect(NetworkClient.java:173) ~[na:1.6.0_33]
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:409) ~[na:1.6.0_33]
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:530) ~[na:1.6.0_33]
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:240) ~[na:1.6.0_33]
    at sun.net.www.http.HttpClient.New(HttpClient.java:321) ~[na:1.6.0_33]
    at sun.net.www.http.HttpClient.New(HttpClient.java:338) ~[na:1.6.0_33]
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935) ~[na:1.6.0_33]
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876) ~[na:1.6.0_33]
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801) ~[na:1.6.0_33]
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:979) ~[na:1.6.0_33]
    at org.apache.cxf.transport.http.URLConnectionHTTPConduit$URLConnectionWrappedOutputStream.setupWrappedStream(URLConnectionHTTPConduit.java:168) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleHeadersTrustCaching(HTTPConduit.java:1282) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.onFirstWrite(HTTPConduit.java:1233) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.transport.http.URLConnectionHTTPConduit$URLConnectionWrappedOutputStream.onFirstWrite(URLConnectionHTTPConduit.java:195) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:47) ~[cxf-api-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.io.AbstractThresholdOutputStream.write(AbstractThresholdOutputStream.java:69) ~[cxf-api-2.7.5.jar!/:2.7.5]
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1295) ~[cxf-rt-transports-http-2.7.5.jar!/:2.7.5]
    ... 22 common frames omitted
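The root cause here is a plain "Connection refused" on 172.31.9.89:7180, and cloudera-scm-server took several minutes to open that port after a restart. A client could poll the port before issuing API calls instead of failing on the first refusal. This is a minimal sketch using Python's standard `socket` module; `wait_for_port` and its default interval are assumptions, not part of Director.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float, interval_s: float = 5.0) -> bool:
    """Return True once host:port accepts a TCP connection, or False after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Each connection attempt is itself bounded by interval_s.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # ConnectionRefusedError (nothing listening yet), timeouts, unreachable hosts, etc.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

For example, `wait_for_port("172.31.9.89", 7180, timeout_s=600)` would wait up to ten minutes (an arbitrary limit) before retrying the host install.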
11-16-2014
09:46 PM
Deploying CentOS 6.4 seems to be working as expected. Do you know when Ubuntu will be supported?