
Multinode docker based installation error


Contributor

 

 

Below is the output from my attempt to get a multi-node Docker/CDH 5.8.1 setup up and running on an AWS EC2 "m4.4xlarge" instance (16 vCPUs and 64 GB of memory) running CentOS 7.2.

 

I used the following Cloudera URL as "instructions":   

 

http://blog.cloudera.com/blog/2016/08/multi-node-clusters-with-cloudera-quickstart-for-docker/

 

 

You can see it all goes well until the "Exception: Timed out after waiting 10 minutes for services to start" (I'm guessing the Python errors above that line are the cause).

 

It would be great if you could cast your eyes over the output below.

 

 

 

Install Docker Engine:

 

[root@client2-dev-cdh-docker-launcher-instance ~]# yum install -y docker-engine

 

Loaded plugins: fastestmirror

Loading mirror speeds from cached hostfile

* base: mirror.ventraip.net.au

* epel: epel.mirror.digitalpacific.com.au

* extras: mirror.ventraip.net.au

* updates: mirror.ventraip.net.au

Resolving Dependencies

--> Running transaction check

---> Package docker-engine.x86_64 0:1.12.0-1.el7.centos will be installed

--> Processing Dependency: docker-engine-selinux >= 1.12.0-1.el7.centos for package: docker-engine-1.12.0-1.el7.centos.x86_64

--> Processing Dependency: libltdl.so.7()(64bit) for package: docker-engine-1.12.0-1.el7.centos.x86_64

--> Running transaction check

---> Package docker-engine-selinux.noarch 0:1.12.0-1.el7.centos will be installed

---> Package libtool-ltdl.x86_64 0:2.4.2-21.el7_2 will be installed

--> Finished Dependency Resolution

 

Dependencies Resolved

.....

.....

Installed:

  docker-engine.x86_64 0:1.12.0-1.el7.centos

 

Dependency Installed:

  docker-engine-selinux.noarch 0:1.12.0-1.el7.centos                                                         libtool-ltdl.x86_64 0:2.4.2-21.el7_2

 

Complete!

 

 

Start Docker Service:

 

[root@client2-dev-cdh-docker-launcher-instance ~]# service docker start

Redirecting to /bin/systemctl start  docker.service

 

 

[root@client2-dev-cdh-docker-launcher-instance ~]# systemctl | grep -i dock

 

sys-devices-virtual-net-docker0.device           loaded active plugged   /sys/devices/virtual/net/docker0

sys-subsystem-net-devices-docker0.device         loaded active plugged   /sys/subsystem/net/devices/docker0

var-lib-docker-devicemapper.mount                loaded active mounted   /var/lib/docker/devicemapper

docker.service                                   loaded active running   Docker Application Container Engine

 

 

Test/Launch the "hello world" container:

 

 

[root@client2-dev-cdh-docker-launcher-instance ~]# docker run hello-world

 

Unable to find image 'hello-world:latest' locally

latest: Pulling from library/hello-world

c04b14da8d14: Pull complete

Digest: sha256:0256e8a36e2070f7bf2d0b0763dbabdd67798512411de4cdcf9431a1feb60fd9

Status: Downloaded newer image for hello-world:latest

 

Hello from Docker!

This message shows that your installation appears to be working correctly.

 

To generate this message, Docker took the following steps:

1. The Docker client contacted the Docker daemon.

2. The Docker daemon pulled the "hello-world" image from the Docker Hub.

3. The Docker daemon created a new container from that image which runs the

    executable that produces the output you are currently reading.

4. The Docker daemon streamed that output to the Docker client, which sent it

    to your terminal.

 

To try something more ambitious, you can run an Ubuntu container with:

$ docker run -it ubuntu bash

 

Share images, automate workflows, and more with a free Docker Hub account:

https://hub.docker.com

 

For more examples and ideas, visit:

https://docs.docker.com/engine/userguide/

 

 

Check Docker Containers:

 

[root@client2-dev-cdh-docker-launcher-instance ~]# docker ps -a

 

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                      PORTS               NAMES

d73ebc648479        hello-world         "/hello"            18 seconds ago      Exited (0) 17 seconds ago                       compassionate_rosalind

 

 

[root@client2-dev-cdh-docker-launcher-instance ~]# docker rm d73ebc648479

d73ebc648479

 

[root@client2-dev-cdh-docker-launcher-instance ~]# docker ps -a

 

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

 

 

CURL the "clusterdock" script:

 

[root@client2-dev-cdh-docker-launcher-instance ~]# curl -sL http://tiny.cloudera.com/clusterdock.sh > clusterdock.sh

 

[root@client2-dev-cdh-docker-launcher-instance ~]# ls -lrt

 

-rwxrwxrwx 1 root root 5641 Aug 18 16:12 clusterdock.sh

 

 

Source the "clusterdock" script to set up the environment:

 

Note:  I edited the "clusterdock.sh" script and added a "set -x" to it in order to see output....

 

[root@client2-dev-cdh-docker-launcher-instance ~]# source clusterdock.sh

 

++ printf '\033]0;%s@%s:%s\007' root client2-dev-cdh-docker-launcher-instance '~'

[root@client2-dev-cdh-docker-launcher-instance ~]#

++ printf '\033]0;%s@%s:%s\007' root client2-dev-cdh-docker-launcher-instance '~'

[root@client2-dev-cdh-docker-launcher-instance ~]#

++ printf '\033]0;%s@%s:%s\007' root client2-dev-cdh-docker-launcher-instance '~'

[root@client2-dev-cdh-docker-launcher-instance ~]#

++ printf '\033]0;%s@%s:%s\007' root client2-dev-cdh-docker-launcher-instance '~'

 

 

 

Now create a 4-node CDH cluster:

 

 

[root@client2-dev-cdh-docker-launcher-instance ~]# clusterdock_run ./bin/start_cluster -n client2-cdh-dev-cluster cdh --primary-node=machine-1 --secondary-nodes='machine-{2..4}' --exclude-service-types=IMPALA

 

+ clusterdock_run ./bin/start_cluster -n client2-cdh-dev-cluster cdh --primary-node=machine-1 '--secondary-nodes=machine-{2..4}' --exclude-service-types=IMPALA

+ '[' -z '' ']'

+ local CONSTANTS_CONFIG_URL=https://raw.githubusercontent.com/cloudera/clusterdock/master/clusterdock/constants.cfg

++ curl -s https://raw.githubusercontent.com/cloudera/clusterdock/master/clusterdock/constants.cfg

++ awk -F ' *= *' '/^docker_registry_url/ {print $2}'

+ local DOCKER_REGISTRY_URL=docker.io

++ curl -s https://raw.githubusercontent.com/cloudera/clusterdock/master/clusterdock/constants.cfg

++ awk -F ' *= *' '/^cloudera_namespace/ {print $2}'

+ local CLOUDERA_NAMESPACE=cloudera

+ CLUSTERDOCK_IMAGE=docker.io/cloudera/clusterdock:latest

+ '[' '' '!=' false ']'

+ sudo docker pull docker.io/cloudera/clusterdock:latest

+ '[' -n '' ']'

+ '[' -n '' ']'

+ '[' -n '' ']'

+ '[' -n '' ']'

+ '[' -n '' ']'

+ sudo docker run --net=host -t --privileged -v /tmp/clusterdock -v /etc/hosts:/etc/hosts -v /etc/localtime:/etc/localtime -v /var/run/docker.sock:/var/run/docker.sock docker.io/cloudera/clusterdock:latest ./bin/start_cluster -n client2-cdh-dev-cluster cdh --primary-node=machine-1 '--secondary-nodes=machine-{2..4}' --exclude-service-types=IMPALA

INFO:clusterdock.topologies.cdh.actions:Pulling image docker.io/cloudera/clusterdock:cdh580_cm581_primary-node. This might take a little while...

cdh580_cm581_primary-node: Pulling from cloudera/clusterdock

3eaa9b70c44a: Pull complete

99ba8e23f310: Pull complete

c9c08e9a0d03: Pull complete

7434a9a99daa: Pull complete

d52d9baa0ee6: Pull complete

00ca224ba661: Pull complete

Digest: sha256:9feffbfc5573262a6efbbb0a969efde890e63ced8a4ab3c9982f4f0dc607e429

Status: Downloaded newer image for cloudera/clusterdock:cdh580_cm581_primary-node

INFO:clusterdock.topologies.cdh.actions:Pulling image docker.io/cloudera/clusterdock:cdh580_cm581_secondary-node. This might take a little while...

cdh580_cm581_secondary-node: Pulling from cloudera/clusterdock

3eaa9b70c44a: Already exists

99ba8e23f310: Already exists

c9c08e9a0d03: Already exists

7434a9a99daa: Already exists

d52d9baa0ee6: Already exists

f70deff0592f: Pull complete

Digest: sha256:251778378b362adff4e93b99d423848216e4823965dabd1bd4c41dbb4c79afcf

Status: Downloaded newer image for cloudera/clusterdock:cdh580_cm581_secondary-node

INFO:clusterdock.cluster:Network (client2-cdh-dev-cluster) not present, creating it...

INFO:clusterdock.cluster:Successfully setup network (name: client2-cdh-dev-cluster).

INFO:clusterdock.cluster:Successfully started machine-2.client2-cdh-dev-cluster (IP address: 192.168.123.3).

INFO:clusterdock.cluster:Successfully started machine-3.client2-cdh-dev-cluster (IP address: 192.168.123.4).

INFO:clusterdock.cluster:Successfully started machine-4.client2-cdh-dev-cluster (IP address: 192.168.123.5).

INFO:clusterdock.cluster:Successfully started machine-1.client2-cdh-dev-cluster (IP address: 192.168.123.2).

INFO:clusterdock.cluster:Started cluster in 6.81 seconds.

INFO:clusterdock.topologies.cdh.actions:Changing server_host to machine-1.client2-cdh-dev-cluster in /etc/cloudera-scm-agent/config.ini...

INFO:clusterdock.topologies.cdh.actions:Removing files (/var/lib/cloudera-scm-agent/uuid, /dfs*/dn/current/*) from hosts (machine-3.client2-cdh-dev-cluster, machine-4.client2-cdh-dev-cluster)...

INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...

cloudera-scm-agent is already stopped

Starting cloudera-scm-agent: [  OK  ]

Stopping cloudera-scm-agent: [  OK  ]

Stopping cloudera-scm-agent: [  OK  ]

Stopping cloudera-scm-agent: [  OK  ]

Starting cloudera-scm-agent: [  OK  ]

Starting cloudera-scm-agent: [  OK  ]

Starting cloudera-scm-agent: [  OK  ]

INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...

INFO:clusterdock.topologies.cdh.actions:Detected Cloudera Manager server after 35.04 seconds.

INFO:clusterdock.topologies.cdh.actions:CM server is now accessible at http://client2-dev-cdh-docker-launcher-instance:32768

INFO:clusterdock.topologies.cdh.cm:Detected CM API v13.

INFO:clusterdock.topologies.cdh.cm_utils:Adding hosts (Ids: 484aa22a-44af-4593-b6d0-ad91806fa944, 3df146ff-5139-4196-962e-873a285fecb7) to Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.cm_utils:Creating secondary node host template...

INFO:clusterdock.topologies.cdh.cm_utils:Sleeping for 30 seconds to ensure that parcels are activated...

INFO:clusterdock.topologies.cdh.cm_utils:Applying secondary host template...

INFO:clusterdock.topologies.cdh.cm_utils:Updating database configurations...

INFO:clusterdock.topologies.cdh.cm:Updating NameNode references in Hive metastore...

INFO:clusterdock.topologies.cdh.actions:Removing service impala from Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.actions:Deploying client configuration...

INFO:clusterdock.topologies.cdh.actions:Starting cluster...

INFO:clusterdock.topologies.cdh.actions:Starting Cloudera Management service...

INFO:clusterdock.topologies.cdh.cm:Beginning service health validation...

 

Traceback (most recent call last):

  File "./bin/start_cluster", line 70, in <module>

    main()

  File "./bin/start_cluster", line 63, in main

    actions.start(args)

  File "/root/clusterdock/clusterdock/topologies/cdh/actions.py", line 151, in start

    deployment.validate_services_started()

  File "/root/clusterdock/clusterdock/topologies/cdh/cm.py", line 91, in validate_services_started

    "(at fault: {1}).").format(timeout_min, at_fault_services))

Exception: Timed out after waiting 10 minutes for services to start (at fault: [[u'zookeeper', "Failed health checks: [u'ZOOKEEPER_SERVERS_HEALTHY']"], [u'hdfs', "Failed health checks: [u'HDFS_CANARY_HEALTH', u'HDFS_DATA_NODES_HEALTHY', u'HDFS_FREE_SPACE_REMAINING', u'HDFS_HA_NAMENODE_HEALTH']"], [u'hbase', "Failed health checks: [u'HBASE_MASTER_HEALTH', u'HBASE_REGION_SERVERS_HEALTHY']"], [u'solr', "Failed health checks: [u'SOLR_SOLR_SERVERS_HEALTHY']"], [u'yarn', "Failed health checks: [u'YARN_JOBHISTORY_HEALTH', u'YARN_NODE_MANAGERS_HEALTHY', u'YARN_RESOURCEMANAGERS_HEALTH']"], [u'ks_indexer', "Failed health checks: [u'KS_INDEXER_HBASE_INDEXERS_HEALTHY']"], [u'hive', "Failed health checks: [u'HIVE_HIVEMETASTORES_HEALTHY', u'HIVE_HIVESERVER2S_HEALTHY']"], [u'oozie', "Failed health checks: [u'OOZIE_OOZIE_SERVERS_HEALTHY']"], [u'hue', "Failed health checks: [u'HUE_HUE_SERVERS_HEALTHY']"], [u'mgmt', "Failed health checks: [u'MGMT_ALERT_PUBLISHER_HEALTH', u'MGMT_EVENT_SERVER_HEALTH', u'MGMT_HOST_MONITOR_HEALTH', u'MGMT_SERVICE_MONITOR_HEALTH']"]]).

+ '[' -n '' ']'

++ printf '\033]0;%s@%s:%s\007' root client2-dev-cdh-docker-launcher-instance '~'

 

 

OK, that failed....kill off the Docker containers:

 

[root@client2-dev-cdh-docker-launcher-instance ~]# docker ps -a

 

CONTAINER ID        IMAGE                                                        COMMAND                  CREATED             STATUS                     PORTS                     NAMES

16319733938c        docker.io/cloudera/clusterdock:cdh580_cm581_secondary-node   "/sbin/init"             22 minutes ago      Up 22 minutes                                        drunk_mestorf

0fcaf9c6ddc3        docker.io/cloudera/clusterdock:cdh580_cm581_secondary-node   "/sbin/init"             22 minutes ago      Up 22 minutes                                        pensive_kirch

efb3495ae56b        docker.io/cloudera/clusterdock:cdh580_cm581_secondary-node   "/sbin/init"             22 minutes ago      Up 22 minutes                                        angry_turing

904e8f4a6290        docker.io/cloudera/clusterdock:cdh580_cm581_primary-node     "/sbin/init"             22 minutes ago      Up 22 minutes              0.0.0.0:32768->7180/tcp   infallible_ritchie

d152d82e497f        docker.io/cloudera/clusterdock:latest                        "python ./bin/start_c"   33 minutes ago      Exited (1) 6 minutes ago                             loving_fermat

 

 

[root@client2-dev-cdh-docker-launcher-instance ~]# docker stop d152d82e497f 904e8f4a6290 efb3495ae56b 0fcaf9c6ddc3 16319733938c

 

Error response from daemon: No such container: d152d82e497f

904e8f4a6290

efb3495ae56b

0fcaf9c6ddc3

16319733938c

 

 

[root@client2-dev-cdh-docker-launcher-instance ~]# docker rm d152d82e497f 904e8f4a6290 efb3495ae56b 0fcaf9c6ddc3 16319733938c

 

904e8f4a6290

efb3495ae56b

0fcaf9c6ddc3

16319733938c

Error response from daemon: No such container: d152d82e497f

 

 

 

 

Cheers,

 

Damion.

 


Re: Multinode docker based installation error

Rising Star
Hm, that's no fun. It looks like service health was bad, but the actual execution of commands succeeded, so could you try again and look at Cloudera Manager for a clue as to why the services were red? Just tried to repro on my machine and everything succeeded.

Re: Multinode docker based installation error

Contributor

Funny thing was that I pasted the CM URL into a web browser and could log in....all services had the little blue "stale configuration" icon next to them, and I tried deploying the stale configuration but nothing came up.

 

The only group of services that seemed OK were the Cloudera Management ones.

 

I terminated the entire cluster and tried again using a similar command to spin up a 2-node cluster, but again received the same error.

 

I will attempt to create the 4-node cluster again today and update :-)


Re: Multinode docker based installation error

Contributor

Hi Dima (?),

 

As previously mentioned, I have attempted to re-create a new cluster using "clusterdock".

 

This time I thought I'd try a 3-node cluster, in case my initial attempt was resource-constrained.

 

I also excluded the SQOOP and SPARK services (again, to try to use fewer resources).

 

Output from my attempt is shown below.

 

This is the same error I previously received.

 

Point 2) below is an extract from the CM cloudera-scm-server.log file.

 

I'm not sure what I should be looking for in the log.

 

Are you able to share the "clusterdock_run" command line you are running, so I can run it in my environment to see if it works?

 

 

1) Attempted cluster creation:

 

[root@client2-dev-docker-launcher-instance ~]# clusterdock_run ./bin/start_cluster -n client2-cdh-dev-cluster cdh --primary-node=machine-1 --secondary-nodes='machine-{2..3}' --exclude-service-types=SQOOP,SPARK

 

+ clusterdock_run ./bin/start_cluster -n client2-cdh-dev-cluster cdh --primary-node=machine-1 '--secondary-nodes=machine-{2..3}' --exclude-service-types=SQOOP,SPARK
+ '[' -z docker.io/cloudera/clusterdock:latest ']'
+ '[' '' '!=' false ']'
+ sudo docker pull docker.io/cloudera/clusterdock:latest
+ '[' -n '' ']'
+ '[' -n '' ']'
+ '[' -n '' ']'
+ '[' -n '' ']'
+ '[' -n '' ']'
+ sudo docker run --net=host -t --privileged -v /tmp/clusterdock -v /etc/hosts:/etc/hosts -v /etc/localtime:/etc/localtime -v /var/run/docker.sock:/var/run/docker.sock docker.io/cloudera/clusterdock:latest ./bin/start_cluster -n client2-cdh-dev-cluster cdh --primary-node=machine-1 '--secondary-nodes=machine-{2..3}' --exclude-service-types=SQOOP,SPARK
INFO:clusterdock.cluster:Successfully started machine-2.client2-cdh-dev-cluster (IP address: 192.168.123.3).
INFO:clusterdock.cluster:Successfully started machine-3.client2-cdh-dev-cluster (IP address: 192.168.123.4).
INFO:clusterdock.cluster:Successfully started machine-1.client2-cdh-dev-cluster (IP address: 192.168.123.2).
INFO:clusterdock.cluster:Started cluster in 16.75 seconds.
INFO:clusterdock.topologies.cdh.actions:Changing server_host to machine-1.client2-cdh-dev-cluster in /etc/cloudera-scm-agent/config.ini...
INFO:clusterdock.topologies.cdh.actions:Removing files (/var/lib/cloudera-scm-agent/uuid, /dfs*/dn/current/*) from hosts (machine-3.client2-cdh-dev-cluster)...
INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...
Stopping cloudera-scm-agent: [ OK ]
Stopping cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]
Stopping cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]
INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...
INFO:clusterdock.topologies.cdh.actions:Detected Cloudera Manager server after 0.00 seconds.
INFO:clusterdock.topologies.cdh.actions:CM server is now accessible at http://client2-dev-docker-launcher-instance:32769
INFO:clusterdock.topologies.cdh.cm:Detected CM API v13.
INFO:clusterdock.topologies.cdh.cm_utils:Adding hosts (Ids: 1db77b8a-3c4e-451a-9a28-7caeb22ec6f4) to Cluster 1 (clusterdock)...
INFO:clusterdock.topologies.cdh.cm_utils:Creating secondary node host template...
INFO:clusterdock.topologies.cdh.cm_utils:Sleeping for 30 seconds to ensure that parcels are activated...
INFO:clusterdock.topologies.cdh.cm_utils:Applying secondary host template...
INFO:clusterdock.topologies.cdh.cm_utils:Updating database configurations...
INFO:clusterdock.topologies.cdh.cm:Updating NameNode references in Hive metastore...
INFO:clusterdock.topologies.cdh.actions:Deploying client configuration...
INFO:clusterdock.topologies.cdh.actions:Starting cluster...
INFO:clusterdock.topologies.cdh.actions:Starting Cloudera Management service...
INFO:clusterdock.topologies.cdh.cm:Beginning service health validation...
Traceback (most recent call last):
File "./bin/start_cluster", line 70, in <module>
main()
File "./bin/start_cluster", line 63, in main
actions.start(args)
File "/root/clusterdock/clusterdock/topologies/cdh/actions.py", line 151, in start
deployment.validate_services_started()
File "/root/clusterdock/clusterdock/topologies/cdh/cm.py", line 91, in validate_services_started
"(at fault: {1}).").format(timeout_min, at_fault_services))
Exception: Timed out after waiting 10 minutes for services to start (at fault: [[u'zookeeper', "Failed health checks: [u'ZOOKEEPER_SERVERS_HEALTHY']"], [u'hdfs', "Failed health checks: [u'HDFS_CANARY_HEALTH', u'HDFS_DATA_NODES_HEALTHY', u'HDFS_FREE_SPACE_REMAINING', u'HDFS_HA_NAMENODE_HEALTH']"], [u'hbase', "Failed health checks: [u'HBASE_MASTER_HEALTH', u'HBASE_REGION_SERVERS_HEALTHY']"], [u'solr', "Failed health checks: [u'SOLR_SOLR_SERVERS_HEALTHY']"], [u'yarn', "Failed health checks: [u'YARN_JOBHISTORY_HEALTH', u'YARN_NODE_MANAGERS_HEALTHY', u'YARN_RESOURCEMANAGERS_HEALTH']"], [u'ks_indexer', "Failed health checks: [u'KS_INDEXER_HBASE_INDEXERS_HEALTHY']"], [u'hive', "Failed health checks: [u'HIVE_HIVEMETASTORES_HEALTHY', u'HIVE_HIVESERVER2S_HEALTHY']"], [u'oozie', "Failed health checks: [u'OOZIE_OOZIE_SERVERS_HEALTHY']"], [u'impala', "Failed health checks: [u'IMPALA_CATALOGSERVER_HEALTH', u'IMPALA_IMPALADS_HEALTHY', u'IMPALA_STATESTORE_HEALTH']"], [u'hue', "Failed health checks: [u'HUE_HUE_SERVERS_HEALTHY']"], [u'mgmt', "Failed health checks: [u'MGMT_ALERT_PUBLISHER_HEALTH', u'MGMT_EVENT_SERVER_HEALTH', u'MGMT_HOST_MONITOR_HEALTH', u'MGMT_SERVICE_MONITOR_HEALTH']"]]).

 

 

2) Extract from Cloudera Manager "cloudera-scm-server.log":

 

2016-08-22 08:55:16,396 INFO WebServerImpl:org.springframework.web.servlet.handler.SimpleUrlHandlerMapping: Root mapping to handler of type [class org.springframework.web.servlet.mvc.ParameterizableViewController]
2016-08-22 08:55:16,475 INFO WebServerImpl:org.springframework.web.servlet.DispatcherServlet: FrameworkServlet 'Spring MVC Dispatcher Servlet': initialization completed in 3460 ms
2016-08-22 08:55:16,500 INFO WebServerImpl:com.cloudera.server.web.cmon.JobDetailGatekeeper: ActivityMonitor configured to allow job details for all jobs.
2016-08-22 08:55:18,020 INFO JvmPauseMonitor:com.cloudera.enterprise.debug.JvmPauseMonitor: Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 1185ms: GC pool 'ParNew' had collection(s): count=1 time=32ms, GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=1275ms
2016-08-22 08:55:18,263 INFO WebServerImpl:com.cloudera.server.web.cmf.AggregatorController: AggregateSummaryScheduler started.
2016-08-22 08:55:18,645 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Initializing SearchTemplateManager:2016-08-21T22:55:18.645Z
2016-08-22 08:55:18,670 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Generating entities:2016-08-21T22:55:18.670Z
2016-08-22 08:55:19,077 INFO WebServerImpl:org.mortbay.log: jetty-6.1.26.cloudera.4
2016-08-22 08:55:19,078 INFO WebServerImpl:org.mortbay.log: Started SelectChannelConnector@0.0.0.0:7180
2016-08-22 08:55:19,078 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
2016-08-22 08:55:21,663 INFO ScmActive-0:com.cloudera.server.cmf.components.ScmActive: ScmActive completed successfully.
2016-08-22 08:55:25,758 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Num entities:5267
2016-08-22 08:55:25,758 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Generating documents:2016-08-21T22:55:25.758Z
2016-08-22 08:55:25,918 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Num docs:5037
2016-08-22 08:55:25,919 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Constructing repo:2016-08-21T22:55:25.919Z
2016-08-22 08:55:26,592 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Finished constructing repo:2016-08-21T22:55:26.592Z
2016-08-22 08:55:30,253 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: Unable to retrieve remote parcel repository manifest
java.util.concurrent.ExecutionException: java.net.ConnectException: https://archive.cloudera.com/cdh5/parcels/5.8/manifest.json
at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:297)
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:399)
at org.jboss.netty.channel.DefaultChannelFuture.addListener(DefaultChannelFuture.java:145)
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.doConnect(NettyAsyncHttpProvider.java:1041)
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.execute(NettyAsyncHttpProvider.java:858)
at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:512)
at com.ning.http.client.AsyncHttpClient$BoundRequestBuilder.execute(AsyncHttpClient.java:234)
at com.cloudera.parcel.components.ParcelDownloaderImpl.getRepositoryInfoFuture(ParcelDownloaderImpl.java:534)
at com.cloudera.parcel.components.ParcelDownloaderImpl.getRepositoryInfo(ParcelDownloaderImpl.java:492)
at com.cloudera.parcel.components.ParcelDownloaderImpl.syncRemoteRepos(ParcelDownloaderImpl.java:344)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:416)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:411)
at com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable.call(ReadWriteDatabaseTaskCallable.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: https://archive.cloudera.com/cdh5/parcels/5.8/manifest.json
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:100)
... 16 more

....

....

....

at org.jboss.netty.channel.AbstractChannel.connect(AbstractChannel.java:204)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:230)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:183)
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.doConnect(NettyAsyncHttpProvider.java:999)
... 13 more
2016-08-22 08:55:50,901 INFO 1764677695@scm-web-2:com.cloudera.server.web.cmf.AuthenticationSuccessEventListener: Authentication success for user: 'admin' from 192.168.123.1
2016-08-22 08:55:53,867 INFO 1703431718@agentServer-0:com.cloudera.server.cmf.AgentProtocolImpl: Setting default rackId for host 1db77b8a-3c4e-451a-9a28-7caeb22ec6f4: /default
2016-08-22 08:55:53,876 INFO 1703431718@agentServer-0:com.cloudera.server.cmf.AgentProtocolImpl: [DbHost{id=3, hostId=1db77b8a-3c4e-451a-9a28-7caeb22ec6f4, hostName=machine-3.client2-cdh-dev-cluster}] Added to rack group: /default
2016-08-22 08:55:53,889 INFO 1091606227@agentServer-2:com.cloudera.server.cmf.AgentProtocolImpl: [DbHost{id=2, hostId=423c5568-ed84-4bb9-9743-95cfc5e09679, hostName=machine-2.client2-cdh-dev-cluster}] Added to rack group: /default
2016-08-22 08:55:53,910 INFO 1703431718@agentServer-0:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing command ProcessStalenessCheckCommand BasicCmdArgs{args=[First reason why: com.cloudera.cmf.model.DbHost.name (#3) has changed]}.
2016-08-22 08:55:54,035 INFO 1091606227@agentServer-2:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing command ProcessStalenessCheckCommand BasicCmdArgs{args=[First reason why: com.cloudera.cmf.model.DbHost.name (#2) has changed]}.
2016-08-22 08:55:54,076 INFO CommandPusher:com.cloudera.cmf.model.DbCommand: Command 230(ProcessStalenessCheckCommand) has completed. finalstate:CANCELLED, success:false, msg:Aborted command
2016-08-22 08:55:54,080 INFO ProcessStalenessDetector-0:com.cloudera.cmf.service.config.components.ProcessStalenessDetector: Staleness check not completed: ABORT
2016-08-22 08:55:54,286 INFO ProcessStalenessDetector-0:com.cloudera.cmf.service.config.components.ProcessStalenessDetector: Staleness check done. Duration: PT0.206S
2016-08-22 08:55:54,868 INFO 1703431718@agentServer-0:com.cloudera.server.cmf.AgentProtocolImpl: [DbHost{id=1, hostId=f64f5e51-45af-4278-88e1-b18ccaea3de6, hostName=machine-1.client2-cdh-dev-cluster}] Added to rack group: /default
2016-08-22 08:55:55,077 INFO 1703431718@agentServer-0:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing command ProcessStalenessCheckCommand BasicCmdArgs{args=[First reason why: com.cloudera.cmf.model.DbHost.name (#1) has changed]}.
2016-08-22 08:55:57,758 INFO 1703431718@agentServer-0:com.cloudera.cmf.command.components.StalenessChecker: No staleness check scheduled, scheduling one in 30 seconds
2016-08-22 08:56:06,730 INFO CMMetricsForwarder-0:com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder: Failed to send metrics.
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy103.writeMetrics(Unknown Source)
at com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder.sendWithAvro(ClouderaManagerMetricsForwarder.java:325)
at com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder.sendMetrics(ClouderaManagerMetricsForwarder.java:312)
at com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder.run(ClouderaManagerMetricsForwarder.java:146)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
... 11 more

....

....

....

2016-08-22 09:01:00,826 INFO ProcessStalenessDetector-0:com.cloudera.cmf.service.config.components.ProcessStalenessDetector: Staleness check done. Duration: PT0.730S
2016-08-22 09:05:10,476 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaping old, inactive process: DbProcess{id=1, name=cluster-host-inspector, host=machine-2.client2-cdh-dev-cluster}
2016-08-22 09:05:10,487 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaping old, inactive process: DbProcess{id=2, name=cluster-host-inspector, host=machine-1.client2-cdh-dev-cluster}
2016-08-22 09:05:10,504 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2016-08-22 09:05:10,519 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2014-08-22T23:05:10.504Z to reap.
2016-08-22 09:05:10,520 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2016-08-22 09:05:10,521 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
2016-08-22 09:05:18,803 WARN 769845859@agentServer-10:com.cloudera.server.cmf.AgentProtocolImpl: Received Process Heartbeat for unknown (or duplicate) process. Ignoring. This is expected to happen once after old process eviction or process deletion (as happens in restarts). id=1 name=null host=423c5568-ed84-4bb9-9743-95cfc5e09679/machine-2.client2-cdh-dev-cluster
2016-08-22 09:05:22,546 WARN 769845859@agentServer-10:com.cloudera.server.cmf.AgentProtocolImpl: Received Process Heartbeat for unknown (or duplicate) process. Ignoring. This is expected to happen once after old process eviction or process deletion (as happens in restarts). id=2 name=null host=f64f5e51-45af-4278-88e1-b18ccaea3de6/machine-1.client2-cdh-dev-cluster
2016-08-22 09:15:10,523 INFO StaleEntityEviction:com.cloudera.cmf.model.HeartbeatStore: Reaped 28 process heartbeats
2016-08-22 09:15:10,527 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2016-08-22 09:15:10,529 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2014-08-22T23:15:10.528Z to reap.
2016-08-22 09:15:10,530 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2016-08-22 09:15:10,530 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
2016-08-22 09:25:10,535 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2016-08-22 09:25:10,536 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2014-08-22T23:25:10.535Z to reap.
2016-08-22 09:25:10,537 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2016-08-22 09:25:10,537 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
2016-08-22 09:25:22,523 INFO ScmActive-0:com.cloudera.server.cmf.components.ScmActive: (119 skipped) ScmActive completed successfully.
2016-08-22 09:26:11,753 WARN EventStorePublisherWithRetry-0:com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={CATEGORY=[AUDIT_EVENT], SEVERITY=[INFORMATIONAL], SERVICE=[ClouderaManager], SERVICE_TYPE=[ManagerServer], USER=[admin], EVENTCODE=[EV_LOGIN_SUCCESS], MESSAGE_CODES=[LOGIN_SUCCESS]}, content=User admin logged in successfully., timestamp=1471820171000} - 1 of 118 failure(s) in last 1800s
java.io.IOException: Error connecting to node-1.cluster:7184
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:139)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:102)
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:55)
at org.jboss.netty.channel.Channels.connect(Channels.java:642)
at org.jboss.netty.channel.AbstractChannel.connect(AbstractChannel.java:204)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:230)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:183)
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:246)
... 10 more
2016-08-22 09:35:10,543 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2016-08-22 09:35:10,544 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2014-08-22T23:35:10.543Z to reap.
2016-08-22 09:35:10,545 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2016-08-22 09:35:10,545 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
2016-08-22 09:45:10,550 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2016-08-22 09:45:10,552 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2014-08-22T23:45:10.551Z to reap.
2016-08-22 09:45:10,552 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2016-08-22 09:45:10,552 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.

 

 

Cheers,

 

Damion.


Re: Multinode docker based installation error

Rising Star
Just for fun, try running with the default values. What happens then? The only thing I wonder about is whether the hyphen in the network name might be causing issues? Not sure.
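
For example, the plain default invocation with no extra flags would just be:

clusterdock_run ./bin/start_cluster cdh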



--
-Dima

Re: Multinode docker based installation error

Contributor

Hi Dima,

 

I've done some further research and it appears the issue could be related to DNS on the AWS EC2 instance I was running the "clusterdock_run" script on....
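
For anyone else hitting this, some basic sanity checks on the launcher host would be along these lines (standard tools only, nothing clusterdock-specific):

hostname -f                        # the fully-qualified hostname the CM URL ends up using
getent hosts $(hostname -f)        # confirm that name actually resolves on this host
grep "$(hostname -s)" /etc/hosts   # check for a local /etc/hosts entry
cat /etc/resolv.conf               # see which DNS servers the EC2 instance is using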

 

I tried setting up the default CDH cluster using the following on my AWS EC2 instance, but this too failed with the same errors:

 

clusterdock_run ./bin/start_cluster cdh 

 

So....

 

I then installed docker-machine version 1.12.0 on my iMac (Intel i7 and 32 GB of memory) rather than using an AWS EC2 instance....I set up the default CDH cluster using the following without issues:

 

clusterdock_run ./bin/start_cluster cdh 

 

 

Delete that cluster using:

 

docker stop {container-id1} {container-id2}

docker rm {container-id1} {container-id2}
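
(Or equivalently, assuming nothing else on that Docker host needs to be kept, feed the IDs in directly rather than copying them by hand:)

docker stop $(docker ps -aq)
docker rm $(docker ps -aq)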

 

 

Then I tried the following "clusterdock_run" command on my iMac (to set up 3 nodes with custom services)....this was _almost_ successful:

 

Damions-iMac:$  clusterdock_run ./bin/start_cluster -n damion-dev-cluster cdh --primary-node=cm-node --secondary-nodes='mwhad-{2..3}' --include-service-types=HDFS,ZOOKEEPER,YARN,OOZIE,SQOOP,HIVE,HUE,SPARK

 

+ clusterdock_run ./bin/start_cluster -n damion-dev-cluster cdh --primary-node=cm-node '--secondary-nodes=mwhad-{2..3}' --include-service-types=HDFS,ZOOKEEPER,YARN,OOZIE,SQOOP,HIVE,HUE,SPARK

+ '[' -z docker.io/cloudera/clusterdock:latest ']'

+ '[' '' '!=' false ']'

+ sudo docker pull docker.io/cloudera/clusterdock:latest

+ '[' -n '' ']'

+ '[' -n '' ']'

+ '[' -n '' ']'

+ '[' -n '' ']'

+ '[' -n '' ']'

+ sudo docker run --net=host -t --privileged -v /tmp/clusterdock -v /etc/hosts:/etc/hosts -v /etc/localtime:/etc/localtime -v /var/run/docker.sock:/var/run/docker.sock docker.io/cloudera/clusterdock:latest ./bin/start_cluster -n damion-dev-cluster cdh --primary-node=cm-node '--secondary-nodes=mwhad-{2..3}' --include-service-types=HDFS,ZOOKEEPER,YARN,OOZIE,SQOOP,HIVE,HUE,SPARK

INFO:clusterdock.cluster:Network (damion-dev-cluster) not present, creating it...

INFO:clusterdock.cluster:Successfully setup network (name: damion-dev-cluster).

INFO:clusterdock.cluster:Successfully started mwhad-2.damion-dev-cluster (IP address: 192.168.125.3).

INFO:clusterdock.cluster:Successfully started mwhad-3.damion-dev-cluster (IP address: 192.168.125.4).

INFO:clusterdock.cluster:Successfully started cm-node.damion-dev-cluster (IP address: 192.168.125.2).

INFO:clusterdock.cluster:Started cluster in 6.92 seconds.

INFO:clusterdock.topologies.cdh.actions:Changing server_host to cm-node.damion-dev-cluster in /etc/cloudera-scm-agent/config.ini...

INFO:clusterdock.topologies.cdh.actions:Removing files (/var/lib/cloudera-scm-agent/uuid, /dfs*/dn/current/*) from hosts (mwhad-3.damion-dev-cluster)...

INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...

cloudera-scm-agent is already stopped

Starting cloudera-scm-agent: [  OK  ]

Stopping cloudera-scm-agent: [  OK  ]

Stopping cloudera-scm-agent: [  OK  ]

Starting cloudera-scm-agent: [  OK  ]

Starting cloudera-scm-agent: [  OK  ]

INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...

INFO:clusterdock.topologies.cdh.actions:Detected Cloudera Manager server after 42.07 seconds.

INFO:clusterdock.topologies.cdh.actions:CM server is now accessible at http://moby:32771

INFO:clusterdock.topologies.cdh.cm:Detected CM API v13.

INFO:clusterdock.topologies.cdh.cm_utils:Adding hosts (Ids: bac396d7-e95a-4b2e-9887-21be680dfac7) to Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.cm_utils:Creating secondary node host template...

INFO:clusterdock.topologies.cdh.cm_utils:Sleeping for 30 seconds to ensure that parcels are activated...

INFO:clusterdock.topologies.cdh.cm_utils:Applying secondary host template...

INFO:clusterdock.topologies.cdh.cm_utils:Updating database configurations...

INFO:clusterdock.topologies.cdh.cm:Updating NameNode references in Hive metastore...

INFO:clusterdock.topologies.cdh.actions:Removing service ks_indexer from Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.actions:Removing service solr from Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.actions:Removing service spark_on_yarn from Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.actions:Removing service impala from Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.actions:Removing service hbase from Cluster 1 (clusterdock)...

INFO:clusterdock.topologies.cdh.actions:Deploying client configuration...

INFO:clusterdock.topologies.cdh.actions:Starting cluster...

INFO:clusterdock.topologies.cdh.actions:Starting Cloudera Management service...

INFO:clusterdock.topologies.cdh.cm:Beginning service health validation...

INFO:clusterdock.topologies.cdh.cm:Validated that all services started (time: 85.22 s).

INFO:clusterdock.topologies.cdh.actions:We'd love to know what you think of our CDH topology for clusterdock! Please direct any feedback to our community forum at http://tiny.cloudera.com/hadoop-101-forum.

INFO:start_cluster:CDH cluster started in 06 min, 27 sec.

+ '[' -n '' ']'

 

 

I say ALMOST successful because:

 

1) The -n name switch didn't seem to work (the cluster name in CM was "Cluster 1 - clusterdock")

 

2) The SPARK and SQOOP services didn't get installed?!? (perhaps SQOOP was meant to be SQOOP2?)

 

3) The Cloudera Management Service "Activity Monitor" was also not installed by default.

 

 

I then had to manually add the SPARK and SQOOP2 services from within CM, as well as the CM "Activity Monitor" service.

 

I also manually added 2 additional ZooKeeper server roles (to get the quorum of 3).

 

 

Cheers,

 

Damion.


Re: Multinode docker based installation error

Rising Star
Sounds like all is well in clusterdock world, in that case :). To address
the remaining issues you saw:

- The -n option is what to name the Docker network (in the Docker 1.10+
sense), not the Cloudera Manager cluster (that's why it's an argument of
the `start_cluster` script and not the `cdh` topology for clusterdock).

- The type for Spark should be SPARK_ON_YARN. All we do, as you can see in
the source code up on GitHub, is remove service types present in our
original images if the `--include-service-types` argument is passed.

- Not sure about the AMON, so perhaps it's just not present in the default
CM deployment? Same with ZK servers; we don't make any assumptions about
how people want to deploy roles beyond simply treating the CDH cluster as
if comprised of 1 primary node and n-1 secondary nodes.
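
For example, your earlier command with SPARK swapped for SPARK_ON_YARN would look something like this (not re-tested here, and the Sqoop type name may need the same check against the image's service list):

clusterdock_run ./bin/start_cluster -n damion-dev-cluster cdh --primary-node=cm-node --secondary-nodes='mwhad-{2..3}' --include-service-types=HDFS,ZOOKEEPER,YARN,OOZIE,SQOOP,HIVE,HUE,SPARK_ON_YARN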


--
-Dima

Re: Multinode docker based installation error

Contributor

Thanks for confirming my shortcomings with knowing the switches for clusterdock!

 

Much appreciated.

 

I don't suppose there's a way I can force a service to be created on a node other than the CM (primary) one?

 

i.e. the primary node ends up with every service under the sun; what if I want to adhere to "Best Practices" and force the NN onto, say, secondary-node-1 and the Secondary NN onto secondary-node-2?

 

Am I asking too much ;-)

 

....I guess I can simply go and follow the CDH 5.8.1 Role Migration Wizard and move NN....

 

 

 

Cheers,

 

Damion. 

 


Re: Multinode docker based installation error

Rising Star
:) Would need the wizard. Sorry! But glad things aren't failing out on you
anymore.



--
-Dima

Re: Multinode docker based installation error

Contributor

Hi Dima,

 

I was wondering if I could ask a question about clusterdock (yes, another one, I hear you say ;-) ).

 

I've had no issues spinning up a test cluster for myself (to test various High Availability and NN/SNN scenarios, and migrating services around) using the following "clusterdock" command:

 

clusterdock_run ./bin/start_cluster -n my-test-cluster cdh --primary-node=cm-node --secondary-nodes='mwhad-{1..10}' --include-service-types=HDFS,ZOOKEEPER,YARN,HIVE,HUE,OOZIE,SPARK_ON_YARN,IMPALA

 

 

However, I have noticed that the Docker image each node is based on puts the HDFS data under the / filesystem, which is only 10 GB in size initially (see the "df -hTl" output below).

 

This means the first thing I'm forced to do is change a whole bunch of CDH monitoring parameters within the cluster, essentially to prevent various CM beacon alerts from firing when the max and min metric "threshold" values are reached.

 

There are 20 such parameters that need to be modified in order to prevent beacons from alerting in CM, given that the / filesystem (as per the "df -hTl" output below) only has 4.1 GB available.

 

These include:

 

HDFS Free Space Monitoring Thresholds

HDFS Checkpoint Directories Free Space Monitoring Absolute Thresholds

DataNode Free Space Monitoring Thresholds

Log Directory Free Space Monitoring Absolute Thresholds

Heap Dump Directory Free Space Monitoring Absolute Thresholds

Temporary Dump Directory Free Space Monitoring Absolute Thresholds
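
If anyone wants to script those changes rather than clicking through each one in CM, a rough sketch against the CM REST API (v13, as detected earlier in this thread, default admin credentials) would look like the following; the CM host/port, cluster name, service, config key and value are all placeholders that would need to be filled in for each of the ~20 thresholds:

# PLACEHOLDERS: <cm-host:port>, <cluster-name>, <config-key> and <new-value> all need real values
curl -u admin:admin -X PUT -H "Content-Type: application/json" \
     -d '{ "items": [ { "name": "<config-key>", "value": "<new-value>" } ] }' \
     "http://<cm-host:port>/api/v13/clusters/<cluster-name>/services/hdfs/config"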

 

 

[root@cm-node dfs]#  df -hTl


Filesystem     Type    Size  Used  Avail  Use%  Mounted on
rootfs         rootfs   10G  6.0G   4.1G   60%  /
tmpfs          tmpfs    32G     0    32G    0%  /dev
tmpfs          tmpfs    32G     0    32G    0%  /sys/fs/cgroup
shm            tmpfs    32G     0    32G    0%  /dev/shm
tmpfs          tmpfs    32G     0    32G    0%  /proc/kcore
tmpfs          tmpfs    32G     0    32G    0%  /proc/timer_list
tmpfs          tmpfs    32G     0    32G    0%  /proc/timer_stats
tmpfs          tmpfs    32G     0    32G    0%  /proc/sched_debug
tmpfs          tmpfs    32G     0    32G    0%  /dev/shm
cm_processes   tmpfs    32G   16M    32G    1%  /var/run/cloudera-scm-agent/process

 

 

Could it be possible to increase the / volume in the Cloudera source Docker image to, say, 50 GB?

 

This would prevent CDH Admins (like me) from having to change all the CM threshold parameters. 
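
(For what it's worth, that 10 GB figure matches the default base device size of Docker's devicemapper storage driver; if that is the driver in use on the Docker host, as the earlier systemctl output on the EC2 instance suggested, a possible workaround sketch would be something like the following. Untested here, and the larger size only applies to images/containers created afterwards, so the clusterdock images would need to be removed and pulled again.)

# On the Docker host (CentOS 7 / docker-engine 1.12): raise the devicemapper base size via a systemd drop-in
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/basesize.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --storage-opt dm.basesize=50G
EOF
systemctl daemon-reload
systemctl restart docker
# Existing images/containers keep the old 10 GB size, so remove and re-pull the clusterdock images afterwards.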

 

You can probably guess I am a novice with docker....

 

I tried to be smart and install "flocker" on my AWS EC2 CentOS 7.2 host where I'm running "clusterdock", and tried to create and mount shared volumes outside of the Docker UFS to all 10 Hadoop worker nodes, but alas I was unable to, due to my lack of Docker skills.

 

Any assistance you could offer would be greatly appreciated.

 

 

Thanks,

 

Damion.
