Docker quickstart images issues on windows

Explorer

Hi,

I installed the Docker container and followed the instructions above on a Windows laptop. I ran the two steps as mentioned (1. /usr/bin/docker-quickstart, which lets me connect to a Bash shell, and 2. Cloudera Manager). When I run service --status-all afterwards, I see the following messages at the end of the two steps:

cloudera-scm-agent (pid 2250) is running...

cloudera-scm-server dead but pid file exists..

Many other Hadoop services are not running.

Has anyone tried this setup on a Windows laptop? I do not understand why it has to be done in two steps when everything could run in the background with the -d option. I am especially puzzled by step 2, where cloudera-manager shuts down the CDH services and then starts CM_services. I also noticed a difference in the CM_services list between the docker-quickstart.sh and cloudera-manager.sh scripts.

 

Thanks

Sudhir

1 ACCEPTED SOLUTION

Guru

1) I don't know what the issue is with connecting to MySQL. Start by making sure MySQL is running (sudo service mysqld status; sudo service mysqld restart). The connection settings Cloudera Manager uses are in /etc/cloudera-scm-server/db.properties.

2) We do not have a recommended solution for running Hadoop on Docker. Director currently supports cloud platforms like AWS but not Docker. Docker is somewhat counterproductive for production Hadoop clusters: even though Hadoop is designed to get a bunch of machines to work together, having the cluster split into fewer, bigger pieces is better, and Docker essentially partitions a machine into smaller pieces. It can be handy for testing and similar work where performance doesn't matter, but it requires a lot of networking setup to make DNS, IP addresses, and so on work the way Cloudera Manager and Hadoop assume they do.
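For point 1, here is a minimal sketch for reading the connection settings out of db.properties so they can be tried by hand. The helper name db_prop is hypothetical, and the key names (com.cloudera.cmf.db.host and friends) are an assumption based on typical installs, so check them against your own file.

```shell
#!/bin/sh
# Print the value of one key from a Java-style properties file.
# Usage: db_prop KEY [FILE]   (db_prop is a hypothetical helper name)
db_prop() {
    key=$1
    file=${2:-/etc/cloudera-scm-server/db.properties}
    # Match the key exactly, then strip everything up to the first "=".
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# Example, run inside the container (key names are assumptions):
#   host=$(db_prop com.cloudera.cmf.db.host)
#   user=$(db_prop com.cloudera.cmf.db.user)
#   name=$(db_prop com.cloudera.cmf.db.name)
#   mysql --protocol=TCP -h "${host%%:*}" -u "$user" -p "$name" -e 'SELECT 1'
```

The --protocol=TCP flag is there because the JDBC driver connects over TCP, while the mysql client may otherwise fall back to the local socket when the host is localhost. A test over the socket can succeed even when the TCP path that Cloudera Manager needs is failing.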

9 REPLIES

Guru

The difference is that Cloudera Manager runs Hadoop services independently of Linux's service management (because it manages them with a cluster-wide context rather than the host-only context that the Linux service management has). So once you start Cloudera Manager, you will see that all of the Hadoop services are stopped according to Linux: they're being managed by Cloudera Manager, and not Linux anymore. The reason it's done in two steps is that most users of the VM either don't need to run the entire stack, including Cloudera Manager, on a single node, or don't have enough memory on their laptops to do so. By default, then, the image runs a CDH-only deployment with the services managed by Linux. For users who have the memory and want it, /home/cloudera/cloudera-manager disables all of those Linux-managed services and enables CM.

Explorer

Thanks Sean.

So technically I could run it (I have enough memory), and I did run cloudera-manager directly once I logged on to Bash. I see that many of the services are disabled after cloudera-manager.sh runs, which is expected per your explanation.

How do I cross-validate that all the cluster-wide services are running? Although I got the success message "please logon to quickstart.cloudera:7180" after the script finished, I could not reach the port using curl localhost:7180.

 

Sudhir

Guru
Port forwarding in Docker can be tricky; see the "Networking" section here:
https://hub.docker.com/r/cloudera/quickstart/. You need to instruct Docker
to forward any ports you want to use when you start the container (e.g.
8888 for Hue, 7180 for Cloudera Manager), and then you have to look up which
port number on your host maps to that port number on the guest. So if you
instruct Docker to launch your container with '-p 7180', from the guest's
perspective it's listening on that port. However, on your host machine, it
will be assigned a different port (that way, many containers can run the
same services without their ports conflicting). You would need to run
'docker port <container> 7180' and it would show you the interface it
was bound to (usually 0.0.0.0, meaning it's listening on all interfaces /
IP addresses) and the port, which might be 31000 or something in that
neighborhood. In which case, 31000 is actually the port you need to connect
to.
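As a rough sketch of that lookup, the helpers below pull the host port out of what 'docker port' prints. The function names and the container name "cdh" are hypothetical; the run command follows the pattern on the Docker Hub page as I recall it, so treat the exact flags as an assumption and check them there.

```shell
#!/bin/sh
# Extract the host port from one line of `docker port` output.
# Accepts "0.0.0.0:31000" as well as "7180/tcp -> 0.0.0.0:31000".
host_port() {
    # Drop everything up to and including the last colon.
    printf '%s\n' "${1##*:}"
}

# Look up where a container port landed on the host (needs a running Docker).
# Usage: published_port CONTAINER PORT
published_port() {
    host_port "$(docker port "$1" "$2" | head -n 1)"
}

# Example session on the host (container name "cdh" is hypothetical):
#   docker run --name cdh --hostname=quickstart.cloudera --privileged=true \
#       -t -i -p 8888 -p 7180 cloudera/quickstart /usr/bin/docker-quickstart
#   curl -I "http://localhost:$(published_port cdh 7180)/"
```

On a Windows setup where Docker runs inside a VM, localhost in the last line may need to be replaced by the VM's address instead.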

Explorer

Thanks for the prompt reply. I am more interested in finding out whether all services are running after the "cloudera-manager" script. How do I validate that the services required for CM are running? Since I am in the container itself, I can validate locally without worrying about port mappings on 7180. I am trying curl localhost:7180 and it is not responding.

 

Sudhir

Guru
So once you've started Cloudera Manager it's only running management
services, not CDH (since the full stack uses so much more memory than most
users have on their laptops - it's better to have you start what you know
you'll need). So once you CAN connect to CM, you will need to start the
services you want via the web UI or the CM API (there is a command to start
every service on the cluster, too).
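A minimal sketch of the API route, assuming the usual CM REST layout (/api/version to discover the version, then a start command under the cluster). The helper name start_url is hypothetical, and the credentials and cluster name in the example are assumptions to replace with your own.

```shell
#!/bin/sh
# Build the CM API URL for the cluster-wide "start" command.
# Usage: start_url BASE API_VERSION URLENCODED_CLUSTER_NAME
start_url() {
    printf '%s/api/%s/clusters/%s/commands/start\n' "$1" "$2" "$3"
}

# Example, run inside the container once port 7180 answers.
# Credentials and cluster name below are assumptions, not confirmed values.
#   base=http://quickstart.cloudera:7180
#   ver=$(curl -s -u cloudera:cloudera "$base/api/version")
#   curl -s -u cloudera:cloudera "$base/api/$ver/clusters"   # lists cluster names
#   curl -s -u cloudera:cloudera -X POST \
#       "$(start_url "$base" "$ver" 'Cloudera%20QuickStart')"
```

Listing the clusters first is the safer order, since the name in the start URL has to match exactly (URL-encoded if it contains spaces).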

Now as for why you can't connect, after CM starts it does take a couple of
minutes to open the port because it does a lot of checks before it will
accept any user input. However if 'service cloudera-scm-server status' has
said it's up for several minutes, the next thing I'd check is which
interface it's bound to. I'd expect it to bind to everything (including
localhost), but also try 'quickstart.cloudera' (since you're in the
container, that should resolve, and it may actually be a different IP
address than localhost/127.0.0.1 depending on how the network interfaces
are presented). You can also run 'sudo lsof -i | grep 7180' and it should
show you details of whatever's listening on that port. Failing all that,
I'd also check the logs in /var/log/cloudera-scm-server and see if
anything's gone wrong that you can see there.
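The checks above can be strung together roughly like this. It is only a sketch: the function names are hypothetical, the status strings are the init-script wording seen in this thread, and it assumes curl is available in the container.

```shell
#!/bin/sh
# Classify one line of `service <name> status` output (init-script style).
classify_status() {
    case "$1" in
        *"is running"*)                 echo running ;;
        *"dead but pid file exists"*)   echo stale-pid ;;
        *"is stopped"*|*"not running"*) echo stopped ;;
        *)                              echo unknown ;;
    esac
}

# Run the checks in the order suggested above. Not called automatically.
diagnose_cm() {
    status_line=$(service cloudera-scm-server status 2>&1)
    echo "server status: $(classify_status "$status_line")"
    # Try both names, since they may resolve to different interfaces.
    for host in localhost quickstart.cloudera; do
        if curl -s -o /dev/null --max-time 5 "http://$host:7180/"; then
            echo "$host:7180 is answering"
        else
            echo "$host:7180 is not answering"
        fi
    done
    # Show whatever is listening on the CM port, then the newest log files.
    sudo lsof -i | grep 7180
    ls -lt /var/log/cloudera-scm-server | head
}
```

A "stale-pid" result means the server process has exited and left its pid file behind, in which case the port checks will fail and the logs are the place to look.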

Explorer

My intent is to start only the desired CDH services manually from CM. The output of service cloudera-scm-server status is as follows:

cloudera-scm-agent (pid 2250) is running...

cloudera-scm-server dead but pid file exists..

 

A peek into /var/log/cloudera-scm-server points to failed connectivity to the DB server. Last acquisition attempt exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

service mysqld status indicates it is running without issues. What config does CM use to connect to the DB server?

 

Sudhir

Explorer

Do you have suggestions on next steps for the error message posted for cloudera-scm-server?

Also, what is Cloudera's position on a multi-node cluster setup using Docker containers? I was recommended "Cloudera Director". Is there a Docker container available in that context?

Explorer

1) What is the recommendation to resolve the scm-server issue?

cloudera-scm-agent (pid 2250) is running...

cloudera-scm-server dead but pid file exists..

A peek into /var/log/cloudera-scm-server points to failed connectivity to the DB server. Last acquisition attempt exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

2) What is the recommended approach for a multi-node cluster? I want to be able to deploy datanodes on two containers and the namenode on a third container. Do you have a tried solution?

 

Sudhir
