I'm currently building a custom CDH 6.1.0 Docker image.
It is based on centos7:latest.
I installed net-tools, openssh-server, openssh-clients, etc.
Privileged mode is set to true so that systemd can boot.
Everything works (though it took me a week to get there), and I now have these images:
1. an image with cloudera-scm-server installed (used as Cloudera Manager)
2. an image with dnsmasq installed, to provide DNS resolution instead of editing /etc/hosts (used as DNS)
3. an image loaded with the latest cdh6.1.0/cm6.1.0/gplextra6.1.0 repos, served on port 8900 (used as local repo)
4. an image with only the sshd and ntpd services installed (used for deployment, run in privileged mode)
Then I started a test cluster to deploy CDH on several replicas of the 4th image. A lot of errors showed up in the logs (for example, files reported as not found that in fact exist and are correctly written).
After a short while the installation finished, and the details showed that the installation script had completed. However, the status showed "Installation failed. Failed to receive heartbeat from agent.". I'm quite puzzled by this message.
After wasting a lot of time searching online, I think my situation is different from the others I found:
1. The installation completed, and the agent is started everywhere.
2. The clocks should not differ, since all containers run in Docker on the same machine. I also checked them one by one.
3. All files reported as not found actually exist, except for /run/user/0/.
4. The heartbeat itself: after many attempts I finally got to the Hosts page and saw that the latest heartbeats were all within the last minute, so delay cannot be the cause. The status of every host is unknown, including the local server (agent pre-installed).
5. On the agent side I inspected every port that might be linked to the failure, such as 9000/9001 and 7182. 7182 is in the connected state. However, 9000/9001 are not connected. All existing connections were established by the agents.
6. config.ini on the client side does not enable TLS, and I also checked the security page on the server side; the TLS option is not set there either.
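For point 6, the agent settings can be read with a script rather than by eye. This is only a minimal sketch. It assumes the agent's config.ini has a [General] section with server_host and server_port and a [Security] section with use_tls, which is how I understand the stock file to be laid out. The sample text and the host name cm.example.internal are made up for illustration.

```python
import configparser

def read_agent_settings(ini_text):
    """Return server host, server port and TLS flag from agent config text."""
    # interpolation=None so a literal '%' in a value cannot break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        "server_host": parser.get("General", "server_host", fallback=None),
        # 7182 is used as the default here only because it is the port
        # the agents in this post connect to (an assumption).
        "server_port": parser.getint("General", "server_port", fallback=7182),
        "use_tls": parser.get("Security", "use_tls", fallback="0").strip() == "1",
    }

# Hypothetical sample; on a real agent, pass the contents of its config.ini.
SAMPLE = """
[General]
server_host=cm.example.internal
server_port=7182

[Security]
use_tls=0
"""

settings = read_agent_settings(SAMPLE)
print(settings)
```

Running this against every agent's file would at least confirm that all of them point at the same server host and port and that none has TLS enabled.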
The only suspicious point: is the server not supposed to connect to agent:9000/9001?
There is no useful information in the logs. I'll attach the very long client-side and server-side logs afterwards.
I also ran an inspection of the hosts. It showed that some things are not installed on each client:
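To test reachability in that direction, a plain TCP probe run from the server container against each agent would show whether 9000/9001 accept connections at all. The sketch below uses only the Python standard library. The function names are my own, and it checks nothing beyond whether a TCP connection can be opened.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

def probe(host, ports):
    """Map each port to whether it accepted a TCP connection."""
    return {port: can_connect(host, port) for port in ports}
```

For example, calling probe("agent1.example.internal", [9000, 9001]) from the server and probe("cm.example.internal", [7182]) from an agent would cover both directions; both host names are placeholders. A False result only says the connection could not be opened, not why.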
1. mysql-connector-java, while I'm using MySQL 5.7
2. the path to the Oracle JDK: although the Oracle JDK is installed automatically, its path is not set in the profile
Any help is appreciated!
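Both findings can be re-checked on each client with a small script. This is a sketch under assumptions: /usr/share/java/mysql-connector-java.jar is where I believe the Cloudera documentation expects the JDBC driver, and JAVA_HOME is the variable I would expect a profile entry to set. Neither is confirmed by the inspection output above.

```python
import os

# Assumed driver location; adjust if the connector lives elsewhere.
DEFAULT_JAR = "/usr/share/java/mysql-connector-java.jar"

def check_prerequisites(jar_path=DEFAULT_JAR, env=None):
    """Report whether the JDBC driver jar and a usable JAVA_HOME are present."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME", "")
    java_bin = os.path.join(java_home, "bin", "java")
    return {
        "mysql_connector_present": os.path.isfile(jar_path),
        "java_home_set": bool(java_home),
        "java_binary_present": bool(java_home) and os.path.isfile(java_bin),
    }

print(check_prerequisites())
```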