
Failed to deploy a CDH 6.1.0 cluster onto CentOS 7 containers: 'Failed to receive heartbeat'

New Contributor

I'm currently building a custom CDH 6.1.0 Docker image.

It is based on centos7:latest.

I installed net-tools, openssh-server, openssh-clients, etc.

The containers run with privileged set to true so that systemd can boot.

Everything works (though it took me a week to get there), and I now have these images:

 

1. An image with cloudera-scm-server installed (used as the Cloudera Manager server)

2. An image with dnsmasq installed to provide DNS resolution instead of editing /etc/hosts (used as the DNS server)

3. An image loaded with the latest cdh6.1.0/cm6.1.0/gplextra6.1.0 repositories, served on port 8900 (used as the local repository)

4. An image with only the sshd and ntpd services installed (used as deployment targets, run in privileged mode)

 

Then I started a test cluster, deploying CDH onto several replicas of the fourth image. A lot of errors showed up in the logs (for example, files reported as not found that in fact exist and are well formed).

 

After a short while the installation completed, and the details showed that the installation script had finished. However, the status showed "Installation failed. Failed to receive heartbeat from agent." I'm quite puzzled by this message.

 

After wasting a lot of time searching online, I think my situation is quite different from other people's:

1. The installation completed, and the agent is running everywhere.

2. Clock differences should not be a factor: all the containers run on Docker on the same machine, and I checked them one by one.

3. All the files reported as not found do exist, except for /run/user/0/.

4. As for the heartbeat itself: after many attempts I finally got to the Hosts page and saw that the latest heartbeats were within the last minute, so delay cannot be the cause. The status of every unit is unknown, including the local server (which has the agent pre-installed).

5. On the agent side I inspected every port that might be involved in the failure, such as 9000/9001 and 7182. Port 7182 is in the connected state; however, 9000/9001 are not connected. They were all established by the agents.

6. The config.ini on the client side does not enable TLS, and I also checked the security page on the server side; the TLS option is not set there either.
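
One way to repeat the port check from point 5 across containers is a plain TCP probe. The Python sketch below only tests whether a TCP connection can be opened; it says nothing about whether the Cloudera Manager agent or server on the other end is healthy. The helper names port_open and check_ports are hypothetical, made up for this illustration.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects; the timeout applies
        # to the connect, not to name resolution. The socket closes on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolvable
        return False


def check_ports(
    targets: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Probe each (host, port) pair and map it to True (reachable) or False."""
    return {(host, port): port_open(host, port, timeout) for host, port in targets}
```

For example, running check_ports([("cm-server", 7182), ("cdh-node1", 9000)]) from the server and from an agent container (both host names are hypothetical placeholders) shows whether the server can reach agent:9000 at all, independently of what the agent logs say.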

 

The only suspicious point: is the server perhaps not trying to connect to agent:9000/9001?

There is no useful information in the logs. I'll attach the long, long client-side and server-side logs afterwards.

I also ran the host inspector, and it showed that some things are not installed on each client.

For example:

1. mysql-connector-java, while I'm using MySQL 5.7

2. The path for the Oracle JDK: although the Oracle JDK is installed automatically, its path is not set in the profile
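
The two host inspector findings can also be checked by hand on each client. This Python sketch is not what the host inspector itself runs; it assumes the MySQL JDBC driver is expected at /usr/share/java/mysql-connector-java.jar (the location used in Cloudera's installation guide; verify it for your version) and interprets "path not set" as JAVA_HOME being unset or java missing from PATH. The function name host_prereqs is hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Dict

# Assumed driver location; check the documentation for your CM/CDH version.
MYSQL_CONNECTOR = Path("/usr/share/java/mysql-connector-java.jar")


def host_prereqs(connector: Path = MYSQL_CONNECTOR) -> Dict[str, bool]:
    """Report a few prerequisites the host inspector complained about."""
    java_home = os.environ.get("JAVA_HOME", "")
    return {
        # the JDBC driver jar exists as a regular file
        "mysql_connector_present": connector.is_file(),
        # JAVA_HOME is set and points at an existing directory
        "java_home_set": bool(java_home) and Path(java_home).is_dir(),
        # a `java` executable can be found through PATH
        "java_on_path": shutil.which("java") is not None,
    }
```

Note that a login shell's profile and the environment of a systemd-started agent can differ, so a check run from an interactive shell may not reflect what the agent sees.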

 

Any help is appreciated!

 

 

 

1 ACCEPTED SOLUTION

New Contributor
A clean solution is to set the network's 'name' property in the docker-compose file, so that no system-generated domain name is created. With that problem solved, another one popped up: the installation completed and then proceeded to download the 6.1.1 parcels. However, it reported the error 'Src file /opt/cloudera/parcels/.flood/CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel/CDH-6.1.1-1.cdh6.1.1.p0.875250-el7.parcel does not exist'. I found this in the log:

' stderr: [tar: CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hue/apps/oozie/examples/lib: Directory renamed before its status could be extracted
tar: CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hue/apps/oozie/examples/workflows/spark-scala/lib: Directory renamed before its status could be extracted
tar: CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hue/apps/oozie/examples/workflows/spark-scala: Directory renamed before its status could be extracted
tar: CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hue/apps/oozie/examples/workflows: Directory renamed before its status could be extracted
tar: CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hue/apps/oozie/examples: Directory renamed before its status could be extracted
tar: CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/hue/apps/oozie: Directory renamed before its status could be extracted '

I don't know the details, but I found the complete file on the CM node, so there is no reason it cannot be extracted. Can anyone help?
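
Before digging into tar itself, it may help to confirm that the parcel copy really is complete. The Python sketch below assumes you have the expected SHA-1 hex digest from the parcel's .sha/.sha1 companion file and that the parcel is a (possibly gzip-compressed) tar archive. It hashes the file in chunks and then walks the archive's member headers with the standard tarfile module. The helper names sha1_of and parcel_ok are hypothetical.

```python
import hashlib
import tarfile
import zlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-1 of a file, read in chunks so a multi-GB parcel never sits in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parcel_ok(parcel: Path, expected_sha1: str) -> bool:
    """True if the hash matches and tarfile can walk every member header."""
    if sha1_of(parcel) != expected_sha1.strip().lower():
        return False
    try:
        # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself
        with tarfile.open(parcel, mode="r:*") as tar:
            for _member in tar:  # each step seeks past the previous member's data
                pass
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
    return True
```

If both checks pass on the copy under /opt/cloudera/parcels/.flood, the "Directory renamed before its status could be extracted" messages would suggest the problem lies in the extraction environment inside the container rather than in the file itself.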


3 REPLIES

New Contributor
The main problem seems to lie in the DNS name not matching the reverse-resolved host name. For example:
1. You set aaaa as the hostname field in a docker-compose file, with a specified IP iiii.
2. You run the host on a VPN named bbbb.
3. You have a DNS mapping aaaa -> iiii; however, in reverse, iiii resolves to bbbb, and bbbb cannot be resolved at all!
So how can this be fixed? Any ideas? I could easily patch this, but I want a clean solution.
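
The mismatch described above can be checked from inside each container. This Python sketch resolves a name to an IP, resolves that IP back to a name, and compares the two; the resolvers are parameters so the logic can be exercised without a real DNS server. The comparison is case-insensitive and ignores a trailing dot, which is a simplifying assumption. The function name forward_reverse_match is hypothetical.

```python
import socket
from typing import Callable, Optional, Tuple


def _default_reverse(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the primary name
    return socket.gethostbyaddr(ip)[0]


def forward_reverse_match(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = _default_reverse,
) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (matches, ip, reverse_name); ip/reverse_name are None if a lookup fails."""
    try:
        ip = forward(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False, None, None
    try:
        back = reverse(ip)
    except OSError:  # socket.herror: no reverse (PTR) answer for this IP
        return False, ip, None

    def norm(name: str) -> str:
        return name.lower().rstrip(".")

    return norm(back) == norm(hostname), ip, back
```

Pass the host name exactly as set in the docker-compose file. In the scenario above, the call for aaaa would come back as (False, iiii, bbbb), which is the mismatch to eliminate.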


New Contributor