Cluster configuration completed, but components unhealthy

New Contributor

Hi,

I configured a cluster using the Wizard, but all components are unhealthy. I see the same error across all components when I try to start them:

...
resource_management.core.exceptions.Fail:
Execution of 'conf-select set-conf-dir --package hadoop --stack-version 2.4.0.0
--conf-version 0' returned 1. 2.4.0.0 Incorrect stack version
...

The message clearly states that the stack version is incorrect, but why? I have Ambari 2.2.2 and HDP 2.4, which should work fine together according to the compatibility matrix.

Please shed some light on why this error occurs and which direction to take.


Re: Cluster configuration completed, but components unhealthy

Super Mentor
@Igor Gorbatovsky

Can you please share the output of the following commands from the host where you see this failure message? (It should match the output from the other hosts.)

# hdp-select versions 
# hdp-select

Also, can you please share the output of the following SQL queries, run against the Ambari database? (Your 'repo_version' table may have incorrect entries compared to the CURRENT version recorded in the 'cluster_version' table.)

SELECT * FROM repo_version;
SELECT * FROM cluster_version;

- Also check whether the "/etc/yum.repos.d/HDP.repo" file has the correct entry.
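
Comparing `hdp-select` output across hosts by eye is error-prone, so here is a minimal Python sketch of one way to do it. It assumes every line has the form `component - version` (with `None` for components that are not installed), as `hdp-select` prints it; the function names `parse_hdp_select` and `distinct_versions` are made up for this illustration and are not part of any HDP tool.

```python
from typing import Dict, Optional, Set


def parse_hdp_select(output: str) -> Dict[str, Optional[str]]:
    """Map each component name to its selected version (None if not installed)."""
    components: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        # Expected shape: "hadoop-client - 2.4.0.0-169" or "hbase-client - None".
        name, sep, version = line.strip().rpartition(" - ")
        if not sep:
            continue  # skip blank lines, shell prompts, or anything else
        components[name] = None if version == "None" else version
    return components


def distinct_versions(output: str) -> Set[str]:
    """Return the set of versions in use, ignoring components set to None."""
    return {v for v in parse_hdp_select(output).values() if v is not None}


# Hypothetical usage: paste the text printed by `hdp-select` on one host.
sample = """hadoop-client - 2.4.0.0-169
hbase-client - None
zookeeper-server - 2.4.0.0-169
"""
print(distinct_versions(sample))
```

If the set contains more than one version on a host, or differs between hosts, that host is worth a closer look.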


Re: Cluster configuration completed, but components unhealthy

New Contributor

Hi @Jay SenSharma,

Thank you for your reply.

The output of the first two commands is the same across all nodes (in terms of component versions):

[TEST] root@hdpc-t01:~
# hdp-select versions
2.4.0.0-169
[TEST] root@hdpc-t01:~
# hdp-select
accumulo-client - None
accumulo-gc - None
accumulo-master - None
accumulo-monitor - None
accumulo-tablet - None
accumulo-tracer - None
atlas-server - None
falcon-client - None
falcon-server - None
flume-server - None
hadoop-client - 2.4.0.0-169
hadoop-hdfs-datanode - 2.4.0.0-169
hadoop-hdfs-journalnode - 2.4.0.0-169
hadoop-hdfs-namenode - 2.4.0.0-169
hadoop-hdfs-nfs3 - 2.4.0.0-169
hadoop-hdfs-portmap - 2.4.0.0-169
hadoop-hdfs-secondarynamenode - 2.4.0.0-169
hadoop-httpfs - None
hadoop-mapreduce-historyserver - 2.4.0.0-169
hadoop-yarn-nodemanager - 2.4.0.0-169
hadoop-yarn-resourcemanager - 2.4.0.0-169
hadoop-yarn-timelineserver - 2.4.0.0-169
hbase-client - None
hbase-master - None
hbase-regionserver - None
hive-metastore - None
hive-server2 - None
hive-webhcat - None
kafka-broker - None
knox-server - None
mahout-client - None
oozie-client - None
oozie-server - None
phoenix-client - None
phoenix-server - None
ranger-admin - None
ranger-kms - None
ranger-usersync - None
slider-client - None
spark-client - 2.4.0.0-169
spark-historyserver - 2.4.0.0-169
spark-thriftserver - 2.4.0.0-169
sqoop-client - None
sqoop-server - None
storm-client - None
storm-nimbus - None
storm-slider-client - None
storm-supervisor - None
zeppelin-server - None
zookeeper-client - 2.4.0.0-169
zookeeper-server - 2.4.0.0-169

The output of SELECT * FROM repo_version is attached (it was a bit bulky, so I attached it as a separate file):

select-from-repo-version-201706061643.xml

The "cluster_version" table looks like this:

ambari=> SELECT * FROM cluster_version;
 id | repo_version_id | cluster_id |  state  |  start_time   |   end_time    | user_name
----+-----------------+------------+---------+---------------+---------------+------------
  1 |               1 |          2 | CURRENT | 1496204467640 | 1496204467652 | _anonymous
(1 row)

The "/etc/yum.repos.d/HDP.repo" contains this:

[HDP-2.4]
name=HDP-2.4
baseurl=http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.4.0.0

path=/
enabled=0
gpgcheck=0

Thanks again for your help, and let me know whether the outputs look healthy.

Thanks,

Igor

Re: Cluster configuration completed, but components unhealthy

Super Mentor

@Igor Gorbatovsky

The output shows a discrepancy in the records, which we might need to fix manually.

- The cluster_version entry shows that the cluster is using repo_version_id = 1, whereas your attached "select-from-repo-version-201706061643.xml" file shows that this repo_version_id points to the OLD HDP stack (2.4.0.0).

- The "hdp-select" output shows that the Host Components are properly upgraded to "2.4.0.0-169".

- So I suggest you try the following:

1. Stop Ambari Server

# ambari-server stop

2. Take a fresh DB dump as a backup (because we are going to modify the database manually).

3. Now update the cluster_version table as follows:

UPDATE cluster_version SET repo_version_id = 2 WHERE state = 'CURRENT';

4. Start Ambari Server again

# ambari-server start
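
For the backup in step 2, a rough Python sketch follows. It assumes the Ambari database is PostgreSQL (the `ambari=>` prompt earlier in the thread suggests so), and that both the database name and the database user are `ambari`; these values, the output directory, and the helper names `build_backup_command` and `backup_ambari_db` are assumptions for illustration, so adjust them to your setup.

```python
import subprocess
from datetime import datetime


def build_backup_command(db_name="ambari", db_user="ambari", out_dir="/tmp"):
    """Build a pg_dump command line that writes a timestamped SQL dump."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out_file = f"{out_dir}/{db_name}-backup-{stamp}.sql"
    # pg_dump: -U selects the database user, -f the output file;
    # the database name is the final positional argument.
    return ["pg_dump", "-U", db_user, "-f", out_file, db_name]


def backup_ambari_db(**kwargs):
    """Run pg_dump and return the dump file path; raises if pg_dump fails."""
    cmd = build_backup_command(**kwargs)
    subprocess.run(cmd, check=True)
    return cmd[4]


# Hypothetical usage, after `ambari-server stop` and before the UPDATE:
#   path = backup_ambari_db(out_dir="/root")
#   print("Backup written to", path)
```

Keeping the dump until the cluster is confirmed healthy makes it possible to restore the original tables if the manual UPDATE turns out to be wrong.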


Also please share the output of "host_version" table.

Re: Cluster configuration completed, but components unhealthy

New Contributor

Hi @Jay SenSharma,

Thank you for your helpful comment; it actually helped. I'm getting different errors now, but I have a feeling they may be related to storage.

Btw, here is the output from host_version:

ambari=# select * from ambari.host_version;
 id | repo_version_id | host_id |  state
----+-----------------+---------+---------
  1 |               1 |       5 | CURRENT
  2 |               2 |      51 | CURRENT
  3 |               2 |       3 | CURRENT
  4 |               2 |       1 | CURRENT
  5 |               2 |       2 | CURRENT
  6 |               2 |       4 | CURRENT

Host id=5 is our shared storage on EMC Isilon. All the other host IDs are nodes of our cluster.

Do you think they should all have the same repo_version_id?

I'm actually wondering why they don't. This is a first installation, not an upgrade (you mentioned "upgrade" in one of your previous comments).

Thanks,

Igor

Re: Cluster configuration completed, but components unhealthy

Super Mentor

@Igor Gorbatovsky

Ideally, the cluster nodes (hosts) should all be on the same repo_version. So the mentioned host (host_id=5) was either not upgraded properly, or the table was not updated correctly.

Please log in to that problematic host and run the following command to see whether it lists the upgraded package:

# hdp-select
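
The expectation that every host shares one repo_version can also be checked mechanically. Below is a small Python sketch that takes (host_id, repo_version_id) pairs, here copied from the host_version output posted earlier in this thread, and reports the hosts that differ from the majority. The helper name `find_outlier_hosts` is hypothetical, and treating the most common repo_version_id as the "expected" one is an assumption of this sketch, not something Ambari itself does.

```python
from collections import Counter


def find_outlier_hosts(rows):
    """Return (majority_repo_version_id, sorted host_ids that differ from it).

    rows is an iterable of (host_id, repo_version_id) pairs.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no host_version rows given")
    counts = Counter(repo_id for _, repo_id in rows)
    majority_id, _ = counts.most_common(1)[0]
    outliers = sorted(host for host, repo_id in rows if repo_id != majority_id)
    return majority_id, outliers


# (host_id, repo_version_id) pairs from the host_version table in this thread.
HOST_VERSION_ROWS = [(5, 1), (51, 2), (3, 2), (1, 2), (2, 2), (4, 2)]

majority, outliers = find_outlier_hosts(HOST_VERSION_ROWS)
print(f"majority repo_version_id={majority}, hosts that differ: {outliers}")
```

For the rows above, this singles out host_id 5, the host discussed in this reply.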
