I configured a cluster using the Wizard, but all components are unhealthy. I see the same error across all components when I try to start them:
... resource_management.core.exceptions.Fail: Execution of 'conf-select set-conf-dir --package hadoop --stack-version 22.214.171.124 --conf-version 0' returned 1. 126.96.36.199 Incorrect stack version ...
The message clearly states that the stack version is incorrect, but why? I have Ambari 2.2.2 and HDP 2.4. They should work fine together according to the compatibility matrix.
Please shed some light on why this error occurs and which direction to take.
Can you please share the output of the following commands from the host where you see this failure message? (It should match the output on the other hosts.)
# hdp-select versions
# hdp-select
Also, can you please share the output of the following SQL queries, run against the Ambari database? (It is possible that your 'repo_version' table has some incorrect entries compared to the CURRENT version mentioned in the 'cluster_version' table.)
SELECT * FROM repo_version;
SELECT * FROM cluster_version;
- Also check if the "/etc/yum.repos.d/HDP.repo" file has the correct entry.
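To see quickly whether every installed component on a host reports the same version, the `hdp-select` output can be filtered. This is only a rough sketch: the function name `distinct_versions` is hypothetical, and it assumes each line has the form `component - version`, with `None` for components that are not installed.

```shell
# distinct_versions (hypothetical helper): reads `hdp-select` output on stdin
# and prints each distinct version once, ignoring components set to "None".
# Assumes lines of the form "<component> - <version>".
distinct_versions() {
  awk -F' - ' 'NF == 2 && $2 != "None" { print $2 }' | sort -u
}
```

Running `hdp-select | distinct_versions` on each host should print exactly one line, and that line should be the same on every host.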
Hi @Jay SenSharma,
Thank you for your reply.
The output of the first two commands is the same across all nodes (in terms of component versions):
[TEST] root@hdpc-t01:~ # hdp-select versions
188.8.131.52-169
[TEST] root@hdpc-t01:~ # hdp-select
accumulo-client - None
accumulo-gc - None
accumulo-master - None
accumulo-monitor - None
accumulo-tablet - None
accumulo-tracer - None
atlas-server - None
falcon-client - None
falcon-server - None
flume-server - None
hadoop-client - 184.108.40.206-169
hadoop-hdfs-datanode - 220.127.116.11-169
hadoop-hdfs-journalnode - 18.104.22.168-169
hadoop-hdfs-namenode - 22.214.171.124-169
hadoop-hdfs-nfs3 - 126.96.36.199-169
hadoop-hdfs-portmap - 188.8.131.52-169
hadoop-hdfs-secondarynamenode - 184.108.40.206-169
hadoop-httpfs - None
hadoop-mapreduce-historyserver - 220.127.116.11-169
hadoop-yarn-nodemanager - 18.104.22.168-169
hadoop-yarn-resourcemanager - 22.214.171.124-169
hadoop-yarn-timelineserver - 126.96.36.199-169
hbase-client - None
hbase-master - None
hbase-regionserver - None
hive-metastore - None
hive-server2 - None
hive-webhcat - None
kafka-broker - None
knox-server - None
mahout-client - None
oozie-client - None
oozie-server - None
phoenix-client - None
phoenix-server - None
ranger-admin - None
ranger-kms - None
ranger-usersync - None
slider-client - None
spark-client - 188.8.131.52-169
spark-historyserver - 184.108.40.206-169
spark-thriftserver - 220.127.116.11-169
sqoop-client - None
sqoop-server - None
storm-client - None
storm-nimbus - None
storm-slider-client - None
storm-supervisor - None
zeppelin-server - None
zookeeper-client - 18.104.22.168-169
zookeeper-server - 22.214.171.124-169
The output of select * from repo_version is attached (it was a bit bulky, so I attached it as a separate file).
A "cluster_version" looks like this:
ambari=> SELECT * FROM cluster_version;
 id | repo_version_id | cluster_id |  state  |  start_time   |   end_time    | user_name
----+-----------------+------------+---------+---------------+---------------+------------
  1 |               1 |          2 | CURRENT | 1496204467640 | 1496204467652 | _anonymous
(1 row)
The "/etc/yum.repos.d/HDP.repo" contains this:
[HDP-2.4]
name=HDP-2.4
baseurl=http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/126.96.36.199
path=/
enabled=0
gpgcheck=0
Thanks again for your help, and let me know whether the outputs look healthy.
The output shows a discrepancy in the records, which we might need to fix manually.
- The cluster_version entry shows that the cluster is using repo_version_id = 1, whereas your attached "select-from-repo-version-201706061643.xml" file shows that this repo_version_id points to the OLD HDP stack (188.8.131.52).
- The "hdp-select" output shows that the Host Components are properly upgraded to "184.108.40.206-169".
- So I guess you should try the following:
1. Stop Ambari Server
# ambari-server stop
2. Take a fresh DB dump as a backup (because we are going to modify the database manually).
3. Now update the cluster_version table as following:
UPDATE cluster_version SET repo_version_id = 2 WHERE state = 'CURRENT';
4. Start Ambari Server
# ambari-server start
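Steps 1 to 4 above could be strung together roughly as follows. Treat this as a sketch under assumptions, not a tested procedure: it assumes the Ambari database is PostgreSQL and that both the database and its user are named `ambari`; the `run` and `fix_cluster_version` helpers and the backup file name are hypothetical. By default it only prints the commands, so you can review them first.

```shell
# Assumptions (adjust for your setup): PostgreSQL backend, database and user "ambari".
DB_NAME="${DB_NAME:-ambari}"
DB_USER="${DB_USER:-ambari}"
BACKUP_FILE="${BACKUP_FILE:-ambari-backup.sql}"   # hypothetical file name

# run: prints the command when DRY_RUN=1 (the default), otherwise executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fix_cluster_version: stop Ambari, back up the DB, repoint the CURRENT
# cluster_version row to repo_version_id 2, then start Ambari again.
# The chain stops at the first command that fails.
fix_cluster_version() {
  run ambari-server stop &&
  run pg_dump -U "$DB_USER" -f "$BACKUP_FILE" "$DB_NAME" &&
  run psql -U "$DB_USER" -d "$DB_NAME" \
      -c "UPDATE cluster_version SET repo_version_id = 2 WHERE state = 'CURRENT';" &&
  run ambari-server start
}
```

Call `fix_cluster_version` to preview the four commands, and `DRY_RUN=0 fix_cluster_version` to actually execute them once the preview looks right.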
Also please share the output of "host_version" table.
Hi @Jay SenSharma,
Thank you for your helpful comment. It actually helped. I'm getting different errors now, but I have a feeling they may be related to storage.
Btw, here is the output from host_version:
ambari=# select * from ambari.host_version;
 id | repo_version_id | host_id |  state
----+-----------------+---------+---------
  1 |               1 |       5 | CURRENT
  2 |               2 |      51 | CURRENT
  3 |               2 |       3 | CURRENT
  4 |               2 |       1 | CURRENT
  5 |               2 |       2 | CURRENT
  6 |               2 |       4 | CURRENT
The host id=5 is our shared storage on EMC Isilon. All the other host IDs are nodes of our cluster.
Do you think they should all have the same repo_version_id?
I'm actually wondering why they don't. It is the first installation and not an upgrade. (you mentioned "upgrade" in one of your previous comments).
Ideally the cluster nodes (hosts) should all be on the same repo_version. So the mentioned host (host_id=5) was either not upgraded properly, or the table was not updated correctly.
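On a larger cluster, hosts that sit on a different repo_version than the rest can be picked out of the host_version output mechanically. A minimal sketch, assuming psql-style rows in the column order `id | repo_version_id | host_id | state`; the function name `odd_hosts` is hypothetical, and if two repo_version_id values are equally common the choice of the "majority" one is arbitrary.

```shell
# odd_hosts (hypothetical helper): reads psql-style host_version rows on stdin
# and prints the host_id of every host whose repo_version_id differs from the
# most common one. Header, separator and "(N rows)" lines are skipped.
odd_hosts() {
  awk -F'|' '
    $1 ~ /^ *[0-9]+ *$/ {
      gsub(/ /, "", $2); gsub(/ /, "", $3)
      repo[$3] = $2          # host_id -> repo_version_id
      count[$2]++            # how many hosts use each repo_version_id
    }
    END {
      best = ""; max = 0
      for (r in count) if (count[r] > max) { max = count[r]; best = r }
      for (h in repo) if (repo[h] != best) print h
    }
  ' | sort -n
}
```

For example, `psql -U ambari -d ambari -c "select * from ambari.host_version;" | odd_hosts` (database and user names assumed) would print only the host IDs that need a closer look.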
Please log in to that problematic host and then try running the following command to see if it lists the upgraded package or not?