
Heartbeat works, but when I start services I get "'Host_Level_Params for cluster_id=2 is missing. Check if server sent it.'"

New Contributor

I have had this issue for quite a long time now, and it has been really challenging.


On one of the servers (the secondary NameNode) of a 12-node development cluster, the heartbeat is being sent and received. The ID of each heartbeat request increases from one request to the next, which I believe means things are working fine on the ambari-agent side.


I have SNameNode, HiveServer and other services on that server, and whenever I try to start any of them I get this error directly in the Ambari UI:


Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 'Host_Level_Params for cluster_id=2 is missing. Check if server sent it.'; 'Host_Level_Params for cluster_id=2 is missing. Check if server sent it.'


I'm not able to start any slave or master services on that server AT ALL.


To be honest, I don't see anything in the logs that could point me in a direction.


Some information about the environment:

1- Red Hat 7.7 servers

2- Latest JDK 8 (update 231)

3- HDP, so Ambari


Your help is appreciated. 


Master Mentor


I have an idea. Whatever your backend Ambari database is, please back it up first. We are not going to make any changes yet, only validate my suspicion.

DB backup
Assuming you are on MySQL/MariaDB:
mysqldump -u[user name] -p[password] [database name] > [dump file]


Check cluster state

select * from clusterstate;


The cluster_id value returned above should also appear in the "cluster_id" column of the stage table:

select stage_id, request_id, cluster_id from stage;

Identify the troublesome host

select host_id,host_name from hosts;


Assuming the troublesome host has host_id 3:


select cluster_id,component_name from hostcomponentdesiredstate where host_id=3;

select cluster_id,component_name from hostcomponentstate where host_id=3;

select cluster_id,service_name from hostconfigmapping where host_id=3;


Please share the output of all the above steps, and mask your hostname.domain.
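The comparison between clusterstate and stage described above can also be done mechanically. Below is a minimal Python sketch, assuming the rows have already been fetched with whatever database driver you use; the function name and the sample rows are made up for illustration. It only flags cluster_id values that appear in stage but not in clusterstate, and says nothing about whether such rows are actually harmful.

```python
def unexpected_cluster_ids(clusterstate_rows, stage_rows):
    """Return sorted cluster_ids seen in `stage` but absent from `clusterstate`.

    clusterstate_rows: iterable of tuples whose first item is cluster_id
    stage_rows: iterable of (stage_id, request_id, cluster_id) tuples
    """
    known = {row[0] for row in clusterstate_rows}
    seen = {row[2] for row in stage_rows}
    return sorted(seen - known)


# Made-up sample rows, shaped like the output of the two SELECTs above.
clusterstate = [(2, None, 101)]
stage = [(0, 1, 2), (1, 1, 2), (0, 2, -1)]
print(unexpected_cluster_ids(clusterstate, stage))
```

An empty list means every stage row points at a cluster that clusterstate knows about.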

New Contributor

Thanks Shelton for answering, 


It's a Postgres database; I did the backup with pg_dump.


As for the queries:


ambari=> select * from clusterstate;
 cluster_id | current_cluster_state | current_stack_id
------------+-----------------------+------------------
          2 |                       |              101
(1 row)



As for this

select stage_id, request_id, cluster_id from stage;

Most of the entries have cluster_id 2, but I also see entries with cluster_id -1.


The host_id of the troublesome host is 6.

(I just noticed there are 13 hosts, while I have only 12. The 13th is basically host 6 but with a different extension, like if host 6 is the 13th is


As for the remaining 


ambari=> select cluster_id,component_name from hostcomponentdesiredstate where host_id=6;
 cluster_id |     component_name
------------+-------------------
          2 | ZOOKEEPER_SERVER
          2 | INFRA_SOLR
          2 | LIVY2_SERVER
          2 | HDFS_CLIENT
          2 | HIVE_METASTORE
          2 | HIVE_CLIENT
          2 | HIVE_SERVER
          2 | METRICS_MONITOR
          2 | PIG
          2 | ZOOKEEPER_CLIENT
          2 | INFRA_SOLR_CLIENT
          2 | SQOOP
          2 | RANGER_ADMIN
          2 | RANGER_KMS_SERVER
          2 | SPARK2_CLIENT
          2 | RANGER_USERSYNC
          2 | HBASE_CLIENT
          2 | YARN_CLIENT
          2 | TEZ_CLIENT
          2 | MAPREDUCE2_CLIENT
(24 rows)



And also



ambari=> select cluster_id,component_name from hostcomponentstate where host_id=6;
 cluster_id |     component_name
------------+-------------------
          2 | YARN_CLIENT
          2 | INFRA_SOLR
          2 | HIVE_SERVER
          2 | HIVE_METASTORE
          2 | RANGER_USERSYNC
          2 | HBASE_CLIENT
          2 | ZOOKEEPER_SERVER
          2 | RANGER_ADMIN
          2 | HDFS_CLIENT
          2 | LIVY2_SERVER
          2 | RANGER_KMS_SERVER
          2 | MAPREDUCE2_CLIENT
          2 | TEZ_CLIENT
          2 | PIG
          2 | HIVE_CLIENT
          2 | ZOOKEEPER_CLIENT
          2 | METRICS_MONITOR
          2 | INFRA_SOLR_CLIENT
          2 | SPARK2_CLIENT
          2 | SQOOP






ambari=> select cluster_id,service_name from hostconfigmapping where host_id=6;
 cluster_id | service_name
------------+--------------
(0 rows)



For the above, I checked all my host_ids; none of them return any rows.
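Regarding the extra host entry mentioned earlier in this reply, one way to spot hosts that are registered twice under different domain extensions is to group the rows of the hosts table by short name. This is only a sketch in Python: the function name and the hostnames are hypothetical (the real ones are masked), and it assumes the rows come from "select host_id,host_name from hosts".

```python
from collections import defaultdict


def hosts_sharing_short_name(host_rows):
    """Group (host_id, host_name) rows by the label before the first dot.

    Returns only the groups with more than one entry, i.e. short names
    that are registered under more than one fully qualified name.
    """
    groups = defaultdict(list)
    for host_id, host_name in host_rows:
        short = host_name.split(".", 1)[0].lower()
        groups[short].append((host_id, host_name))
    return {short: rows for short, rows in groups.items() if len(rows) > 1}


# Hypothetical hostnames; the real ones were masked in the thread.
rows = [
    (5, "node5.example.com"),
    (6, "node6.example.com"),
    (13, "node6.example.net"),
]
print(hosts_sharing_short_name(rows))
```

With the sample rows, only node6 is reported, once for each of its two registrations.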


Many thanks for your help!

Master Mentor



Your issue is being generated by the Python script


See line 38: "host_level_params_cache = self.host_level_params_cache[cluster_id]"
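To illustrate why a lookup like the one on that line can produce the error from the UI, here is a toy Python sketch. It is not the agent's actual code: the class name and the sample entry are made up, and it is written for Python 3. It assumes only what the error message suggests, namely that the agent keeps a per-cluster cache and raises a KeyError when the entry for a cluster_id was never received or stored.

```python
class ClusterScopedCache(dict):
    """Toy stand-in for a per-cluster cache kept by an agent (hypothetical class)."""

    def __init__(self, name):
        super().__init__()
        self.name = name

    def __getitem__(self, cluster_id):
        try:
            return super().__getitem__(cluster_id)
        except KeyError:
            # Replace the bare KeyError with a message naming the cache and cluster.
            raise KeyError(
                "{0} for cluster_id={1} is missing. Check if server sent it.".format(
                    self.name, cluster_id
                )
            ) from None


host_level_params_cache = ClusterScopedCache("Host_Level_Params")
host_level_params_cache["1"] = {"example_key": "example_value"}  # made-up entry

try:
    host_level_params_cache["2"]  # no entry was ever stored for cluster 2
except KeyError as err:
    print(err)
```

Python prints a KeyError message with its quotes, which matches the quoted text shown in the Ambari UI. If the cached state on the node is stale or incomplete, clearing it so the agent fetches it again from the server is a reasonable first step, which is what Solution 1 does.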


Solution 1 on node 6 

Stop the ambari-agent on node 6, then delete the tmp files to empty its cache:

node6 # ambari-agent stop

node6 # rm -rf /var/lib/ambari-agent/*

Then restart the ambari-agent on node 6:

node6 # ambari-agent start

Solution 2  on node 6 


node6 # ambari-agent stop

yum erase ambari-agent

rm -rf /var/lib/ambari-agent
rm -rf /var/run/ambari-agent
rm -rf /usr/lib/ambari-agent
rm -rf /etc/ambari-agent
rm -rf /var/log/ambari-agent
rm -rf /usr/lib/python2.6/site-packages/ambari*


Reinstall the Ambari agent

yum install ambari-agent

# Change hostname to point to the Ambari Server
vi /etc/ambari-agent/conf/ambari-agent.ini

Start the ambari-agent

# ambari-agent start
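If you would rather script the hostname change than edit the file in vi, something like the following Python sketch could work. It assumes the ini file has a [server] section with a hostname option; the function name and the server name are hypothetical. The demo writes to a throwaway file instead of /etc/ambari-agent/conf/ambari-agent.ini.

```python
import configparser
import os
import tempfile


def set_server_hostname(ini_path, server_fqdn):
    """Set the hostname option in the [server] section and write the file back."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read(ini_path)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "hostname", server_fqdn)
    with open(ini_path, "w") as handle:
        parser.write(handle)


# Demo on a throwaway file with made-up contents.
demo_path = os.path.join(tempfile.mkdtemp(), "ambari-agent.ini")
with open(demo_path, "w") as handle:
    handle.write("[server]\nhostname=localhost\nurl_port=8440\n")

set_server_hostname(demo_path, "ambari.example.com")  # hypothetical server name
with open(demo_path) as handle:
    print(handle.read())
```

Note that configparser drops comments when it rewrites a file, so keep a copy of the original ini if you want to preserve them.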


Please report back.


New Contributor

I'm sorry for not answering earlier; I only now got access to the environment again.


I tried solution 1 and the agent worked, but it went down after a while and I couldn't work out why; the logs didn't say anything.

While it was up, I tried to start services there through Ambari, but it looks like no request was initiated, since there are no logs about the start action for that service.


Then I decided to clean everything and start fresh: I removed ambari-agent, cleaned all the folders as mentioned, and installed it again.

The same issue is still there when I try to start services.


For example, this is the ambari-agent log from when I tried to start the SNameNode, which is hosted on the problematic node:


INFO 2020-01-06 07:35:39,472 - Event from server at /user/commands: {u'clusters': {u'2': {u'commands': [{u'commandParams': '...', u'clusterId': u'2', u'clusterName': u'dev', u'commandType': u'EXECUTION_COMMAND', u'roleCommand': u'START', u'serviceName': u'HDFS', u'role': u'SECONDARY_NAMENODE', u'requestId': 424, u'taskId': 7353, u'repositoryFile': '...', u'componentVersionMap': {u'HDFS': {u'SECONDARY_NAMENODE': u'', u'JOURNALNODE': u'', u'HDFS_CLIENT': u'', u'DATANODE': u'', u'NAMENODE': u'', u'NFS_GATEWAY': u'', u'ZKFC': u''}, u'ZOOKEEPER': {u'ZOOKEEPER_SERVER': u'', u'ZOOKEEPER_CLIENT': u''}, u'SPARK2': {u'SPARK2_THRIFTSERVER': u'', u'SPARK2_CLIENT': u'', u'LIVY2_SERVER': u'', u'SPARK2_JOBHISTORYSERVER': u''}, u'SQOOP': {u'SQOOP': u''}, u'HIVE': {u'HIVE_SERVER': u'', u'HIVE_METASTORE': u'', u'HIVE_SERVER_INTERACTIVE': u'', u'HIVE_CLIENT': u''}, u'YARN': {u'YARN_REGISTRY_DNS': u'', u'RESOURCEMANAGER': u'', u'YARN_CLIENT': u'', u'TIMELINE_READER': u'', u'APP_TIMELINE_SERVER': u'', u'NODEMANAGER': u''}, u'PIG': {u'PIG': u''}, u'RANGER': {u'RANGER_TAGSYNC': u'', u'RANGER_ADMIN': u'', u'RANGER_USERSYNC': u''}, u'TEZ': {u'TEZ_CLIENT': u''}, u'MAPREDUCE2': {u'MAPREDUCE2_CLIENT': u'', u'HISTORYSERVER': u''}, u'ZEPPELIN': {u'ZEPPELIN_MASTER': u''}, u'HBASE': {u'HBASE_MASTER': u'', u'PHOENIX_QUERY_SERVER': u'', u'HBASE_CLIENT': u'', u'HBASE_REGIONSERVER': u''}, u'KAFKA': {u'KAFKA_BROKER': u''}, u'KNOX': {u'KNOX_GATEWAY': u''}, u'RANGER_KMS': {u'RANGER_KMS_SERVER': u''}}, u'commandId': u'424-0'}]}}, u'requiredConfigTimestamp': 1578284883431}
INFO 2020-01-06 07:35:39,473 - Adding EXECUTION_COMMAND for role SECONDARY_NAMENODE for service HDFS of cluster_id 2 to the queue
INFO 2020-01-06 07:35:39,473 - Event to server at /reports/responses (correlation_id=66): {'status': 'OK', 'messageId': '2'}
INFO 2020-01-06 07:35:39,475 - Event from server at /user/ (correlation_id=66): {u'status': u'OK'}
INFO 2020-01-06 07:35:40,595 - Event to server at /heartbeat (correlation_id=67): {'id': 44}
INFO 2020-01-06 07:35:40,597 - Event from server at /user/ (correlation_id=67): {u'status': u'OK', u'id': 45}
INFO 2020-01-06 07:35:42,317 - Skipping status command for INFRA_SOLR. Since command for it is running
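To check a longer log for the same two signals (heartbeat IDs still increasing, and commands being added to the queue), a small Python sketch like this could help. The patterns are based only on the lines in the excerpt above, so other agent versions may log differently; the function name is made up.

```python
import re

# Patterns derived from the log lines in the excerpt; treat them as assumptions.
HEARTBEAT_RE = re.compile(
    r"Event to server at /heartbeat \(correlation_id=\d+\): \{'id': (\d+)\}"
)
QUEUED_RE = re.compile(
    r"Adding (\w+) for role (\w+) for service (\w+) of cluster_id (\S+) to the queue"
)


def summarize_agent_log(lines):
    """Return (heartbeat_ids, ids_strictly_increasing, queued_commands)."""
    heartbeat_ids = []
    queued = []
    for line in lines:
        heartbeat = HEARTBEAT_RE.search(line)
        if heartbeat:
            heartbeat_ids.append(int(heartbeat.group(1)))
            continue
        command = QUEUED_RE.search(line)
        if command:
            queued.append(command.groups())
    increasing = all(a < b for a, b in zip(heartbeat_ids, heartbeat_ids[1:]))
    return heartbeat_ids, increasing, queued


sample = [
    "INFO 2020-01-06 07:35:39,473 - Adding EXECUTION_COMMAND for role "
    "SECONDARY_NAMENODE for service HDFS of cluster_id 2 to the queue",
    "INFO 2020-01-06 07:35:40,595 - Event to server at /heartbeat "
    "(correlation_id=67): {'id': 44}",
]
print(summarize_agent_log(sample))
```

A command that shows up as queued here but never produces any service logs would suggest the problem lies after queuing, in the execution step on the agent.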


Also, there are no logs under /var/log/hadoop/hdfs, which makes me think that the ambari-agent on the problematic node didn't actually initiate the call.


I'm going to mark your answer as accepted since it solved the issue I originally asked about. Do you think I should create a new post for this?