Created 09-14-2017 09:50 AM
I had an issue installing HDF with Ambari and accidentally closed the Ambari UI. When I tried to continue setting up the cluster, the previous configuration was lost, and now it is impossible to get past the “Confirm Hosts” phase. The hosts stay in “Preparing” status and never advance.
ambari-confirm-hosts-preparing.png
Additionally, I found the following in /var/log/ambari-server/ambari-server.log:
1) Alerts carrying the name of the first cluster, cordisclu, whose installation was never finished. The cluster will now have a new name.
14 sep 2017 10:59:00,526 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,526 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert infra_solr for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert logsearch_ui for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,530 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,530 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,530 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,531 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,531 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_collector_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_collector_autostart for an invalid cluster named cordisclu
14 sep 2017 10:59:01,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:01,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:01,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,530 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,530 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_collector_hbase_master_process for an invalid cluster named cordisclu
14 sep 2017 10:59:10,526 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert grafana_webui for an invalid cluster named cordisclu
I found a related thread, but its solution does not work in my case (https://community.hortonworks.com/questions/41553/unable-to-process-alert-for-an-invalid-cluster-nam...)
I think that deleting the cache and data in the ambari-agent folder (/var/lib/ambari-agent/) and restarting ambari-agent could be the solution for these alerts, but this does not resolve my main issue.
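To see which alerts are still being reported under the old cluster name, the ERROR lines above can be grouped by alert name. This is only a minimal sketch, not an Ambari tool: the function name count_stale_alerts is made up for illustration, and it assumes each log entry sits on its own line in the format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ambari): count "invalid cluster" alert
# errors per cluster name and alert name in an ambari-server.log file.
count_stale_alerts() {
    local log_file="$1"
    awk '/Unable to process alert/ {
        name = ""; cluster = ""
        for (i = 2; i < NF; i++) {
            # The alert name follows the words "process alert".
            if ($(i - 1) == "process" && $i == "alert") name = $(i + 1)
            # The cluster name follows the word "named".
            if ($i == "named") cluster = $(i + 1)
        }
        if (name != "" && cluster != "") count[cluster " " name]++
    }
    END { for (key in count) print count[key], key }' "$log_file" \
        | LC_ALL=C sort -k1,1nr -k2
}
```

Running count_stale_alerts /var/log/ambari-server/ambari-server.log prints one line per cluster and alert pair (count, cluster name, alert name), most frequent first.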
2) Error when executing bootstrap
14 sep 2017 10:54:16,453 INFO [Thread-1112] BSRunner:189 - Kicking off the scheduler for polling on logs in /var/run/ambari-server/bootstrap/7
14 sep 2017 10:54:16,454 INFO [Thread-1112] BSRunner:372 - Error executing bootstrap Cannot create /var/run/ambari-server/bootstrap
14 sep 2017 10:54:16,455 ERROR [Thread-1112] BSRunner:441 - java.io.FileNotFoundException: /var/run/ambari-server/bootstrap/7/srvifsidsp01.xxxxxxxxxxx.done (No existe el fichero o el directorio)
14 sep 2017 10:54:16,456 WARN [Thread-1112] BSRunner:401 - File does not exist: /var/run/ambari-server/bootstrap/7/sshKey
Note: "No existe el fichero o el directorio" means "File or directory does not exist" in Spanish.
More information:
OS: CentOS 7
Ambari version: 2.5.1.0
Created 09-14-2017 10:13 AM
We can see the following error in your logs:
14 sep 2017 10:54:16,454 INFO [Thread-1112] BSRunner:372 - Error executing bootstrap Cannot create /var/run/ambari-server/bootstrap
14 sep 2017 10:54:16,455 ERROR [Thread-1112] BSRunner:441 - java.io.FileNotFoundException: /var/run/ambari-server/bootstrap/7/srvifsidsp01.xxxxxxxxxxx.done (No existe el fichero o el directorio)
So can you please check which permissions are set on the following directories, and whether the user running Ambari is allowed to write to them?
# ls -ltr /var/run
lrwxrwxrwx. 1 root root 6 Dec 1 2014 /var/run -> ../run
# ls -l /var/run/ambari-server/bootstrap
drwxr-xr-x. 2 root root 240 Jun 9 11:06 7
The user running Ambari should have the proper permissions on these directories, as shown above.
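The same check can be scripted. The following is a minimal sketch, assuming it is run as the same account that runs ambari-server; check_writable is a hypothetical helper name, not an Ambari command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ambari): report whether the current user
# can write to each directory given as an argument.
check_writable() {
    local dir status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "OK   $dir"
        else
            # Either the directory is missing or it is not writable.
            echo "FAIL $dir"
            status=1
        fi
    done
    return "$status"
}
```

For the paths in this thread it would be called as check_writable /var/run/ambari-server /var/run/ambari-server/bootstrap; any FAIL line points at a directory that is missing or not writable for that user.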
As a quick approach, set the permissions on the /var/run/ambari-server directory to 777 and then try the wizard again.
# chmod -R 777 /var/run/ambari-server
Created 09-14-2017 11:00 AM
Thanks Jay for your answer.
I have made a little progress. The hosts now show “Success” status, but the wizard keeps saying “Please wait while the hosts are being checked for potential problems.”
ambari-confirm-hosts-success-but-waiting.png
And the following appears continuously in /var/log/ambari-server/ambari-server.log:
14 sep 2017 12:46:49,483 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
14 sep 2017 12:46:49,484 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
14 sep 2017 12:46:49,484 WARN [ambari-action-scheduler] ActionScheduler:316 - Exception received
java.lang.NullPointerException
    at org.apache.ambari.server.actionmanager.Stage.getStartTime(Stage.java:630)
    at org.apache.ambari.server.actionmanager.ActionScheduler.processHostRole(ActionScheduler.java:1065)
    at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:461)
    at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:310)
    at java.lang.Thread.run(Thread.java:745)
The alerts with the name of the first cluster (cordisclu) are still appearing.
Created 09-14-2017 11:32 AM
We can see the following message:
WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
This appears to be what later leads to the NullPointerException.
So at this point I think we have two options for proceeding.
QUICK OPTION (Simple One)
As this is a fresh cluster that we are still setting up, it is better to run "ambari-server reset" to clean the Ambari DB and then recreate the cluster from scratch.
# ambari-server stop
# ambari-server reset
# ambari-server start
OTHER OPTION (Complicated One)
If we want to debug what is causing the NPE, then we will have to look at a few DB tables. It looks like the repeated attempts at cluster creation left the cluster ID in a bad state. Can you please share the output of the following SQL queries on the Ambari DB?
# psql -U ambari ambari
Password for user ambari: bigdata
ambari=> SELECT repo_version_id, stack_id, version, display_name FROM repo_version;
ambari=> SELECT * FROM clusters;
ambari=> SELECT * FROM cluster_version;
ambari=> SELECT * FROM host_version;
Created 09-14-2017 11:43 AM
I tried the first option yesterday and it didn't work.
So, let's go with the other option.
Here are the results of the queries:
$ psql -U ambari ambari
Password for user ambari:
psql (9.2.21)
Type "help" for help.

ambari=> SELECT repo_version_id, stack_id, version, display_name FROM repo_version;
 repo_version_id | stack_id | version | display_name
-----------------+----------+---------+--------------
(0 rows)

ambari=> SELECT * FROM clusters;
 cluster_id | resource_id | upgrade_id | cluster_info | cluster_name | provisioning_state | security_type | desired_cluster_state | desired_stack_id
------------+-------------+------------+--------------+--------------+--------------------+---------------+-----------------------+------------------
(0 rows)

ambari=> SELECT * FROM cluster_version;
 id | repo_version_id | cluster_id | state | start_time | end_time | user_name
----+-----------------+------------+-------+------------+----------+-----------
(0 rows)

ambari=> SELECT * FROM host_version;
 id | repo_version_id | host_id | state
----+-----------------+---------+-------
(0 rows)
We have 0 rows.
Created 09-14-2017 11:50 AM
That looks strange. I suggest trying the reset once again; it will be quick.
# ambari-server stop
# ambari-server reset
# ambari-server start
Created 09-15-2017 03:30 PM
It works, thank you! This solves one issue: I can now proceed with the deployment in the Ambari Cluster Install Wizard.
But I still have the alerts with the old name of the cluster:
14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
Created 09-14-2017 02:04 PM
It works! Thank you!
But I still have the alerts with the name of the old cluster.
Do you know how to resolve this, please?
14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
Created 11-08-2018 03:42 AM
What worked? I don't see any solution. I have the same problem. When I ran those SELECT statements all of the results were 0 rows. I failed to see what solved the problem. Please explain!
Created 09-17-2017 06:55 AM
Good to know that the cluster is now created fine.
Regarding the other issue of a few alerts still carrying the old cluster name:
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
I think this might happen if the alert tables still contain alerts targeting the old cluster ID. Please check and share the output of the following queries on the Ambari DB:
# psql -U ambari ambari
Password for user ambari: bigdata
psql (9.2.18)
Type "help" for help.
ambari=> SELECT cluster_id, definition_name FROM alert_definition WHERE cluster_id NOT IN (SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT');
ambari=> SELECT cluster_id, service_name FROM alert_group WHERE cluster_id NOT IN (SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT');
ambari=> SELECT cluster_id, alert_label, alert_definition_id FROM alert_history WHERE cluster_id NOT IN (SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT');
ambari=> SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT';