Ambari: Status "Preparing" in confirmation of hosts

Explorer

I had an issue installing HDF with Ambari and accidentally closed the Ambari UI. When I tried to continue with the cluster setup, the previous configuration was lost, and now it is impossible to get past the “Confirm Hosts” step. The hosts stay in “Preparing” status indefinitely and never advance.

ambari-confirm-hosts-preparing.png

Additionally, I found the following in /var/log/ambari-server/ambari-server.log:

1) Alerts that reference the name of the first cluster, cordisclu, whose installation was never finished. The new cluster will have a different name.

14 sep 2017 10:59:00,526 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,526 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert infra_solr for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert logsearch_ui for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,528 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,530 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,529 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,530 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:00,530 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:00,531 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:00,531 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,527 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_collector_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_collector_autostart for an invalid cluster named cordisclu
14 sep 2017 10:59:01,528 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu
14 sep 2017 10:59:01,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 10:59:01,529 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,530 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 10:59:01,530 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_collector_hbase_master_process for an invalid cluster named cordisclu
14 sep 2017 10:59:10,526 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert grafana_webui for an invalid cluster named cordisclu
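
To see at a glance which alerts are still reporting against the old cluster name, the repeated errors above can be condensed into one line per alert. This is only a sketch: the function name `summarize_stale_alerts` is made up for illustration, and it assumes the message format shown in the excerpt ("Unable to process alert NAME for an invalid cluster named CLUSTER").

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ambari): reads ambari-server.log lines on
# stdin and prints "count cluster alert_name", most frequent first.
summarize_stale_alerts() {
  awk '/Unable to process alert .* for an invalid cluster named / {
         # The alert name is the word right after "process alert".
         for (i = 2; i < NF; i++) {
           if ($i == "alert" && $(i - 1) == "process") name = $(i + 1)
         }
         # The cluster name is the last word of the message.
         count[$NF " " name]++
       }
       END { for (key in count) print count[key], key }' |
    LC_ALL=C sort -k1,1nr -k2
}
```

Possible usage, with the log path taken from the post: `summarize_stale_alerts < /var/log/ambari-server/ambari-server.log`.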

I found a reference, but it does not work in my case (https://community.hortonworks.com/questions/41553/unable-to-process-alert-for-an-invalid-cluster-nam...)

I think that deleting the cache and data information in the ambari-agent folder (/var/lib/ambari-agent/) and restarting ambari-agent could be the solution for these alerts, but this does not resolve my main issue.

2) Error when executing bootstrap

14 sep 2017 10:54:16,453  INFO [Thread-1112] BSRunner:189 - Kicking off the scheduler for polling on logs in /var/run/ambari-server/bootstrap/7
14 sep 2017 10:54:16,454  INFO [Thread-1112] BSRunner:372 - Error executing bootstrap Cannot create /var/run/ambari-server/bootstrap
14 sep 2017 10:54:16,455 ERROR [Thread-1112] BSRunner:441 - java.io.FileNotFoundException: /var/run/ambari-server/bootstrap/7/srvifsidsp01.xxxxxxxxxxx.done (No existe el fichero o el directorio)
14 sep 2017 10:54:16,456  WARN [Thread-1112] BSRunner:401 - File does not exist: /var/run/ambari-server/bootstrap/7/sshKey

Note: "No existe el fichero o el directorio" means "File or directory does not exist" in Spanish.

More information:

OS: CentOS 7

Ambari version: 2.5.1.0

1 ACCEPTED SOLUTION

Master Mentor

@Juan Vares

That looks strange. I suggest trying once again; it should be quick.

    # ambari-server stop
    # ambari-server reset
    # ambari-server start 

10 REPLIES

Master Mentor

@Juan Vares

As we can see, your logs contain the following error:

14 sep 2017 10:54:16,454  INFO [Thread-1112] BSRunner:372 - Error executing bootstrap Cannot create /var/run/ambari-server/bootstrap
14 sep 2017 10:54:16,455 ERROR [Thread-1112] BSRunner:441 - java.io.FileNotFoundException: /var/run/ambari-server/bootstrap/7/srvifsidsp01.xxxxxxxxxxx.done (No existe el fichero o el directorio)

So, can you please check which permissions are set on the following directories, and whether the user running Ambari has write access to them?

#  ls -ltr /var/run 
lrwxrwxrwx. 1 root root 6 Dec  1  2014 /var/run -> ../run

# ls -l /var/run/ambari-server/bootstrap
drwxr-xr-x. 2 root root 240 Jun  9 11:06 7

The user running Ambari should have permissions on these directories like those shown above.
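
The permission question above can also be answered directly with a small check. This is a minimal sketch, not an Ambari tool: the function name `check_bootstrap_dir` is made up for illustration, the default path is the one from the log, and the result is only meaningful when the function is run as the same user that runs ambari-server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether the current user can create entries
# in the given directory (default: the bootstrap directory from the log).
check_bootstrap_dir() {
  local dir="${1:-/var/run/ambari-server/bootstrap}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # Creating files in a directory needs both write and search (x) permission.
  if [ -w "$dir" ] && [ -x "$dir" ]; then
    echo "writable: $dir"
    return 0
  fi
  echo "not writable: $dir"
  return 1
}
```

Calling `check_bootstrap_dir` with no argument checks /var/run/ambari-server/bootstrap; a "missing" or "not writable" result would match the "Cannot create" error in the log.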

Better yet, try the following approach: set the permissions on the /var/run/ambari-server directory to 777, and then try the wizard again.

# chmod -R 777  /var/run/ambari-server 

Explorer

Thanks, Jay, for your answer.

I have made a little progress. Now I get the “Success” status, but the wizard says “Please wait while the hosts are being checked for potential problems.”

ambari-confirm-hosts-success-but-waiting.png

And I continuously see the following in /var/log/ambari-server/ambari-server.log:

14 sep 2017 12:46:49,483  WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
14 sep 2017 12:46:49,484  WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
14 sep 2017 12:46:49,484  WARN [ambari-action-scheduler] ActionScheduler:316 - Exception received
java.lang.NullPointerException
        at org.apache.ambari.server.actionmanager.Stage.getStartTime(Stage.java:630)
        at org.apache.ambari.server.actionmanager.ActionScheduler.processHostRole(ActionScheduler.java:1065)
        at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:461)
        at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:310)
        at java.lang.Thread.run(Thread.java:745)


The alerts with the name of the first cluster (cordisclu) are still appearing.

Master Mentor

@Juan Vares

As we can see in the following message:

WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1

The above seems to be what later causes the NullPointerException, as in [1]:

https://github.com/apache/ambari/blob/release-2.5.1/ambari-server/src/main/java/org/apache/ambari/se...

So at this point I guess we have two options to proceed.

QUICK OPTION (Simple One)

As this is a fresh cluster that we are setting up, it is better to run "ambari-server reset" to clean the Ambari DB and then recreate the cluster from scratch.

# ambari-server stop
# ambari-server reset
# ambari-server start 
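
If you prefer to script the three commands above, a rough sketch follows. The wrapper name `reset_ambari_server` and the `DRY_RUN` variable are made up for illustration; only the `ambari-server stop`, `reset` and `start` subcommands come from this thread. It assumes each subcommand exits non-zero on failure, and `reset` may still ask for confirmation interactively since it clears the Ambari DB.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs stop, reset and start in order and aborts at the
# first step that fails. With DRY_RUN=1 it only prints the commands.
reset_ambari_server() {
  local step
  for step in stop reset start; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ambari-server $step"
    else
      ambari-server "$step" || {
        echo "ambari-server $step failed" >&2
        return 1
      }
    fi
  done
}
```

Running `DRY_RUN=1 reset_ambari_server` first shows what would be executed without touching the server.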

OTHER OPTION (Complicated One)

If we want to debug what is causing the NPE, then we will have to look at a few DB tables. It looks like the cluster ID got into a bad state after several attempts at cluster creation. Can you please share the output of the following SQL queries on the Ambari DB?

# psql -U ambari ambari
Password for user ambari: bigdata

ambari=> SELECT repo_version_id, stack_id, version, display_name FROM repo_version;
ambari=> SELECT * FROM clusters;
ambari=> SELECT * FROM cluster_version;
ambari=> SELECT * FROM host_version;
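
If only the row counts of these four tables are needed, the statements can be generated and piped into psql in one go. This is a sketch under assumptions: the function name `count_queries` is made up for illustration, and the table names are simply the ones queried above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one row-count query per table named in the
# queries above, ready to be piped into psql.
count_queries() {
  local table
  for table in repo_version clusters cluster_version host_version; do
    printf "SELECT '%s' AS table_name, count(*) AS row_count FROM %s;\n" \
      "$table" "$table"
  done
}
```

Possible usage, with the same connection arguments as above: `count_queries | psql -U ambari ambari`.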

Explorer

I tried the first option yesterday and it didn't work.

So, let's go with the other option.

Here are the results of the queries:

$ psql -U ambari ambari
Password for user ambari:
psql (9.2.21)
Type "help" for help.

ambari=> SELECT repo_version_id, stack_id, version, display_name FROM repo_version;
 repo_version_id | stack_id | version | display_name
-----------------+----------+---------+--------------
(0 rows)


ambari=> SELECT * FROM clusters;
 cluster_id | resource_id | upgrade_id | cluster_info | cluster_name | provisioning_state | security_type | desired_cluster_state | desired_stack_id
------------+-------------+------------+--------------+--------------+--------------------+---------------+-----------------------+------------------
(0 rows)


ambari=> SELECT * FROM cluster_version;
 id | repo_version_id | cluster_id | state | start_time | end_time | user_name
----+-----------------+------------+-------+------------+----------+-----------
(0 rows)


ambari=> SELECT * FROM host_version;
 id | repo_version_id | host_id | state
----+-----------------+---------+-------
(0 rows)


All four queries return 0 rows.

Master Mentor

@Juan Vares

That looks strange. I suggest trying once again; it should be quick.

    # ambari-server stop
    # ambari-server reset
    # ambari-server start 

Explorer

It works, thank you! This solves one issue: I can now advance in the deployment with the Ambari Cluster Install Wizard.

However, I still get the alerts with the old cluster name:

14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu

Explorer

@Jay SenSharma

It works! Thank you!

But I still have the alerts with the name of the old cluster.

Do you know how to resolve this, please?

14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert kafka_broker_process for an invalid cluster named cordisclu
14 sep 2017 15:29:20,717 ERROR [alert-event-bus-2] AlertReceivedListener:480 - Unable to process alert zookeeper_server_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu

New Contributor

What worked? I don't see any solution. I have the same problem. When I ran those SELECT statements, all of the results were 0 rows. I fail to see what solved the problem. Please explain!

Master Mentor

@Juan Vares

Good to know that the cluster is now created fine.

Regarding the other issue, where a few alerts still carry the old cluster name:

14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert nifi_status for an invalid cluster named cordisclu
14 sep 2017 15:29:38,717 ERROR [alert-event-bus-1] AlertReceivedListener:480 - Unable to process alert ambari_agent_disk_usage for an invalid cluster named cordisclu

I think this might happen if the alert tables still contain alerts targeting the old cluster ID. Please check and share the output of the following queries on the Ambari DB:

# psql -U ambari ambari
Password for user ambari:  bigdata
psql (9.2.18)
Type "help" for help.

ambari=> SELECT cluster_id, definition_name FROM alert_definition WHERE cluster_id NOT IN (SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT');

ambari=> SELECT cluster_id, service_name FROM alert_group WHERE cluster_id NOT IN (SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT');

ambari=> SELECT cluster_id, alert_label, alert_definition_id FROM alert_history WHERE cluster_id NOT IN (SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT');

ambari=> SELECT cluster_id FROM cluster_version WHERE state = 'CURRENT';
