Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

ambari-agent connection refused after host reboot

avatar

I have a 1-node ambari managed cluster which was working correctly, auto starting server, host and components when restarting the system.


I've changed to a public ip and a diferent hostname and after solving FQDN problems, I have problem with auto start when rebooting. Ambari server and agent are auto starting but heartbeat is lost and ambari-agent logs show connection refused, but if I manually restart ambari-agent, connection is correct and I can start services.


There's ambari-server UI just after rebooting.

108906-1558694191895.png


ambari-agent log shows the nex tail.

ERROR 2019-05-24 12:28:47,235 script_alert.py:119 - [Alert][hive_metastore_process] Failed with result CRITICAL: ['Metastore on bigdata.es failed (Traceback (most recent call last):\n  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 200, in execute\n    timeout_kill_strategy=TerminateStrategy.KILL_PROCESS_TREE,\n  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__\n    self.env.run()\n  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run\n    self.run_action(resource, action)\n  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action\n    provider_action()\n  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run\n    tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner\n    result = function(command, **kwargs)\n  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call\n    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)\n  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper\n    result = _call(command, **kwargs_copy)\n  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call\n    raise ExecutionFailed(err_msg, code, out, err)\nExecutionFailed: Execution of \'export HIVE_CONF_DIR=\'/usr/hdp/current/hive-metastore/conf/conf.server\' ; hive --hiveconf hive.metastore.uris=thrift://bigdata.es:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e \'show databases;\'\' returned 1. Logging initialized using configuration in file:/etc/hive/2.6.3.0-71/0/conf.server/hive-log4j.properties\nException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:547)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1566)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3510)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3542)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:528)\n\t... 8 more\nCaused by: java.lang.reflect.InvocationTargetException\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\t... 14 more\nCaused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Conexi\xc3\xb3n rehusada (Connection refused)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:226)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:487)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:76)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n\tat org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1564)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:92)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:138)\n\tat org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:110)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3510)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3542)\n\tat org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:528)\n\tat org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)\n\tat org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.java:233)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:148)\nCaused by: java.net.ConnectException: Conexi\xc3\xb3n rehusada (Connection refused)\n\tat java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)\n\tat java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)\n\tat java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)\n\tat java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.net.Socket.connect(Socket.java:589)\n\tat org.apache.thrift.transport.TSocket.open(TSocket.java:221)\n\t... 22 more\n)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:534)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:282)\n\tat org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:76)\n\t... 19 more\n)']
INFO 2019-05-24 12:28:58,095 logger.py:71 - call[['test', '-w', '/dev']] {'sudo': True, 'quiet': False, 'timeout': 5}
INFO 2019-05-24 12:28:58,108 logger.py:71 - call returned (0, '')
INFO 2019-05-24 12:28:58,119 logger.py:71 - call[['test', '-w', '/']] {'sudo': True, 'quiet': False, 'timeout': 5}
INFO 2019-05-24 12:28:58,131 logger.py:71 - call returned (0, '')
INFO 2019-05-24 12:28:58,143 logger.py:71 - call[['test', '-w', '/Datos']] {'sudo': True, 'quiet': False, 'timeout': 5}
INFO 2019-05-24 12:28:58,154 logger.py:71 - call returned (0, '')
ERROR 2019-05-24 12:29:39,995 script_alert.py:119 - [Alert][hive_webhcat_server_status] Failed with result CRITICAL: ['Connection failed to http://bigdata.es:50111/templeton/v1/status?user.name=ambari-qa + \nTraceback (most recent call last):\n  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py", line 190, in execute\n    url_response = urllib2.urlopen(query_url, timeout=connection_timeout)\n  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen\n    return opener.open(url, data, timeout)\n  File "/usr/lib/python2.7/urllib2.py", line 429, in open\n    response = self._open(req, data)\n  File "/usr/lib/python2.7/urllib2.py", line 447, in _open\n    \'_open\', req)\n  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain\n    result = func(*args)\n  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open\n    return self.do_open(httplib.HTTPConnection, req)\n  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open\n    raise URLError(err)\nURLError: <urlopen error [Errno 111] Conexi\xc3\xb3n rehusada>\n']
ERROR 2019-05-24 12:29:40,000 script_alert.py:119 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://bigdata.es:8042/ws/v1/node/info (Traceback (most recent call last):\n  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n    url_response = urllib2.urlopen(query, timeout=connection_timeout)\n  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen\n    return opener.open(url, data, timeout)\n  File "/usr/lib/python2.7/urllib2.py", line 429, in open\n    response = self._open(req, data)\n  File "/usr/lib/python2.7/urllib2.py", line 447, in _open\n    \'_open\', req)\n  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain\n    result = func(*args)\n  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open\n    return self.do_open(httplib.HTTPConnection, req)\n  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open\n    raise URLError(err)\nURLError: <urlopen error [Errno 111] Conexi\xc3\xb3n rehusada>\n)']
INFO 2019-05-24 13:30:00,155 main.py:96 - loglevel=logging.INFO
INFO 2019-05-24 13:30:00,157 main.py:96 - loglevel=logging.INFO
INFO 2019-05-24 13:30:00,157 main.py:96 - loglevel=logging.INFO
INFO 2019-05-24 13:30:00,159 DataCleaner.py:39 - Data cleanup thread started
INFO 2019-05-24 13:30:00,164 DataCleaner.py:120 - Data cleanup started
INFO 2019-05-24 13:30:00,295 DataCleaner.py:122 - Data cleanup finished
INFO 2019-05-24 13:30:00,314 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2019-05-24 13:30:00,314 main.py:132 - Newloglevel=logging.DEBUG
INFO 2019-05-24 13:30:00,314 main.py:405 - Connecting to Ambari server at https://bigdata.es:8440 (10.61.2.10)
DEBUG 2019-05-24 13:30:00,314 NetUtil.py:110 - Trying to connect to https://bigdata.es:8440
INFO 2019-05-24 13:30:00,315 NetUtil.py:67 - Connecting to https://bigdata.es:8440/ca
WARNING 2019-05-24 13:30:00,317 NetUtil.py:98 - Failed to connect to https://bigdata.es:8440/ca due to [Errno 111] Conexión rehusada
WARNING 2019-05-24 13:30:00,317 NetUtil.py:121 - Server at https://bigdata.es:8440 is not reachable, sleeping for 10 seconds...


And ambari-agent.ini config is the following

[server]
hostname = bigdata.es
url_port = 8440
secured_url_port = 8441
connect_retry_delay = 10
max_reconnect_retry_delay = 30

[agent]
logdir = /var/log/ambari-agent
piddir = /var/run/ambari-agent
prefix = /var/lib/ambari-agent/data
loglevel = DEBUG
data_cleanup_interval = 86400
data_cleanup_max_age = 2592000
data_cleanup_max_size_mb = 100
ping_port = 8670
cache_dir = /var/lib/ambari-agent/cache
tolerate_download_failures = true
run_as_user = root
parallel_execution = 0
alert_grace_period = 5
status_command_timeout = 5
alert_kinit_timeout = 14400000
system_resource_overrides = /etc/resource_overrides

[security]
keysdir = /var/lib/ambari-agent/keys
server_crt = ca.crt
passphrase_env_var_name = AMBARI_PASSPHRASE
ssl_verify_cert = 0
credential_lib_dir = /var/lib/ambari-agent/cred/lib
credential_conf_dir = /var/lib/ambari-agent/cred/conf
credential_shell_cmd = org.apache.hadoop.security.alias.CredentialShell
force_https_protocol = PROTOCOL_TLSv1_2

[services]
pidlookuppath = /var/run/

[heartbeat]
state_interval_seconds = 60
dirs = /etc/hadoop,/etc/hadoop/conf,/etc/hbase,/etc/hcatalog,/etc/hive,/etc/oozie,
        /etc/sqoop,
        /var/run/hadoop,/var/run/zookeeper,/var/run/hbase,/var/run/templeton,/var/run/oozie,
        /var/log/hadoop,/var/log/zookeeper,/var/log/hbase,/var/run/templeton,/var/log/hive
log_lines_count = 300
idle_interval_min = 1
idle_interval_max = 10

[logging]
syslog_enabled = 0


If I run sudo ambari-agent restart command, then I can connect and start services in ambari-server ui.

108943-1558694501380.png

15 REPLIES 15

avatar
Master Mentor

@Adrián Gil

Changing the hostname /etc/hosts is not enough there are the below step you MUST perform to have your Cluster working again. In Ambari Web > Dashboard, stop all services.

Stop ambari-server and ambari-agents on all hosts.

# ambari-server stop
# ambari-agent stop

Create a file host_names_changes.json file with hostnames changes.

Contents of the host_names_changes.json

{
  "bigdata" : {
      "old_name_here" : "bigdata-es"
  }
}

From the directory where you saved the above file as a root user run

# ambari-server update-host-names host_names_changes.json


After successful execution, your ambari will update all the necessary files except the old hostnames in ambari.properties change the below values


Change these 3 values in /etc/ambari-server/conf/ambari.properties

server.jdbc.rca.url=
server.jdbc.url=jdbc=
server.jdbc.hostname=

You will need to change also values for hive,oozie etc change according to your installation

# Hive

grant all privileges on hive.* to 'hive'@'bigdata-es' identified by 'hive_password';
grant all privileges on hive.* to 'hive'@'bigdata-es' with grant option;

# Oozie

grant all privileges on oozie.* to 'hive'@'bigdata-es' identified by 'oozie_password';
grant all privileges on oozie.* to 'hive'@'bigdata-es' with grant option;

# Ranger

grant all privileges on ranger.* to 'hive'@'bigdata-es' identified by 'ranger_password';
grant all privileges on ranger.* to 'hive'@'bigdata-es' with grant option;

# Rangerkms

grant all privileges on rangerkms.* to 'hive'@'bigdata-es' identified by 'rangerkms_password';
grant all privileges on rangerkms.* to 'hive'@'bigdata-es' with grant option;


After doing the above now you can restart amabri and the ambari-agent hoping you have changed to the name in ambari-agent.ini,

Start ambari-server and ambari-agents on all hosts.

# ambari-server start
# ambari-agent start

Before running the Ambari component startall make sure hive,oozie,ranger etc have the correct hostname in the Ambari UI Configs !!

Happy Hadooping


avatar

Thanks for your explained answer @Geoffrey Shelton Okot


I've tried to do so and changed all manually changeable values you mentioned (including setting same hostname as it had before ip change)


If I try this last option, I can make it correctly run with 'ambari-agent restart' but it won't in system 'sudo reboot'.

When trying a different hostanem and making changes you told me, I got an error when

  1. ambari-server update-host-names host_names_changes.json

And ambari-server.log shows folloging trace:

  1. 27 may 2019 09:59:38,771 ERROR [main] AmbariJpaLocalTxnInterceptor:180 - [DETAILED ERROR] Rollback reason: Local Exception Stack: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   Detail: Key (host_name)=(bigdata.es) already exists. Error Code: 0 Call: UPDATE hosts SET host_name = ? WHERE (host_id = ?)         bind => [2 parameters bound]         at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:340)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.processExceptionForCommError(DatabaseAccessor.java:1620)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:900)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:964)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccess

avatar

The above question and the reply thread below were originally posted in the Community Help Track. On Fri May 24 09:14 PDT 2019, a member of the HCC moderation staff moved it to the Cloud & Operations Track. The Community Help Track is intended for questions about using the HCC site itself.

Bill Brooks, Community Moderator
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

avatar

Thanks for your explained answer @Geoffrey Shelton Okot


I've tried to do so and changed all manually changeable values you mentioned (including setting same hostname as it had before ip change)


If I try this last option, I can make it correctly run with 'ambari-agent restart' but it won't in system 'sudo reboot'.

When trying a different hostanem and making changes you told me, I got an error when

 ambari-server update-host-names host_names_changes.json 

And ambari-server.log shows folloging trace:

27 may 2019 09:59:38,771 ERROR [main] AmbariJpaLocalTxnInterceptor:180 - [DETAILED ERROR] Rollback reason: Local Exception Stack: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   Detail: Key (host_name)=(bigdata.es) already exists. Error Code: 0 Call: UPDATE hosts SET host_name = ? WHERE (host_id = ?)         bind => [2 parameters bound]         at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:340)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.processExceptionForCommError(DatabaseAccessor.java:1620)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:900)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:964)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:633)         at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:149)         at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1845)         at org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4300)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5592)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.acquireWriteLocks(UnitOfWorkImpl.java:1646)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1614)         at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitRootUnitOfWork(RepeatableWriteUnitOfWork.java:285)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitAndResume(UnitOfWorkImpl.java:1169)         at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:134)         at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:153)         at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)         at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)         at org.apache.ambari.server.orm.dao.HostDAO$$EnhancerByGuice$$3e1957eb.merge(<generated>)         at org.apache.ambari.server.update.HostUpdateHelper.updateHostsInDB(HostUpdateHelper.java:407)         at org.apache.ambari.server.update.HostUpdateHelper.main(HostUpdateHelper.java:548) Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   Detail: Key (host_name)=(bigdata.es) already exists.         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2161)         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1890)         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)         at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:559)         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:892)         ... 18 more 27 may 2019 09:59:38,781 ERROR [main] AmbariJpaLocalTxnInterceptor:188 - [DETAILED ERROR] Internal exception (1) : org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   Detail: Key (host_name)=(bigdata.es) already exists.         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2161)         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1890)         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)         at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:559)         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:892)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:964)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:633)         at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:149)         at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1845)         at org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4300)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5592)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.acquireWriteLocks(UnitOfWorkImpl.java:1646)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1614)         at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitRootUnitOfWork(RepeatableWriteUnitOfWork.java:285)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitAndResume(UnitOfWorkImpl.java:1169)         at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:134)         at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:153)         at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)         at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)         at org.apache.ambari.server.orm.dao.HostDAO$$EnhancerByGuice$$3e1957eb.merge(<generated>)         at org.apache.ambari.server.update.HostUpdateHelper.updateHostsInDB(HostUpdateHelper.java:407)         at org.apache.ambari.server.update.HostUpdateHelper.main(HostUpdateHelper.java:548) 27 may 2019 09:59:38,781 ERROR [main] HostUpdateHelper:564 - Unexpected error, host names update failed javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   Detail: Key (host_name)=(bigdata.es) already exists. Error Code: 0 Call: UPDATE hosts SET host_name = ? WHERE (host_id = ?)         bind => [2 parameters bound]         at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:159)         at org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:153)         at org.apache.ambari.server.update.HostUpdateHelper.updateHostsInDB(HostUpdateHelper.java:407)         at org.apache.ambari.server.update.HostUpdateHelper.main(HostUpdateHelper.java:548) Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   Detail: Key (host_name)=(bigdata.es) already exists. Error Code: 0 Call: UPDATE hosts SET host_name = ? WHERE (host_id = ?)         bind => [2 parameters bound]         at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:340)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.processExceptionForCommError(DatabaseAccessor.java:1620)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:900)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeNoSelect(DatabaseAccessor.java:964)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:633)         at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatch(ParameterizedSQLBatchWritingMechanism.java:149)         at org.eclipse.persistence.internal.databaseaccess.ParameterizedSQLBatchWritingMechanism.executeBatchedStatements(ParameterizedSQLBatchWritingMechanism.java:134)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.writesCompleted(DatabaseAccessor.java:1845)         at org.eclipse.persistence.internal.sessions.AbstractSession.writesCompleted(AbstractSession.java:4300)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.writesCompleted(UnitOfWorkImpl.java:5592)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.acquireWriteLocks(UnitOfWorkImpl.java:1646)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1614)         at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitRootUnitOfWork(RepeatableWriteUnitOfWork.java:285)         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitAndResume(UnitOfWorkImpl.java:1169)         at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:134)         ... 3 more Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   Detail: Key (host_name)=(bigdata.es) already exists.         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2161)         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1890)         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)         at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:559)         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)         at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeDirectNoSelect(DatabaseAccessor.java:892)

avatar
Master Mentor

@Adrián Gil

What values are in your json file?

Can you share a redacted version of the /etc/hosts

and the output of

$ hostname -f 

The below shows you are changing to a value which is no different from the old value 🙂

Detail: Key (host_name)=(bigdata.es) already exists.
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2161)         
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1890)         
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) 


Please revert


avatar
Master Mentor

@Adrián Gil

What values do you have in /etc/host?

Can you share a redacted version of your /etc/hosts and the output of


$ hostname -f

The below shows that you are trying to change to a value which is no different from the older value 🙂

Detail: Key (host_name)=(bigdata.es) already exists.
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2161)         
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1890)         
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) 

ERROR: duplicate key value violates unique constraint "uq_hosts_host_name"   
Detail: Key (host_name)=(bigdata.es) already exists. Error Code: 0 
Call: UPDATE hosts SET host_name = ? WHERE (host_id = ?)bind => [2 parameters bound]

Please revert


avatar

Geoffrey Shelton Okot


There you have hostname -f and /etc/hosts outputs.

bigdata@bigdata:~$ hostname -f
bigdatapruebas.es
bigdata@bigdata:~$ sudo cat /etc/hosts
10.61.2.10              bigdatapruebas.es bigdata.adurizenergia.es bigdata.es bigdata

::1                     localhost ip6-localhost ip6-loopback
ff02::1                 ip6-allnodes
ff02::02                ip6-allrouters
bigdata@bigdata:~$

Also script is the following one.

{
  "bigdata" : {
      "bigdata.es" : "bigdatapruebas.es"
  }
}


It did finnaly succeded, but as it's shown on the following picture, heartbeat is still lost after sudo reboot the system. I changed values from different try to make sure names are completly different.

108915-1558972173890.png

After sudo ambari-agent restart command, it's recognized again and we can see stopped status and not heartbeat lost one.

108924-1558972006404.png

avatar
Master Mentor

@Adrián Gil

You succeeded but encountering heartbeat lost because your /etc/hosts entry is wrong 🙂 I really can't understand how you can connect 🙂 The below entry is wrong and shouldn't resolve that somehow explains why you had difficulty in running the host_names_changes.json

bigdata@bigdata:~$ sudo cat /etc/hosts
10.61.2.10              bigdatapruebas.es bigdata.adurizenergia.es bigdata.es bigdata

Should be the /etc/host entry should be exactly the output of

$ hostname -f
bigdatapruebas.es

So your /etc/hosts should look like below

$ sudo cat /etc/hosts
10.61.2.10       bigdatapruebas.es 

But if you had an FQDN like bigdata.endesa.es the entry could be IP FQDN ALIAS

$ sudo cat /etc/hosts
IP                FQDN                  ALIAS
------------------------------------------------
10.61.2.10       bigdata.endesa.es      bigdata

With the above entry in the /etc/hosts, you can access ambari successfully in 2 ways

http://bigdata.endesa.es:8080

and

http://bigdata:8080


Please do the necessary changes and revert





avatar

@Geoffrey Shelton Okot


Already done /etc/hosts modifications and with this result:

  1. bigdata@bigdata:~$ cat /etc/hosts
  2. IP                FQDN                  ALIAS
  3. ------------------------------------------------
  4. 10.61.2.10       bigdatapruebas.es      bigdata.es
  5. #10.61.2.10             bigdatapruebas.es #bigdata.es bigdata bigdata.pruebasenergia.es
  6.  
  7. ::1                     localhost ip6-localhost ip6-loopback
  8. ff02::1                 ip6-allnodes
  9. ff02::02                ip6-allrouters
  10.  
  11. bigdata@bigdata:~$ hotname -f No se ha encontrado la orden «hotname», quizás quiso decir:
  12.  La orden «hostname» del paquete «hostname» (main) hotname: no se encontró la orden
  13. bigdata@bigdata:~$ hostname -f
  14. bigdatapruebas.es
  15. bigdata@bigdata:~$


But happens the same thing. When applying changes it works but not working if rebooting until I manually restart agent 😞