Ambari agent registration of HDF cluster fails in spite of exitcode 0. Setup of RHEL on MS Azure.

Explorer

==========================
Creating target directory...
==========================

Command start time 2018-05-16 06:08:52
chmod: cannot access ‘/var/lib/ambari-agent/data’: No such file or directory

Warning: Permanently added 'mtvm6.eastus.cloudapp.azure.com,40.117.251.23' (ECDSA) to the list of known hosts.
Connection to mtvm6.eastus.cloudapp.azure.com closed.
SSH command execution finished
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:52

==========================
Copying ambari sudo script...
==========================

Command start time 2018-05-16 06:08:52

scp /var/lib/ambari-server/ambari-sudo.sh
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:53

==========================
Copying common functions script...
==========================

Command start time 2018-05-16 06:08:53

scp /usr/lib/python2.6/site-packages/ambari_commons
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:53

==========================
Copying create-python-wrap script...
==========================

Command start time 2018-05-16 06:08:53

scp /var/lib/ambari-server/create-python-wrap.sh
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:54

==========================
Copying OS type check script...
==========================

Command start time 2018-05-16 06:08:54

scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:54

==========================
Running create-python-wrap script...
==========================

Command start time 2018-05-16 06:08:54

Connection to mtvm6.eastus.cloudapp.azure.com closed.
SSH command execution finished
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:55

==========================
Running OS type check...
==========================

Command start time 2018-05-16 06:08:55
Cluster primary/cluster OS family is redhat7 and local/current OS family is redhat7

Connection to mtvm6.eastus.cloudapp.azure.com closed.
SSH command execution finished
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:55

==========================
Checking 'sudo' package on remote host...
==========================

Command start time 2018-05-16 06:08:55
sudo-1.8.19p2-11.el7_4.x86_64

Connection to mtvm6.eastus.cloudapp.azure.com closed.
SSH command execution finished
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:56

==========================
Copying repo file to 'tmp' folder...
==========================

Command start time 2018-05-16 06:08:56

scp /etc/yum.repos.d/ambari.repo
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:57

==========================
Moving file to repo dir...
==========================

Command start time 2018-05-16 06:08:57

Connection to mtvm6.eastus.cloudapp.azure.com closed.
SSH command execution finished
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:57

==========================
Changing permissions for ambari.repo...
==========================

Command start time 2018-05-16 06:08:57

Connection to mtvm6.eastus.cloudapp.azure.com closed.
SSH command execution finished
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:57

==========================
Copying setup script file...
==========================

Command start time 2018-05-16 06:08:57

scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:08:58

==========================
Running setup agent script...
==========================

Command start time 2018-05-16 06:08:58
("INFO 2018-05-16 06:09:18,024 main.py:145 - loglevel=logging.INFO
INFO 2018-05-16 06:09:18,024 main.py:145 - loglevel=logging.INFO
INFO 2018-05-16 06:09:18,024 main.py:145 - loglevel=logging.INFO
INFO 2018-05-16 06:09:18,025 DataCleaner.py:39 - Data cleanup thread started
INFO 2018-05-16 06:09:18,027 DataCleaner.py:120 - Data cleanup started
INFO 2018-05-16 06:09:18,027 DataCleaner.py:122 - Data cleanup finished
INFO 2018-05-16 06:09:18,028 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'mtvm6.eastus.cloudapp.azure.com' using socket.getfqdn().
INFO 2018-05-16 06:09:18,035 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2018-05-16 06:09:18,038 main.py:437 - Connecting to Ambari server at https://myhdf.eastus.cloudapp.azure.com:8440 (104.211.60.99)
INFO 2018-05-16 06:09:18,038 NetUtil.py:70 - Connecting to https://myhdf.eastus.cloudapp.azure.com:8440/ca
", None)
("INFO 2018-05-16 06:09:18,024 main.py:145 - loglevel=logging.INFO
INFO 2018-05-16 06:09:18,024 main.py:145 - loglevel=logging.INFO
INFO 2018-05-16 06:09:18,024 main.py:145 - loglevel=logging.INFO
INFO 2018-05-16 06:09:18,025 DataCleaner.py:39 - Data cleanup thread started
INFO 2018-05-16 06:09:18,027 DataCleaner.py:120 - Data cleanup started
INFO 2018-05-16 06:09:18,027 DataCleaner.py:122 - Data cleanup finished
INFO 2018-05-16 06:09:18,028 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname 'mtvm6.eastus.cloudapp.azure.com' using socket.getfqdn().
INFO 2018-05-16 06:09:18,035 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2018-05-16 06:09:18,038 main.py:437 - Connecting to Ambari server at https://myhdf.eastus.cloudapp.azure.com:8440 (104.211.60.99)
INFO 2018-05-16 06:09:18,038 NetUtil.py:70 - Connecting to https://myhdf.eastus.cloudapp.azure.com:8440/ca
", None)

Connection to mtvm6.eastus.cloudapp.azure.com closed.
SSH command execution finished
host=mtvm6.eastus.cloudapp.azure.com, exitcode=0
Command end time 2018-05-16 06:09:20

Registering with the server...
Registration with the server failed.
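
Every bootstrap step above reports exitcode=0, yet registration fails right after the agent logs that it is connecting to https://myhdf.eastus.cloudapp.azure.com:8440/ca. One thing worth ruling out is whether the agent host can open a TCP connection to the Ambari server at all. The following is a minimal sketch, not part of Ambari: `can_connect` is a hypothetical helper written for this illustration, and it only tests TCP reachability, not TLS or the registration handshake itself.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run on the agent host (mtvm6 in the log), `can_connect("myhdf.eastus.cloudapp.azure.com", 8440)` would check the port shown in the log. Port 8441 is assumed here to be the default registration port and may be worth checking the same way. On Azure, a `False` result may point to a missing inbound rule for that port rather than to Ambari itself.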
1 ACCEPTED SOLUTION

Master Mentor

@Matthias Tewordt

I am glad you have succeeded. Next time, you can help someone else set up HDF in Azure 🙂
Yes, the database could be set up on any node, but since you already have Postgres installed for Ambari, it is easier to manage if the other databases live on the same host.

CAUTION:

When you move to production, consider setting up database replication.

Once you have finished the setup, if you found that this answer addressed your question, please take a moment to log in and click the "Accept" link on the answer.

Keep me posted

50 REPLIES

Explorer

Great, thanks a lot, Geoffrey!

Explorer

In the meantime, after starting Ambari, some of the services work and others do not. In particular, I have never been able to start Registry and Streaming Analytics Manager (SAM). I identified a bug related to different Postgres versions being used for Ambari and for the Registry/SAM database, and worked around it by using MariaDB for Registry/SAM. However, Registry and SAM still don't start. Any ideas how to go forward from here? Thanks, Matthias

Explorer
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/REGISTRY/0.3.0/package/scripts/registry_server.py", line 120, in <module>
    RegistryServer().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/REGISTRY/0.3.0/package/scripts/registry_server.py", line 66, in start
    user="root")
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'source /usr/hdf/current/registry/conf/registry-env.sh ; /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh' returned 1. Using Configuration file: /usr/hdf/current/registry/bootstrap/../conf/registry.yaml
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
	at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989)
	at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:341)
	at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2251)
	at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2284)
	at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2083)
	at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:806)
	at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
	at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:410)
	at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:328)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at com.hortonworks.registries.storage.tool.SQLScriptRunner.connect(SQLScriptRunner.java:75)
	at com.hortonworks.registries.storage.tool.SQLScriptRunner.runScript(SQLScriptRunner.java:90)
	at com.hortonworks.registries.storage.tool.TablesInitializer.doExecute(TablesInitializer.java:198)
	at com.hortonworks.registries.storage.tool.TablesInitializer.doExecuteCreate(TablesInitializer.java:175)
	at com.hortonworks.registries.storage.tool.TablesInitializer.main(TablesInitializer.java:162)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:211)
	at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:300)
	... 19 more

stdout: /var/lib/ambari-agent/data/output-402.txt

2018-05-23 05:25:22,246 - Group['hadoop'] {}
2018-05-23 05:25:22,247 - Group['nifi'] {}
2018-05-23 05:25:22,248 - User['streamline'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,249 - User['logsearch'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,249 - User['registry'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,250 - User['storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,251 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,251 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,252 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,252 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2018-05-23 05:25:22,253 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2018-05-23 05:25:22,254 - User['nifi'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'nifi']}
2018-05-23 05:25:22,254 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-05-23 05:25:22,256 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-05-23 05:25:22,262 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2018-05-23 05:25:22,277 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2018-05-23 05:25:22,289 - Skipping Execute[('setenforce', '0')] due to only_if
2018-05-23 05:25:22,513 - Stack Feature Version Info: stack_version=3.0, version=3.0.2.0-76, current_cluster_version=3.0.2.0-76 -> 3.0.2.0-76
2018-05-23 05:25:22,515 - Directory['/var/log/registry'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'registry', 'mode': 0755}
2018-05-23 05:25:22,517 - Directory['/var/run/registry'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'registry', 'mode': 0755}
2018-05-23 05:25:22,517 - Creating directory Directory['/var/run/registry'] since it doesn't exist.
2018-05-23 05:25:22,517 - Changing owner for /var/run/registry from 0 to registry
2018-05-23 05:25:22,517 - Changing group for /var/run/registry from 0 to hadoop
2018-05-23 05:25:22,518 - Directory['/usr/hdf/current/registry/conf'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'recursive_ownership': True, 'owner': 'registry', 'mode': 0755}
2018-05-23 05:25:22,519 - Directory['/var/lib/ambari-agent/data/registry'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'mode': 0755, 'owner': 'registry', 'recursive_ownership': True}
2018-05-23 05:25:22,519 - Changing owner for /var/lib/ambari-agent/data/registry from 0 to registry
2018-05-23 05:25:22,524 - File['/usr/hdf/current/registry/conf/registry-env.sh'] {'content': InlineTemplate(...), 'owner': 'registry'}
2018-05-23 05:25:22,525 - Directory['/etc/security/limits.d'] {'owner': 'root', 'create_parents': True, 'group': 'root'}
2018-05-23 05:25:22,525 - Directory['/hdf/registry'] {'owner': 'registry', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2018-05-23 05:25:22,532 - File['/etc/security/limits.d/registry.conf'] {'content': Template('registry.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2018-05-23 05:25:22,541 - File['/usr/hdf/current/registry/conf/registry.yaml'] {'owner': 'registry', 'content': Template('registry.yaml.j2'), 'group': 'hadoop', 'mode': 0644}
2018-05-23 05:25:22,547 - File['/usr/lib/ambari-agent/DBConnectionVerification.jar'] {'content': DownloadSource('http://myhdf.eastus.cloudapp.azure.com:8080/resources/DBConnectionVerification.jar')}
2018-05-23 05:25:22,548 - Not downloading the file from http://myhdf.eastus.cloudapp.azure.com:8080/resources/DBConnectionVerification.jar, because /var/lib/ambari-agent/tmp/DBConnectionVerification.jar already exists
2018-05-23 05:25:22,553 - Execute['source /usr/hdf/current/registry/conf/registry-env.sh ; /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh'] {'user': 'root'}
2018-05-23 05:25:23,881 - Execute['find /var/log/registry -maxdepth 1 -type f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] {'logoutput': True, 'ignore_failures': True, 'user': 'registry'}
 

Command failed after 1 tries
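
The Python traceback and the long JDBC stack make the actual failure easy to miss. The last `Caused by:` line shows `java.net.ConnectException: Connection refused`, which generally means that nothing accepted the TCP connection at the host and port configured in the Registry storage URL. As a small illustration of pulling that line out of a saved error log, here is a sketch; `root_cause` is a hypothetical helper, not Ambari tooling.

```python
def root_cause(trace_text):
    """Return the last 'Caused by:' entry of a Java stack trace, or None."""
    cause = None
    for line in trace_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Caused by:"):
            # Later 'Caused by' entries sit deeper in the chain; keep the last one.
            cause = stripped[len("Caused by:"):].strip()
    return cause
```

Applied to the trace above, this would return the `java.net.ConnectException` line, which narrows the problem to the database host, port, or listener rather than to Registry itself.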

Master Mentor

@Matthias Tewordt

Can you back up the file below:

cp /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh.bak 

Then edit /usr/hdf/current/registry/bootstrap/bootstrap-storage.sh

Update the following lines with the proxy information by adding values for

-Dhttps.proxyHost=proxy_name 
-Dhttps.proxyPort=xxxx 

Example:

function dropTables {
    ${JAVA} -Dbootstrap.dir=$BOOTSTRAP_DIR -Dhttps.proxyHost=<YOUR_PROXY_HOST> -Dhttps.proxyPort=<YOUR_PROXY_PORT>  -cp ${CLASSPATH} ${TABLE_INITIALIZER_MAIN_CLASS} -m ${MYSQL_JAR_URL_PATH} -c ${CONFIG_FILE_PATH} -s ${SCRIPT_ROOT_DIR} --drop
}
function createTables {
    ${JAVA} -Dbootstrap.dir=$BOOTSTRAP_DIR -Dhttps.proxyHost=<YOUR_PROXY_HOST> -Dhttps.proxyPort=<YOUR_PROXY_PORT>  -cp ${CLASSPATH} ${TABLE_INITIALIZER_MAIN_CLASS} -m ${MYSQL_JAR_URL_PATH} -c ${CONFIG_FILE_PATH} -s ${SCRIPT_ROOT_DIR} --create
}
function checkStorageConnection {
    ${JAVA} -Dbootstrap.dir=$BOOTSTRAP_DIR -Dhttps.proxyHost=<YOUR_PROXY_HOST> -Dhttps.proxyPort=<YOUR_PROXY_PORT>  -cp ${CLASSPATH} ${TABLE_INITIALIZER_MAIN_CLASS} -m ${MYSQL_JAR_URL_PATH} -c ${CONFIG_FILE_PATH} -s ${SCRIPT_ROOT_DIR} --check-connection
} 

Then try restarting the Registry and SAM.

Explorer

Hi Geoffrey, thanks for getting back to me. I have not set up a proxy in my Azure environment, so I'm not sure which entries to use here.

Explorer

I have now updated bootstrap-storage.sh on the node mtvm5, where Registry is hosted, and chosen:

-Dhttps.proxyHost=myhdf.eastus.cloudapp.azure.com (this is the node where Ambari is running)

-Dhttps.proxyPort=8080 (for myhdf I had opened port 8080, so I hope that was the right choice)

As I don't use proxies, I hope this is the right interpretation?

However, the Registry still doesn't start.

Explorer

In the Ambari UI, on the Registry config tab, I have changed the Registry storage URL to

jdbc:mysql://myhdf.eastus.cloudapp.azure.com:3306/registry

Is this correct if the Ambari node with MySQL has the address http://myhdf.eastus.cloudapp.azure.com?

What about the port?
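
For reference, a MySQL JDBC URL has the form jdbc:mysql://host:port/database. The host is the plain DNS name without an http:// prefix, and 3306 is the default MySQL/MariaDB port when none is given. The sketch below splits such a URL into its parts so each can be compared with where the database really listens; `parse_mysql_jdbc_url` is a hypothetical helper written for this illustration, not part of Ambari or the JDBC driver.

```python
from urllib.parse import urlsplit

DEFAULT_MYSQL_PORT = 3306  # MySQL/MariaDB default when the URL names no port


def parse_mysql_jdbc_url(jdbc_url):
    """Split a jdbc:mysql:// URL into (host, port, database)."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix + "mysql://"):
        raise ValueError("not a jdbc:mysql:// URL: " + jdbc_url)
    # Drop the 'jdbc:' prefix so the remainder parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    port = parts.port if parts.port is not None else DEFAULT_MYSQL_PORT
    database = parts.path.lstrip("/")
    return parts.hostname, port, database
```

For the URL above, this yields the host myhdf.eastus.cloudapp.azure.com, port 3306, and database registry. Whether the database on that host actually accepts remote connections on that port is a separate question that the URL alone cannot answer.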

Explorer

Registry and SAM have now started successfully. However, Registry is still very unstable and has to be restarted again.