Ambari error when restarting flume agent

Explorer

I get an error when restarting the Flume agent on one of two HDP nodes:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-RESTART/scripts/hook.py", line 20, in <module>
    from resource_management import *
  File "/usr/lib/python2.6/site-packages/resource_management/__init__.py", line 23, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/__init__.py", line 23, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/__init__.py", line 25, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/default.py", line 24, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/__init__.py", line 23, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 31, in <module>
  File "/usr/lib/python2.6/site-packages/ambari_commons/__init__.py", line 21, in <module>
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_check.py", line 133, in <module>
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_check.py", line 115, in __init__
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_check.py", line 112, in initialize_data 

Exception: Couldn't load 'os_family.json' file

The first HDP node restarts OK. Both nodes are freshly installed and similar. What's wrong?

1 ACCEPTED SOLUTION

@Kirill Elsukov

I hate to say this, but you have to uninstall ambari-agent and reinstall it. I have seen this behavior a few times, and after a lot of troubleshooting I fixed it with the following approach:

ambari-agent stop

yum remove ambari-agent

yum install ambari-agent

ambari-agent start

9 REPLIES

Explorer

Yep, that did help. So in my case the final result is:

  • remove all ambari* from /usr/lib/python2.6/site-packages
  • yum remove ambari-agent
  • yum install ambari-agent
  • ps -ef | grep ambari
  • kill -9 <process_id> with <process_id> of ambari_agent processes
  • ambari-agent start
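
The steps above can be sketched as a small script. This is only an illustration, not an official procedure: the `run` helper, the `DRY_RUN` switch, and the `pkill` pattern are assumptions added here (the pattern is based on the `ambari_agent/main.py` command line that appears elsewhere in this thread), and the leading `ambari-agent stop` comes from the accepted solution. By default it only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch of the reinstall sequence from this thread.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 and run as root to execute.

# Path taken from the tracebacks in this thread; override if yours differs.
SITE_PACKAGES="${SITE_PACKAGES:-/usr/lib/python2.6/site-packages}"

run() {
    # Print the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

reinstall_ambari_agent() {
    run ambari-agent stop
    # Remove leftover ambari* entries from site-packages.
    run find "$SITE_PACKAGES" -maxdepth 1 -name 'ambari*' -exec rm -rf {} +
    run yum remove ambari-agent
    run yum install ambari-agent
    # Kill stale agent processes (assumed pattern; check with ps -ef first).
    run pkill -9 -f 'ambari_agent/main.py'
    run ambari-agent start
}
```

Calling `reinstall_ambari_agent` prints the six commands; review them, then rerun with `DRY_RUN=0 reinstall_ambari_agent` once they look right for your node.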

Thank you very much!

Guru

I had the same issue, and these 4 steps resolved it each time. (For me, the Flume agent failed to stop only when I restarted the Flume service after a config change. If I stopped the Flume agent first, then made the config change, then restarted the Flume service, I did not get this issue.)

Guru

NOTE: I can replicate this issue on the sandbox but not on another cluster I installed. Could anyone confirm whether you are also getting this issue, and whether it is on the sandbox or not?

Explorer

I reinstalled ambari-client but it didn't help. It didn't even allow me to stop the Flume service:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/FLUME/1.4.0.2.0/package/scripts/flume_handler.py", line 20, in <module>
    import flume_upgrade
  File "/var/lib/ambari-agent/cache/common-services/FLUME/1.4.0.2.0/package/scripts/flume_upgrade.py", line 24, in <module>
    from resource_management.core.logger import Logger
  File "/usr/lib/python2.6/site-packages/resource_management/__init__.py", line 23, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/__init__.py", line 23, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/__init__.py", line 25, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/default.py", line 24, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/__init__.py", line 23, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 31, in <module>
  File "/usr/lib/python2.6/site-packages/ambari_commons/__init__.py", line 21, in <module>
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_check.py", line 133, in <module>
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_check.py", line 115, in __init__
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_check.py", line 112, in initialize_data 

Exception: Couldn't load 'os_family.json' file

@Kirill Elsukov

Try this

cd /usr/lib/python2.6/site-packages/

ln -s /usr/lib/ambari-agent/lib/ambari_commons ambari_commons

ln -s /usr/lib/ambari-agent/lib/resource_management resource_management

ln -s /usr/lib/ambari-agent/lib/ambari_jinja2 ambari_jinja2

ambari-agent restart
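
Before or after creating the links, it may help to check whether the file named in the exception is reachable at all. The sketch below assumes that `os_family.json` lives in a `resources` subdirectory of `ambari_commons`; that location is an assumption, not something stated in this thread, so verify it on a node that works. `check_os_family` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check that an ambari_commons directory (or symlink to one) exposes
# resources/os_family.json. The resources/ location is an assumption.

check_os_family() {
    # $1: path to an ambari_commons directory, possibly a symlink
    cof_dir="$1"
    cof_json="$cof_dir/resources/os_family.json"
    # A symlink whose target does not exist is reported separately.
    if [ -L "$cof_dir" ] && [ ! -e "$cof_dir" ]; then
        echo "broken symlink: $cof_dir"
        return 1
    fi
    if [ ! -r "$cof_json" ]; then
        echo "missing or unreadable: $cof_json"
        return 1
    fi
    echo "ok: $cof_json"
}
```

For example, `check_os_family /usr/lib/python2.6/site-packages/ambari_commons` on the failing node, compared with the same call on the healthy node, shows whether the two installs differ.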

Explorer

That didn't help, I still get the errors.

I've found a solution from another similar issue:

  • remove all ambari* from /usr/lib/python2.6/site-packages, then
  • yum remove ambari-agent
  • yum install ambari-agent

That did help, but now I can't start ambari-agent because of the following error:

"Failed to start ping port listener of: [Errno 98] Address already in use"

@Kirill Elsukov

That's what I shared in my initial reply:

"I hate to say this, but you have to uninstall ambari-agent and reinstall it. I have seen this behavior a few times, and after a lot of troubleshooting I fixed it with the following approach:

ambari-agent stop

yum remove ambari-agent

yum install ambari-agent

ambari-agent start"

ambari-agent stop

ambari-agent start

If that does not work, find the process listening on port 8670 and kill it:

[root@phdns02 ~]# netstat -anp | grep 8670

tcp 0 0 0.0.0.0:8670 0.0.0.0:* LISTEN 446256/python2

[root@phdns02 ~]# ps -ef | grep 446256

root 175471 170130 0 06:28 pts/1 00:00:00 grep 446256

root 446256 446248 2 Feb14 ? 03:10:42 /usr/bin/python2 /usr/lib/python2.6/site-packages/ambari_agent/main.py start --expected-hostname=phdns02.cloud.hortonworks.com

kill -9 446256
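
The lookup above can be scripted so the PID does not have to be copied by hand. This is a minimal sketch that assumes the usual Linux `netstat -anp` column layout shown above (state in the sixth field, `PID/program` in the seventh); `listener_pid` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the PID of the process listening on a given TCP port,
# reading `netstat -anp` output from stdin.

listener_pid() {
    # $1: port number, e.g. 8670
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, parts, "/")   # field 7 looks like "446256/python2"
            print parts[1]
        }'
}
```

Usage would be `netstat -anp | listener_pid 8670`. It is worth inspecting the result with `ps -fp <pid>` to confirm it really is a leftover ambari-agent process before running `kill -9` on it.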

Explorer

It looks like Ambari Metrics crashed after I reinstalled ambari-agent. I receive an error when trying to start it:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 58, in <module>
    AmsMonitor().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 37, in start
    self.configure(env) # for security
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 32, in configure
    import params
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/params.py", line 26, in <module>
    import status_params
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/status_params.py", line 37, in <module>
    kinit_path_local = functions.get_kinit_path(default('/configurations/kerberos-env/executable_search_paths', None))
NameError: name 'functions' is not defined