Ambari cluster not working, error in History Server

Rising Star

My installation is based on http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/2.5.2.0/ambari.list

I selected Spark2 and all its required dependencies.

The following services have an error:

  • History Server - Connection failed: [Errno 111] Connection refused to ambari-agent1
  • Hive Metastore
  • HiveServer2
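The "[Errno 111] Connection refused" alert means that nothing accepted a TCP connection on the checked port of ambari-agent1. The following is a minimal sketch of the same kind of check in Python 3, so it can be repeated by hand from another node. The function name `port_is_open` is hypothetical, and the port to test is whatever the History Server is configured to listen on in your cluster (it is not stated in this thread).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds, else False.

    A refused connection (errno 111 on Linux), a timeout, or a name
    resolution failure all count as "not open".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror
        # are all subclasses of OSError in Python 3.
        return False
```

A call such as `port_is_open("ambari-agent1", history_server_port)` returning False only confirms that the service is not listening; it does not explain why the start command failed.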

I receive the following error when starting the History Server manually:

INFO 2017-10-10 04:57:10,565 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2017-10-10 04:57:10,565 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2017-10-10 04:57:10,681 Hardware.py:176 - Some mount points were ignored: /dev, /run, /, /dev/shm, /run/lock, /sys/fs/cgroup, /boot, /home, /run/user/108, /run/user/1007, /run/user/1005, /run/user/1010, /run/user/1011, /run/user/1012, /run/user/1001
INFO 2017-10-10 04:57:10,682 Controller.py:320 - Sending Heartbeat (id = 4066)
INFO 2017-10-10 04:57:10,688 Controller.py:333 - Heartbeat response received (id = 4067)
INFO 2017-10-10 04:57:10,688 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2017-10-10 04:57:10,688 Controller.py:380 - Updating configurations from heartbeat
INFO 2017-10-10 04:57:10,688 Controller.py:389 - Adding cancel/execution commands
INFO 2017-10-10 04:57:10,688 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2017-10-10 04:57:11,589 Controller.py:482 - Wait for next heartbeat over
WARNING 2017-10-10 04:57:22,205 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. division by zero
INFO 2017-10-10 04:57:27,060 ClusterConfiguration.py:119 - Updating cached configurations for cluster vqcluster
INFO 2017-10-10 04:57:27,071 Controller.py:249 - Adding 1 commands. Heartbeat id = 4085
INFO 2017-10-10 04:57:27,071 ActionQueue.py:113 - Adding EXECUTION_COMMAND for role SPARK2_JOBHISTORYSERVER for service SPARK2 of cluster vqcluster to the queue.
INFO 2017-10-10 04:57:27,081 ActionQueue.py:238 - Executing command with id = 68-0, taskId = 307 for role = SPARK2_JOBHISTORYSERVER of cluster vqcluster.
INFO 2017-10-10 04:57:27,081 ActionQueue.py:279 - Command execution metadata - taskId = 307, retry enabled = False, max retry duration (sec) = 0, log_output = True
WARNING 2017-10-10 04:57:27,083 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-307.txt'
INFO 2017-10-10 04:57:32,563 PythonExecutor.py:130 - Command ['/usr/bin/python',
 u'/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/job_history_server.py',
 u'START',
 '/var/lib/ambari-agent/data/command-307.json',
 u'/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package',
 '/var/lib/ambari-agent/data/structured-out-307.json',
 'INFO',
 '/var/lib/ambari-agent/tmp',
 'PROTOCOL_TLSv1',
 ''] failed with exitcode=1
INFO 2017-10-10 04:57:32,577 log_process_information.py:40 - Command 'export COLUMNS=9999 ; ps faux' returned 0. USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
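The agent log above only reports "failed with exitcode=1"; the reason for the failure is normally in the per-task files under /var/lib/ambari-agent/data. The sketch below (Python 3) pulls the task IDs out of a pasted log and lists the files worth inspecting. The output, command, and structured-out names appear in the log itself; the errors-<taskId>.txt name is an assumption that is not shown in this thread, and `task_files` is a hypothetical helper name.

```python
import re

# Directory taken from the paths printed in the agent log.
DATA_DIR = "/var/lib/ambari-agent/data"


def task_files(log_text, data_dir=DATA_DIR):
    """Map every taskId mentioned in the log to candidate per-task files."""
    task_ids = sorted(set(re.findall(r"taskId = (\d+)", log_text)), key=int)
    return {
        tid: [
            "%s/output-%s.txt" % (data_dir, tid),
            "%s/errors-%s.txt" % (data_dir, tid),  # assumed name, verify on the host
            "%s/command-%s.json" % (data_dir, tid),
            "%s/structured-out-%s.json" % (data_dir, tid),
        ]
        for tid in task_ids
    }
```

Posting the contents of those files for the failing task usually gives responders far more to work with than the agent log alone.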

Master Mentor

@ilia kheifets

Are you running the Ambari server as "root" or as a non-root user?

Do you have proper read-write permission on the directory "/var/lib/ambari-agent/data"?

WARNING 2017-10-10 04:57:27,083 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-307.txt'
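One way to answer the permission question is to check the directory from the account the agent runs as. This is a minimal sketch in Python 3, not part of Ambari; `describe_access` is a hypothetical helper name, and it should be run as the same user that starts ambari-agent.

```python
import os


def describe_access(path):
    """Report what the current user is allowed to do with directory `path`."""
    return {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        # Execute permission on a directory is what allows entering it.
        "traversable": os.access(path, os.X_OK),
    }
```

For example, `describe_access("/var/lib/ambari-agent/data")` should report True for every key if the agent user can create files such as output-307.txt there.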

Rising Star

I started the Ambari agent as a non-root user.

I fixed the permission issue with

setfacl -m u:admin:rwx /var/lib/ambari-agent/data

but I still have the same error:

INFO 2017-10-08 06:53:44,072 ClusterConfiguration.py:119 - Updating cached configurations for cluster vqcluster
INFO 2017-10-08 06:53:44,084 Controller.py:249 - Adding 1 commands. Heartbeat id = 3567
INFO 2017-10-08 06:53:44,085 ActionQueue.py:113 - Adding EXECUTION_COMMAND for role HISTORYSERVER for service MAPREDUCE2 of cluster vqcluster to the queue.
INFO 2017-10-08 06:53:44,117 ActionQueue.py:238 - Executing command with id = 26-0, taskId = 100 for role = HISTORYSERVER of cluster vqcluster.
INFO 2017-10-08 06:53:44,117 ActionQueue.py:279 - Command execution metadata - taskId = 100, retry enabled = False, max retry duration (sec) = 0, log_output = True
INFO 2017-10-08 06:53:44,931 PythonExecutor.py:130 - Command ['/usr/bin/python',
 u'/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py',
 u'START',
 '/var/lib/ambari-agent/data/command-100.json',
 u'/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package',
 '/var/lib/ambari-agent/data/structured-out-100.json',
 'INFO',
 '/var/lib/ambari-agent/tmp',
 'PROTOCOL_TLSv1',
 ''] failed with exitcode=1



Super Guru

Hi @ilia kheifets,

Did you do the sudoers configuration for the Ambari agents? If not, please follow the doc and perform the steps, restart the Ambari agents, and try to start the History Server.

https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-security/content/how_to_configure...

Thanks,

Aditya

Rising Star

Hi, I tried it and did have to change the user under "run_as_user", but the error still exists.

I also tried a new setup as the root user (on a clean install) and still have the same issue.

When I tried to run the command manually:

/usr/bin/python /var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py START /var/lib/ambari-agent/data/command-100.json /var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package /var/lib/ambari-agent/data/structured-out-100.json INFO /var/lib/ambari-agent/tmp PROTOCOL_TLSv1

I receive the error:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py", line 22, in <module>
    from resource_management.libraries.script.script import Script
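The traceback as posted stops at the import line, so the actual exception is not visible. One possibility (an assumption, not confirmed anywhere in this thread) is that a script launched by hand does not see the same Python module search path that the agent sets up, so `resource_management` cannot be found. A minimal sketch for testing whether a module is importable, optionally with extra search directories; `can_import` is a hypothetical helper name:

```python
import sys


def can_import(module_name, extra_paths=()):
    """Try to import module_name; return (ok, error_message_or_None).

    extra_paths are temporarily placed at the front of sys.path.
    """
    saved = list(sys.path)
    sys.path[:0] = list(extra_paths)
    try:
        __import__(module_name)
        return True, None
    except ImportError as exc:
        return False, str(exc)
    finally:
        # Restore the original search path whatever happened.
        sys.path[:] = saved
```

Running `can_import("resource_management")` with the same interpreter (/usr/bin/python), and then again with the directory where the agent keeps its libraries passed in `extra_paths`, would show whether the search path is the problem. That directory is installation-specific and should be verified on the host.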

Master Mentor

@ilia kheifets

Sorry to hear you are encountering all these problems. Could you tell me the HDP, Ambari, and OS type and version you are trying to install?

I will try to guide you.

Rising Star

Ubuntu 16.04, based on

http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/2.5.2.0/ambari.list

and Ubuntu 14.04, based on

http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.5.2.0/ambari.list

HDP-2.6.2.0

Service           Version
HDFS              2.7.3
YARN              2.7.3
MapReduce2        2.7.3
Tez               0.7.0
Hive              1.2.1000
Pig               0.16.0
ZooKeeper         3.4.6
Ambari Metrics    0.1.0
SmartSense        1.4.2.2.5.2.0-298
Spark2            2.1.1
Slider            0.92.0

Master Mentor

@ilia kheifets

What is your cluster size, and what specific error are you experiencing?

I have never experienced such a problem on Ubuntu. Once you respond, I will build a single-node cluster and try to reproduce your error.


Rising Star

For the test case I used a total of 3 computers:

  1. ambari-server, to manage all nodes
  2. ambari-agent, to hold all managing services
  3. ambari-agent, with worker services only

The error is the one mentioned in the first post: upon starting all services on all nodes, the History Server, Hive Metastore, and HiveServer2 fail to start.

I did a simple step-by-step install, with no special configuration, on fresh VMs based on clean images from http://www.osboxes.org

Master Mentor

@ilia kheifets

Just curious: did you install ambari-agent on the Ambari server too? If you didn't, please do that and edit /etc/ambari-agent/conf/ambari-agent.ini to point to the new host.

[server]
hostname={Ambari-server_FQDN}
url_port=8440
secured_url_port=8441

Please let me know before I build an environment to reproduce your issue!
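To confirm what each agent is actually pointed at, the [server] section of ambari-agent.ini can be read back with Python's standard configparser module. This is a minimal Python 3 sketch, not an Ambari tool; `server_settings` is a hypothetical helper name, and it takes the file contents as text so it can be tried on a pasted copy.

```python
import configparser


def server_settings(ini_text):
    """Return the [server] section of an ambari-agent.ini as a dict of strings."""
    # RawConfigParser avoids '%' interpolation surprises in values.
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    return dict(parser.items("server"))
```

On a host, the text could come from `open("/etc/ambari-agent/conf/ambari-agent.ini").read()`; the returned `hostname` value should match the Ambari server's FQDN on every agent.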