Support Questions

Find answers, ask questions, and share your expertise

ambari cluster not working, error in history server

Rising Star

I'm installing based on http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/2.5.2.0/ambari.list

I selected Spark2 and all its required dependencies.

The following services have an error:

  • History Server - Connection failed: [Errno 111] Connection refused to ambari-agent1
  • Hive Metastore
  • HiveServer2

I receive the following error when manually starting the History Server:

INFO 2017-10-10 04:57:10,565 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2017-10-10 04:57:10,565 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2017-10-10 04:57:10,681 Hardware.py:176 - Some mount points were ignored: /dev, /run, /, /dev/shm, /run/lock, /sys/fs/cgroup, /boot, /home, /run/user/108, /run/user/1007, /run/user/1005, /run/user/1010, /run/user/1011, /run/user/1012, /run/user/1001
INFO 2017-10-10 04:57:10,682 Controller.py:320 - Sending Heartbeat (id = 4066)
INFO 2017-10-10 04:57:10,688 Controller.py:333 - Heartbeat response received (id = 4067)
INFO 2017-10-10 04:57:10,688 Controller.py:342 - Heartbeat interval is 1 seconds
INFO 2017-10-10 04:57:10,688 Controller.py:380 - Updating configurations from heartbeat
INFO 2017-10-10 04:57:10,688 Controller.py:389 - Adding cancel/execution commands
INFO 2017-10-10 04:57:10,688 Controller.py:475 - Waiting 0.9 for next heartbeat
INFO 2017-10-10 04:57:11,589 Controller.py:482 - Wait for next heartbeat over
WARNING 2017-10-10 04:57:22,205 base_alert.py:138 - [Alert][namenode_hdfs_capacity_utilization] Unable to execute alert. division by zero
INFO 2017-10-10 04:57:27,060 ClusterConfiguration.py:119 - Updating cached configurations for cluster vqcluster
INFO 2017-10-10 04:57:27,071 Controller.py:249 - Adding 1 commands. Heartbeat id = 4085
INFO 2017-10-10 04:57:27,071 ActionQueue.py:113 - Adding EXECUTION_COMMAND for role SPARK2_JOBHISTORYSERVER for service SPARK2 of cluster vqcluster to the queue.
INFO 2017-10-10 04:57:27,081 ActionQueue.py:238 - Executing command with id = 68-0, taskId = 307 for role = SPARK2_JOBHISTORYSERVER of cluster vqcluster.
INFO 2017-10-10 04:57:27,081 ActionQueue.py:279 - Command execution metadata - taskId = 307, retry enabled = False, max retry duration (sec) = 0, log_output = True
WARNING 2017-10-10 04:57:27,083 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-307.txt'
INFO 2017-10-10 04:57:32,563 PythonExecutor.py:130 - Command ['/usr/bin/python',
 u'/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/job_history_server.py',
 u'START',
 '/var/lib/ambari-agent/data/command-307.json',
 u'/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package',
 '/var/lib/ambari-agent/data/structured-out-307.json',
 'INFO',
 '/var/lib/ambari-agent/tmp',
 'PROTOCOL_TLSv1',
 ''] failed with exitcode=1
INFO 2017-10-10 04:57:32,577 log_process_information.py:40 - Command 'export COLUMNS=9999 ; ps faux' returned 0. USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
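To see the real error hidden behind "failed with exitcode=1", one option is to re-run the failed command by hand. A sketch using the exact arguments from the PythonExecutor line in the log above (task id 307 is specific to this run; adjust to the current failing task on your agent):

```shell
# Re-run the failed Spark2 History Server start script directly
# (arguments copied from the agent log above):
sudo /usr/bin/python \
  /var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/job_history_server.py \
  START \
  /var/lib/ambari-agent/data/command-307.json \
  /var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package \
  /var/lib/ambari-agent/data/structured-out-307.json \
  INFO \
  /var/lib/ambari-agent/tmp \
  PROTOCOL_TLSv1 \
  ''

# Then inspect the task output file the agent warned about, if it now exists:
cat /var/lib/ambari-agent/data/output-307.txt
```

Running it in the foreground prints the Python traceback or shell error to the terminal instead of swallowing it.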
1 ACCEPTED SOLUTION

Master Mentor

@ilia kheifets

Sorry to hear you are encountering all these problems. Could you tell me the HDP, Ambari, and OS types and versions you are trying to install?

I will try to guide you.


32 REPLIES

Master Mentor

@ilia kheifets

Here we go: all the services up and running without major issues!

Did you install Java with JCE following the standard procedure? I have seen your comments, but at times we skip the obvious, well-documented steps. I followed the usual steps and everything is up and running.

As I have demonstrated that the procedure is OK and valid, this answers your question.


all-service-up.jpg

Rising Star

I'll try again following your steps exactly and hope for the best.

Master Mentor

@ilia kheifets

I am positive it will work! And once it does don't forget to accept my answer. That way other HCC users can quickly find the solution when they encounter the same issue.

Please let me know

Rising Star

I have installed all services on one node and had a few issues:

40878-error1.png

After finishing the install stage, Hive Metastore fails to start with the same issue as before:

INFO 2017-10-17 01:55:04,537 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:04,537 RecoveryManager.py:255 - SPARK_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:04,538 RecoveryManager.py:255 - HIVE_METASTORE needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:09,219 ClusterConfiguration.py:119 - Updating cached configurations for cluster vqcluster
INFO 2017-10-17 01:55:09,252 RecoveryManager.py:717 - Received EXECUTION_COMMAND (START), desired state of HIVE_METASTORE to STARTED
INFO 2017-10-17 01:55:09,253 Controller.py:249 - Adding 1 commands. Heartbeat id = 62015
INFO 2017-10-17 01:55:09,253 ActionQueue.py:113 - Adding EXECUTION_COMMAND for role HIVE_METASTORE for service HIVE of cluster vqcluster to the queue.
INFO 2017-10-17 01:55:09,288 ActionQueue.py:238 - Executing command with id = 43-0, taskId = 276 for role = HIVE_METASTORE of cluster vqcluster.
INFO 2017-10-17 01:55:09,288 ActionQueue.py:279 - Command execution metadata - taskId = 276, retry enabled = False, max retry duration (sec) = 0, log_output = True
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:265 - Generating the JCEKS file: roleCommand=START and taskId = 276
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:243 - Identifying config hive-site for CS: 
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:288 - provider_path=jceks://file/var/lib/ambari-agent/cred/conf/hive/hive-site.jceks
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:295 - ('/usr/jdk64/jdk1.8.0_112/bin/java', '-cp', '/var/lib/ambari-agent/cred/lib/*', 'org.apache.hadoop.security.alias.CredentialShell', 'create', u'javax.jdo.option.ConnectionPassword', '-value', [PROTECTED], '-provider', 'jceks://file/var/lib/ambari-agent/cred/conf/hive/hive-site.jceks')
WARNING 2017-10-17 01:55:09,318 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-276.txt'
INFO 2017-10-17 01:55:09,694 CustomServiceOrchestrator.py:297 - cmd_result = 0
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - SPARK_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - HIVE_METASTORE needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:15,375 PythonExecutor.py:130 - Command ['/usr/bin/python',
 u'/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py',
 u'START',
 '/var/lib/ambari-agent/data/command-276.json',
 u'/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package',
 '/var/lib/ambari-agent/data/structured-out-276.json',
 'INFO',
 '/var/lib/ambari-agent/tmp',
 'PROTOCOL_TLSv1',
 ''] failed with exitcode=1

and Spark2 Thrift Server and Spark (1) Thrift Server can start but keep going down:

ERROR 2017-10-17 01:57:36,029 script_alert.py:123 - [Alert][spark_thriftserver_status] Failed with result CRITICAL: ['Connection failed on host ambari-master.test.com:10015 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_thrift_port.py", line 143, in execute
    Execute(cmd, user=hiveruser, path=[beeline_cmd], timeout=CHECK_COMMAND_TIMEOUT_DEFAULT)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://ambari-master.test.com:10015/default\' transportMode=binary  -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://ambari-master.test.com:10015/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Error: Could not open client transport with JDBC Uri: jdbc:hive2://ambari-master.test.com:10015/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
)']
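The alert itself is just beeline's output piped through grep. Its check logic can be reproduced in isolation with the error string from the traceback (used here as a stand-in for live beeline output) to see why the alert reports CRITICAL:

```shell
# Simulated beeline output: the actual error string quoted in the alert above.
# The alert script pipes beeline's output through grep and fires CRITICAL
# whenever either pattern matches.
output='Error: Could not open client transport with JDBC Uri: jdbc:hive2://ambari-master.test.com:10015/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)'
result=$(echo "$output" | grep -q -i -e 'Connection refused' -e 'Invalid URL' && echo 'CRITICAL: thrift server not reachable')
echo "$result"
```

So "Connection refused" here simply means nothing is listening on port 10015; the alert is a symptom of the Thrift Server process going down, not a separate problem.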

Our main requirement is Spark2, so I'll try another clean install with only Spark2 and its dependencies, hoping not to have any more issues.

Master Mentor

@ilia kheifets

Hive needs a metastore database to store structure information of the various tables and partitions in the warehouse.

Oozie stores the workflow/scheduler details in a relational database.

You could use Postgres instead of MySQL, and the creation is even easier; see below.

Hive database

In the example below, the database and the user are both "hive":

# su - postgres
postgres@ubuntu17:~$ psql
psql (9.5.9)
Type "help" for help.
postgres=# DROP DATABASE if exists hive;
postgres=# CREATE USER hive PASSWORD 'hive';
postgres=# CREATE DATABASE hive OWNER hive;
postgres=# grant all privileges on database hive to hive;
postgres=# \q 

Oozie database

In the example below, the database and the user are both "oozie":

postgres=# DROP DATABASE if exists oozie; 
postgres=# CREATE USER oozie PASSWORD 'oozie'; 
postgres=# CREATE DATABASE oozie OWNER oozie; 
postgres=# grant all privileges on database oozie to oozie;
postgres=# \q
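Before pointing Ambari at these databases, it may be worth confirming the new users can actually log in over TCP. A sketch, assuming Postgres accepts password authentication on localhost (that depends on your pg_hba.conf, which is not shown in this thread):

```shell
# Connect as each new user to its database; \conninfo prints the
# connection details on success, and the command fails on bad auth.
PGPASSWORD=hive psql -h localhost -U hive -d hive -c '\conninfo'
PGPASSWORD=oozie psql -h localhost -U oozie -d oozie -c '\conninfo'
```

If these fail with an authentication error, Ambari's Hive and Oozie setup will fail the same way.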

After the above has succeeded, use the hive and oozie info during the Ambari UI setup to set up these components. Can you let me know if the installation succeeded?

Rising Star

The installation partially succeeded. I have a large response waiting for moderation (probably due to its size or an attached image).

I'll ask the question again: why is running ambari-server setup -s not enough? Why is it required to configure the SQL database manually?

Master Mentor

@ilia kheifets

Is this the correct URL to your Hive database? jdbc:hive2://ambari-master.test.com:10015/default

I see the error "Connection refused\' -e \'Invalid URL\'\' returned 1. Error"

Can you walk me through the setup of the databases for Hive and Oozie? Was it with MySQL or Postgres?

ambari-server setup -s (-s means silent install) should work with the embedded Postgres, so I can't tell why it didn't work in your case; it could be one of the standard OS preparation steps that was skipped.

Rising Star

The URL is correct (I changed the hostname in the comment, but the URL itself is correct, since all the other systems are working).

Do I have to make some special configuration to the SQL DB?

I have done:

ambari-server setup -s
sudo apt-get update 
sudo apt-get install mysql-server -y
sudo mysql_secure_installation
apt-get install libpostgresql-jdbc-java -y
apt-get install libmysql-java -y
ls /usr/share/java/mysql-connector-java.jar
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
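Note that the steps above install MySQL and register the JDBC driver with Ambari, but never create a Hive database or user inside MySQL, which would explain the Metastore failing to start if the Ambari UI was pointed at an "existing MySQL database". A minimal sketch of the missing step (the hive/hive names are placeholders and must match whatever is entered in the UI):

```shell
# Create the Hive metastore database and user in MySQL
# (requires the MySQL root password set during mysql_secure_installation):
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hive;
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL
```

This mirrors what the Postgres commands earlier in the thread do for the hive and oozie users.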

and then I tried to create the cluster via the web UI; everything goes with no errors until starting Hive (the last stage).

Master Mentor

@ilia kheifets

Can you enlighten me? I see a mixture of MySQL and Postgres commands in your new posting. Let's resolve your issue with ONLY a Postgres installation, because the mixture looks confusing.

It won't affect any service except Hive and Oozie, or Ranger if you intend to add Ranger for authorization, authentication, and administration of security policies.

Rising Star

I tried multiple setups. The only one that worked (partially) was when I installed all available services; it had the Metastore issue, but Spark2 worked, and I successfully added more nodes and tested with spark-submit.

That's why I might have some mixture of MySQL and Postgres commands; I tried every combination I could think of.

Then I tried to install only Spark2 with its dependencies, using the following steps:

apt-get install ntp -y
update-rc.d ntp defaults
sudo ufw disable
apt install selinux-utils -y
setenforce 0
umask 0022
echo umask 0022 >> /etc/profile
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/2.5.2.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
ambari-server setup -s
ambari-server start

In the web UI I selected only the Spark2 service and then clicked yes on all its dependencies.

This gives me an error at the Hive Metastore start stage.
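When a start task fails with only exitcode=1, the agent's recorded task output and the Metastore's own log usually contain the underlying cause (often a JDBC connection or schema error). A sketch using the default HDP log locations, which are assumptions and may differ if customized:

```shell
# Most recent task output recorded by the ambari-agent for a failed command:
ls -t /var/lib/ambari-agent/data/output-*.txt | head -n 1 | xargs cat

# The Hive Metastore's own log (default HDP location):
tail -n 100 /var/log/hive/hivemetastore.log
```

A "Communications link failure" or "Access denied for user" line in either log would point back at the database setup rather than at Spark2 itself.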