Member since 09-28-2017 | 88 Posts | 3 Kudos Received | 0 Solutions
06-19-2018
07:25 AM
Hi, I am having some trouble setting the following scheduler queue parameters. I have two queues, Dev and Prod:
- Root: 100%
- Dev: 30%
- Prod: 70%
If only one queue is in use, it should act as 100% of the cluster. Each queue is used by multiple users and resources should be shared equally among them, but when only one user is active in a queue, that user should use the entire capacity of the queue, and if that user is alone on the cluster, they should use 100% of the cluster.
When a second user joins, the scheduler should share the available resources. Example flow: the cluster is free of jobs; user A submits a job to queue Dev (it now uses 100% of the cluster); user B submits a job to queue Dev (it hangs in ACCEPTED). I want the users to share the capacity of the cluster, each receiving 50%.
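Roughly, this is what I am aiming for in the Capacity Scheduler properties (the user-limit-factor values are my guesses, about 100/capacity so a lone user can grow to the whole cluster):

yarn.scheduler.capacity.root.queues=Dev,Prod
yarn.scheduler.capacity.root.Dev.capacity=30
yarn.scheduler.capacity.root.Prod.capacity=70
# let either queue grow to the whole cluster when the other is idle
yarn.scheduler.capacity.root.Dev.maximum-capacity=100
yarn.scheduler.capacity.root.Prod.maximum-capacity=100
# let a single user exceed the queue's configured share (guessed values: ~100/30 and ~100/70)
yarn.scheduler.capacity.root.Dev.user-limit-factor=3.4
yarn.scheduler.capacity.root.Prod.user-limit-factor=1.5
# with two active users, each is guaranteed at least half of the queue
yarn.scheduler.capacity.root.Dev.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.Prod.minimum-user-limit-percent=50

After editing, the queues are refreshed with: yarn rmadmin -refreshQueues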
Labels:
- Apache YARN
- Cloudera Manager
06-18-2018
02:25 PM
It uses 300-1200 MB, but you are right, it is CPU heavy, and I am trying to maximize the processing power.
06-18-2018
01:58 PM
I have set it to 512M and it works. When I tried to go lower, for example 128M, I got an error: java.lang.IllegalArgumentException: System memory 119537664 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
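For reference, that 471859200-byte floor (about 450 MB) is Spark's reserved-memory minimum check, so the driver heap cannot go much below 512M. A minimal submit line with an explicit driver heap (the class and JAR names are placeholders):

spark-submit --driver-memory 512m --class com.example.MyApp myapp.jar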
06-18-2018
01:46 PM
Setting yarn.scheduler.minimum-allocation-mb to a smaller size improved the allocated memory by 30%.
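(Presumably because YARN rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so a large minimum wastes memory on every container.) A worked example with assumed numbers:

# request of 2432 MB (2 GB executor + 384 MB overhead)
yarn.scheduler.minimum-allocation-mb=2048  ->  container rounded up to 4096 MB
yarn.scheduler.minimum-allocation-mb=512   ->  container rounded up to 2560 MB  (~38% smaller)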
06-18-2018
12:50 PM
We have the following servers acting as workers:
- 2*6 cores (24 threads)
- 64 GB RAM
- based on Ambari 2.6.1.5
Our process uses approx 1 GB. For example, when I submit 100 workers with the settings:
spark-submit ..... --executor-memory 2gb
the total RAM used is 300 (100*3), because the RAM usage per executor is 3 GB. I can't fully use all the computation power: 3*24 > 60 (I set the limit to 60). What did I miss? Both answers helped; each improved the RAM usage.
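For context on the 3 GB: as I understand it, the YARN container request is the executor memory plus an off-heap overhead of max(384 MB, 10% of executor memory) (the spark.yarn.executor.memoryOverhead default), rounded up to a multiple of yarn.scheduler.minimum-allocation-mb. A rough sketch, assuming a 1024 MB minimum allocation:

executor memory          2048 MB   (--executor-memory 2g)
+ memoryOverhead          384 MB   (max(384, 0.10 * 2048))
= requested              2432 MB
container as allocated   3072 MB   (rounded up to a multiple of 1024)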
Labels:
- Apache Spark
- Cloudera Manager
10-18-2017
07:39 AM
A few minutes before I saw this post, I had just successfully solved the problem. I had two issues:
One, I did not create the hive DB: CREATE DATABASE hive;
I based this on your post from https://community.hortonworks.com/answers/107905/view.html
The other issue I had was in the DB URL connection;
I changed it to localhost. I am trying to accept your answer but I can't; I don't have a button for it?
The next stage is to try it with a non-root install.
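For anyone hitting the same thing, the MySQL side can be prepared roughly like this before pointing Hive at it (the 'hive' user and password here are placeholders, and the grant could be tightened):

mysql -u root -p
CREATE DATABASE hive;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hivepassword';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost';
FLUSH PRIVILEGES;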
10-18-2017
06:20 AM
During the install I encountered the error:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 211, in <module>
HiveMetastore().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 61, in start
create_metastore_schema()
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive.py", line 382, in create_metastore_schema
user = params.hive_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-server2-hive2/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED] -verbose' returned 1. SLF4J: Class path contains multiple SLF4J bindings.
Can you provide detailed information on the Hive installation requirements, including a simple MySQL/PostgreSQL configuration?
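For reference, the failing command from the log can be re-run by hand to see the full schematool output (same paths and user as in the log; substitute the real password):

export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server
/usr/hdp/current/hive-server2-hive2/bin/schematool -initSchema -dbType mysql -userName hive -passWord <password> -verbose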
10-17-2017
03:01 PM
I tried multiple setups. The only one that worked (partially) was when I installed all available services; it had the Metastore issue, but Spark2 worked (I successfully added more nodes and tested with spark-submit). That is why I might have some mixture of MySQL and PostgreSQL; I tried every combination I could think of. Then I tried to install only Spark2 with its dependencies using the following steps:

apt-get install ntp -y
update-rc.d ntp defaults
sudo ufw disable
apt install selinux-utils -y
setenforce 0
umask 0022
echo umask 0022 >> /etc/profile
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/2.5.2.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
ambari-server setup -s
ambari-server start

In the web UI I selected only the Spark2 service and then clicked yes on all its dependencies. This gives me an error at the hive-metastore start stage.
10-17-2017
02:44 PM
The URL is correct (I changed it in the comment, but the URL is correct, because all other systems are working). Do I have to make some special configuration for the SQL DB?
I have done:

ambari-server setup -s
sudo apt-get update
sudo apt-get install mysql-server -y
sudo mysql_secure_installation
apt-get install libpostgresql-jdbc-java -y
apt-get install libmysql-java -y
ls /usr/share/java/mysql-connector-java.jar
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

And then I tried to create the cluster via the web UI; everything goes with no error until starting Hive (the last stage).
10-17-2017
12:26 PM
I have installed all services on one node and had a few issues. After finishing the install stage, Hive Metastore fails to start with the same issue as before:

INFO 2017-10-17 01:55:04,537 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:04,537 RecoveryManager.py:255 - SPARK_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:04,538 RecoveryManager.py:255 - HIVE_METASTORE needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:09,219 ClusterConfiguration.py:119 - Updating cached configurations for cluster vqcluster
INFO 2017-10-17 01:55:09,252 RecoveryManager.py:717 - Received EXECUTION_COMMAND (START), desired state of HIVE_METASTORE to STARTED
INFO 2017-10-17 01:55:09,253 Controller.py:249 - Adding 1 commands. Heartbeat id = 62015
INFO 2017-10-17 01:55:09,253 ActionQueue.py:113 - Adding EXECUTION_COMMAND for role HIVE_METASTORE for service HIVE of cluster vqcluster to the queue.
INFO 2017-10-17 01:55:09,288 ActionQueue.py:238 - Executing command with id = 43-0, taskId = 276 for role = HIVE_METASTORE of cluster vqcluster.
INFO 2017-10-17 01:55:09,288 ActionQueue.py:279 - Command execution metadata - taskId = 276, retry enabled = False, max retry duration (sec) = 0, log_output = True
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:265 - Generating the JCEKS file: roleCommand=START and taskId = 276
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:243 - Identifying config hive-site for CS:
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:288 - provider_path=jceks://file/var/lib/ambari-agent/cred/conf/hive/hive-site.jceks
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:295 - ('/usr/jdk64/jdk1.8.0_112/bin/java', '-cp', '/var/lib/ambari-agent/cred/lib/*', 'org.apache.hadoop.security.alias.CredentialShell', 'create', u'javax.jdo.option.ConnectionPassword', '-value', [PROTECTED], '-provider', 'jceks://file/var/lib/ambari-agent/cred/conf/hive/hive-site.jceks')
WARNING 2017-10-17 01:55:09,318 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-276.txt'
INFO 2017-10-17 01:55:09,694 CustomServiceOrchestrator.py:297 - cmd_result = 0
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - SPARK_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - HIVE_METASTORE needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:15,375 PythonExecutor.py:130 - Command ['/usr/bin/python',
u'/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py',
u'START',
'/var/lib/ambari-agent/data/command-276.json',
u'/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package',
'/var/lib/ambari-agent/data/structured-out-276.json',
'INFO',
'/var/lib/ambari-agent/tmp',
'PROTOCOL_TLSv1',
''] failed with exitcode=1

Also, Spark2 Thrift Server and Spark (1) Thrift Server can start but keep going down:

ERROR 2017-10-17 01:57:36,029 script_alert.py:123 - [Alert][spark_thriftserver_status] Failed with result CRITICAL: ['Connection failed on host ambari-master.test.com:10015 (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_thrift_port.py", line 143, in execute
Execute(cmd, user=hiveruser, path=[beeline_cmd], timeout=CHECK_COMMAND_TIMEOUT_DEFAULT)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://ambari-master.test.com:10015/default\' transportMode=binary -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://ambari-master.test.com:10015/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Error: Could not open client transport with JDBC Uri: jdbc:hive2://ambari-master.test.com:10015/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
)']
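For reference, whether anything is actually listening on the Thrift port can be checked manually (host and port taken from the alert above):

beeline -u 'jdbc:hive2://ambari-master.test.com:10015/default' -e 'show databases;'
netstat -tlnp | grep 10015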
Our main requirement is Spark2, so I'll try to make another clean install with only Spark2 and its dependencies, hoping not to have any more issues.