Member since 09-28-2017 | 88 Posts | 3 Kudos Received | 0 Solutions
06-19-2018
07:25 AM
Hi, I am having some trouble setting the following scheduler queue parameters. I have two queues, Dev and Prod:
- Root: 100%
- Dev: 30%
- Prod: 70%
If only one queue is in use, it should act as 100% of the cluster. Each queue is used by multiple users and resources should be shared equally among them, but when only one user is active in a queue, that user should use the entire capacity of the queue, and if that user is alone on the cluster, they should use 100% of the cluster.
When a second user joins, the scheduler should share the available resources. Example flow: the cluster is free of jobs; user A submits a job to queue Dev (it now uses 100% of the cluster); user B submits a job to queue Dev (it hangs in ACCEPTED). I want the users to share the capacity of the cluster, each receiving 50%.
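Roughly, this is what I am aiming for in the Capacity Scheduler properties (the user-limit-factor values are my guesses, about 100/capacity so a lone user can grow to the whole cluster):

yarn.scheduler.capacity.root.queues=Dev,Prod
yarn.scheduler.capacity.root.Dev.capacity=30
yarn.scheduler.capacity.root.Prod.capacity=70
# let either queue grow to the whole cluster when the other is idle
yarn.scheduler.capacity.root.Dev.maximum-capacity=100
yarn.scheduler.capacity.root.Prod.maximum-capacity=100
# let a single user exceed the queue's configured share (guessed values: ~100/30 and ~100/70)
yarn.scheduler.capacity.root.Dev.user-limit-factor=3.4
yarn.scheduler.capacity.root.Prod.user-limit-factor=1.5
# with two active users, each is guaranteed at least half of the queue
yarn.scheduler.capacity.root.Dev.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.Prod.minimum-user-limit-percent=50

After editing, the queues are refreshed with: yarn rmadmin -refreshQueues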
Labels:
- Apache YARN
- Cloudera Manager
06-18-2018
02:25 PM
It uses 300-1200 MB, but you are right, it is CPU heavy, and I am trying to maximize the processing power.
06-18-2018
01:58 PM
I have set it to 512M and it works. When I tried to go lower, for example 128M, I got an error: java.lang.IllegalArgumentException: System memory 119537664 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
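For reference, that 471859200-byte floor (about 450 MB) is Spark's reserved-memory minimum check, so the driver heap cannot go much below 512M. A minimal submit line with an explicit driver heap (the class and JAR names are placeholders):

spark-submit --driver-memory 512m --class com.example.MyApp myapp.jar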
06-18-2018
01:46 PM
Setting yarn.scheduler.minimum-allocation-mb to a smaller size improved the allocated memory by 30%.
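(Presumably because YARN rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so a large minimum wastes memory on every container.) A worked example with assumed numbers:

# request of 2432 MB (2 GB executor + 384 MB overhead)
yarn.scheduler.minimum-allocation-mb=2048  ->  container rounded up to 4096 MB
yarn.scheduler.minimum-allocation-mb=512   ->  container rounded up to 2560 MB  (~38% smaller)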
06-18-2018
12:50 PM
We have the following servers acting as workers:
- 2*6 cores (24 threads)
- 64 GB RAM
- based on Ambari 2.6.1.5
Our process uses approx 1 GB. For example, when I submit 100 workers with the settings:
spark-submit ..... --executor-memory 2gb
the total RAM used is 300 (100*3), because the RAM usage per executor is 3 GB. I can't fully use all the computation power: 3*24 > 60 (I set the limit to 60). What did I miss? Both answers helped; each improved the RAM usage.
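For context on the 3 GB: as I understand it, the YARN container request is the executor memory plus an off-heap overhead of max(384 MB, 10% of executor memory) (the spark.yarn.executor.memoryOverhead default), rounded up to a multiple of yarn.scheduler.minimum-allocation-mb. A rough sketch, assuming a 1024 MB minimum allocation:

executor memory          2048 MB   (--executor-memory 2g)
+ memoryOverhead          384 MB   (max(384, 0.10 * 2048))
= requested              2432 MB
container as allocated   3072 MB   (rounded up to a multiple of 1024)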
Labels:
- Apache Spark
- Cloudera Manager
10-18-2017
07:39 AM
A few minutes before I saw this post, I had just successfully solved the problem. I had two issues:
One, I did not create the hive DB: CREATE DATABASE hive;
I based this on your post from https://community.hortonworks.com/answers/107905/view.html
The other issue I had was in the DB URL connection;
I changed it to localhost. I am trying to accept your answer but I can't; I don't have a button for it?
The next stage is to try it with a non-root install.
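For anyone hitting the same thing, the MySQL side can be prepared roughly like this before pointing Hive at it (the 'hive' user and password here are placeholders, and the grant could be tightened):

mysql -u root -p
CREATE DATABASE hive;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hivepassword';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost';
FLUSH PRIVILEGES;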
10-18-2017
06:20 AM
During the install I encountered the error:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 211, in <module>
HiveMetastore().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 61, in start
create_metastore_schema()
File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive.py", line 382, in create_metastore_schema
user = params.hive_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-server2-hive2/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED] -verbose' returned 1. SLF4J: Class path contains multiple SLF4J bindings.
Can you provide detailed information on the Hive installation requirements, including a simple MySQL/PostgreSQL configuration?
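For reference, the failing command from the log can be re-run by hand to see the full schematool output (same paths and user as in the log; substitute the real password):

export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server
/usr/hdp/current/hive-server2-hive2/bin/schematool -initSchema -dbType mysql -userName hive -passWord <password> -verbose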
10-17-2017
03:01 PM
I tried multiple setups. The only one that worked (partially) was when I installed all available services; it had the Metastore issue, but Spark2 worked (I successfully added more nodes and tested with spark-submit). That is why I might have some mixture of MySQL and PostgreSQL; I tried every combination I could think of. Then I tried to install only Spark2 with its dependencies using the following steps:

apt-get install ntp -y
update-rc.d ntp defaults
sudo ufw disable
apt install selinux-utils -y
setenforce 0
umask 0022
echo umask 0022 >> /etc/profile
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/2.5.2.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
apt-get install ambari-server -y
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
ambari-server setup -s
ambari-server start

In the web UI I selected only the Spark2 service and then clicked yes on all its dependencies. This gives me an error at the hive-metastore start stage.
10-17-2017
02:44 PM
The URL is correct (I changed it in the comment, but the URL is correct, because all other systems are working). Do I have to make some special configuration for the SQL DB?
I have done:

ambari-server setup -s
sudo apt-get update
sudo apt-get install mysql-server -y
sudo mysql_secure_installation
apt-get install libpostgresql-jdbc-java -y
apt-get install libmysql-java -y
ls /usr/share/java/mysql-connector-java.jar
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

And then I tried to create the cluster via the web UI; everything goes with no error until starting Hive (the last stage).
10-17-2017
12:26 PM
I have installed all services on one node and had a few issues. After finishing the install stage, Hive Metastore fails to start with the same issue as before:

INFO 2017-10-17 01:55:04,537 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:04,537 RecoveryManager.py:255 - SPARK_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:04,538 RecoveryManager.py:255 - HIVE_METASTORE needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:09,219 ClusterConfiguration.py:119 - Updating cached configurations for cluster vqcluster
INFO 2017-10-17 01:55:09,252 RecoveryManager.py:717 - Received EXECUTION_COMMAND (START), desired state of HIVE_METASTORE to STARTED
INFO 2017-10-17 01:55:09,253 Controller.py:249 - Adding 1 commands. Heartbeat id = 62015
INFO 2017-10-17 01:55:09,253 ActionQueue.py:113 - Adding EXECUTION_COMMAND for role HIVE_METASTORE for service HIVE of cluster vqcluster to the queue.
INFO 2017-10-17 01:55:09,288 ActionQueue.py:238 - Executing command with id = 43-0, taskId = 276 for role = HIVE_METASTORE of cluster vqcluster.
INFO 2017-10-17 01:55:09,288 ActionQueue.py:279 - Command execution metadata - taskId = 276, retry enabled = False, max retry duration (sec) = 0, log_output = True
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:265 - Generating the JCEKS file: roleCommand=START and taskId = 276
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:243 - Identifying config hive-site for CS:
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:288 - provider_path=jceks://file/var/lib/ambari-agent/cred/conf/hive/hive-site.jceks
INFO 2017-10-17 01:55:09,289 CustomServiceOrchestrator.py:295 - ('/usr/jdk64/jdk1.8.0_112/bin/java', '-cp', '/var/lib/ambari-agent/cred/lib/*', 'org.apache.hadoop.security.alias.CredentialShell', 'create', u'javax.jdo.option.ConnectionPassword', '-value', [PROTECTED], '-provider', 'jceks://file/var/lib/ambari-agent/cred/conf/hive/hive-site.jceks')
WARNING 2017-10-17 01:55:09,318 CommandStatusDict.py:128 - [Errno 2] No such file or directory: '/var/lib/ambari-agent/data/output-276.txt'
INFO 2017-10-17 01:55:09,694 CustomServiceOrchestrator.py:297 - cmd_result = 0
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - SPARK2_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - SPARK_THRIFTSERVER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:14,801 RecoveryManager.py:255 - HIVE_METASTORE needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-10-17 01:55:15,375 PythonExecutor.py:130 - Command ['/usr/bin/python',
u'/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py',
u'START',
'/var/lib/ambari-agent/data/command-276.json',
u'/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package',
'/var/lib/ambari-agent/data/structured-out-276.json',
'INFO',
'/var/lib/ambari-agent/tmp',
'PROTOCOL_TLSv1',
''] failed with exitcode=1

Also, Spark2 Thrift Server and Spark (1) Thrift Server can start but keep going down:

ERROR 2017-10-17 01:57:36,029 script_alert.py:123 - [Alert][spark_thriftserver_status] Failed with result CRITICAL: ['Connection failed on host ambari-master.test.com:10015 (Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.1/package/scripts/alerts/alert_spark_thrift_port.py", line 143, in execute
Execute(cmd, user=hiveruser, path=[beeline_cmd], timeout=CHECK_COMMAND_TIMEOUT_DEFAULT)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of \'! beeline -u \'jdbc:hive2://ambari-master.test.com:10015/default\' transportMode=binary -e \'\' 2>&1| awk \'{print}\'|grep -i -e \'Connection refused\' -e \'Invalid URL\'\' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://ambari-master.test.com:10015/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Error: Could not open client transport with JDBC Uri: jdbc:hive2://ambari-master.test.com:10015/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
)']
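For reference, whether anything is actually listening on the Thrift port can be checked manually (host and port taken from the alert above):

beeline -u 'jdbc:hive2://ambari-master.test.com:10015/default' -e 'show databases;'
netstat -tlnp | grep 10015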
Our main requirement is Spark2, so I'll try to make another clean install with only Spark2 and its dependencies, hoping not to have any more issues.