Why can't I run more than 1 query in parallel in Hive?
Labels: Apache Hive
Created on 07-03-2017 02:27 PM - edited 08-17-2019 05:11 PM
I have a small single-node HDP 2.6 cluster (8 CPUs, 32 GB RAM), and I cannot run more than one query at a time, although I was pretty sure I had configured the relevant settings to allow more than one container.
The relevant configs are:
yarn-site/yarn.nodemanager.resource.memory-mb = 27660
yarn-site/yarn.scheduler.minimum-allocation-mb = 5532
yarn-site/yarn.scheduler.maximum-allocation-mb = 27660
mapred-site/mapreduce.map.memory.mb = 5532
mapred-site/mapreduce.reduce.memory.mb = 11064
mapred-site/mapreduce.map.java.opts = -Xmx4425m
mapred-site/mapreduce.reduce.java.opts = -Xmx8851m
mapred-site/yarn.app.mapreduce.am.resource.mb = 11059
mapred-site/yarn.app.mapreduce.am.command-opts = -Xmx8851m -Dhdp.version=${hdp.version}
hive-site/hive.execution.engine = tez
hive-site/hive.tez.container.size = 5532
hive-site/hive.auto.convert.join.noconditionaltask.size = 1546859315
tez-site/tez.runtime.unordered.output.buffer.size-mb = 414
tez-interactive-site/tez.am.resource.memory.mb = 5532
tez-site/tez.am.resource.memory.mb = 5532
tez-site/tez.task.resource.memory.mb = 5532
tez-site/tez.runtime.io.sort.mb = 1351
hive-site/hive.tez.java.opts = -server -Xmx4425m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
capacity-scheduler/yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn-site/yarn.nodemanager.resource.cpu-vcores = 6
yarn-site/yarn.scheduler.maximum-allocation-vcores = 6
mapred-site/mapreduce.map.output.compress = true
hive-site/hive.exec.compress.intermediate = true
hive-site/hive.exec.compress.output = true
hive-interactive-env/enable_hive_interactive = false
If I understand correctly, this gives about 5 GB per container.
If I run a Hive query, it uses 5 GB and 1 core, leaving about 15 GB and 5 cores for the rest. I do not understand why the next query cannot start at the same time.
Any help would be much appreciated.
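The container arithmetic in the question can be checked with a short Python sketch. It uses the posted values, assumes the Capacity Scheduler rounds every memory request up to a multiple of yarn.scheduler.minimum-allocation-mb, and assumes a single queue owning 100% of the node whose yarn.scheduler.capacity.maximum-am-resource-percent is 0.1 (the Apache Hadoop default; that property is not in the posted config). The helper names are hypothetical, and the AM limit below is a simplification of the scheduler's real check.

```python
import math

# Values copied from the posted configuration (memory in MB).
NODE_MEMORY_MB = 27660        # yarn.nodemanager.resource.memory-mb
NODE_VCORES = 6               # yarn.nodemanager.resource.cpu-vcores
MIN_ALLOCATION_MB = 5532      # yarn.scheduler.minimum-allocation-mb
TEZ_AM_MB = 5532              # tez.am.resource.memory.mb
TEZ_TASK_MB = 5532            # hive.tez.container.size
MR_AM_MB = 11059              # yarn.app.mapreduce.am.resource.mb

# ASSUMPTION: Capacity Scheduler default, not present in the posted config.
MAX_AM_RESOURCE_PERCENT = 0.1


def normalize_up(request_mb: int, step_mb: int = MIN_ALLOCATION_MB) -> int:
    """Round a memory request up to a multiple of the minimum allocation."""
    return math.ceil(request_mb / step_mb) * step_mb


def containers_that_fit(container_mb: int, container_vcores: int = 1) -> int:
    """How many equal containers fit on the node; memory and vcores both limit."""
    size_mb = normalize_up(container_mb)
    return min(NODE_MEMORY_MB // size_mb, NODE_VCORES // container_vcores)


def concurrent_app_masters(am_mb: int,
                           am_percent: float = MAX_AM_RESOURCE_PERCENT) -> int:
    """Simplified AM limit: ApplicationMasters may use at most am_percent of
    the queue (rounded up to the allocation step), but one is always allowed."""
    am_limit_mb = normalize_up(math.ceil(NODE_MEMORY_MB * am_percent))
    return max(1, am_limit_mb // normalize_up(am_mb))


if __name__ == "__main__":
    print("task containers that fit:", containers_that_fit(TEZ_TASK_MB))
    print("MR AM request after rounding:", normalize_up(MR_AM_MB))
    print("Tez AMs allowed at once:", concurrent_app_masters(TEZ_AM_MB))
```

Under these assumptions the node has room for five 5532 MB containers, and the 11059 MB MapReduce AM request is rounded up to 11064 MB, but the AM limit admits only one ApplicationMaster at a time (a value of 0.2 gives the same result on this node). If that model holds, the second query's application would wait in the ACCEPTED state until the first one finishes, which would match the symptom described.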
Created 07-09-2020 04:00 AM
Can you try running 2 queries simultaneously and check?
Created on 07-09-2020 04:08 AM - edited 07-09-2020 04:10 AM
Yes, here is the screenshot:
Query 1 (green) starts, and query 2 (yellow) waits until all the jobs of query 1 are done.
Created 07-09-2020 04:13 AM
I have checked the screenshot. This is not an application-concurrency issue: Reducer phase 1 is waiting for all the mappers to finish, and the DAG is decided by the optimizer.
Are you using MapReduce or Tez as the execution engine?
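The point about reducers waiting on mappers can be illustrated with a deliberately simplified Python model. It treats every DAG edge as a full barrier (a reducer cannot produce results until all of its upstream map output exists) and ignores slow-start scheduling, which can launch reducer tasks early. The plan, vertex names, and durations are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: vertex -> set of upstream vertices it depends on.
PLAN = {
    "Map 1": set(),
    "Map 2": set(),
    "Reducer 1": {"Map 1", "Map 2"},
    "Reducer 2": {"Reducer 1"},
}

# Hypothetical run time of each vertex, in seconds.
DURATION = {"Map 1": 40, "Map 2": 25, "Reducer 1": 30, "Reducer 2": 10}


def schedule(plan, duration):
    """Return {vertex: (start, finish)}, assuming a vertex starts only after
    every upstream vertex has finished (full-barrier edges)."""
    times = {}
    # static_order() yields each vertex after all of its predecessors.
    for vertex in TopologicalSorter(plan).static_order():
        start = max((times[dep][1] for dep in plan[vertex]), default=0)
        times[vertex] = (start, start + duration[vertex])
    return times


if __name__ == "__main__":
    for vertex, (start, finish) in schedule(PLAN, DURATION).items():
        print(f"{vertex}: starts at {start}s, finishes at {finish}s")
```

In this model Reducer 1 cannot start before the slower mapper finishes at 40 s. The model only orders stages within one query's plan; two independent queries have no edge between them.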
Created 07-09-2020 04:16 AM
I suppose I'm running Tez, because I found all the configuration settings for Tez...
But I'm bloody new to this, so I'm probably ignorant too.
Thanks for your fast responses!
Created 07-09-2020 04:21 AM
In MapReduce, the reducers wait until all ten mappers have finished. We recommend using Tez.
Created on 07-09-2020 04:31 AM - edited 07-09-2020 04:32 AM
My Hive config in Ambari has plenty of Tez parameters, so I assumed it is Tez. I did not find a parameter such as 'use tez' or 'use mapreduce'...
hive.convert.join.bucket.mapjoin.tez is False; could that be the reason?
My queries run from Beeline.
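The engine is selected by the hive.execution.engine property (it appears in the original question's config), not by the presence of Tez parameters, and a session-level `set hive.execution.engine=...;` overrides the value in hive-site.xml. A minimal Python sketch for reading one property from a Hadoop-style *-site.xml file with the standard library; the helper name and the sample XML are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative hive-site.xml content; on a real node, read the actual file.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <property>
    <name>hive.tez.container.size</name>
    <value>5532</value>
  </property>
</configuration>"""


def get_property(site_xml: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in *-site.xml text,
    or None if this file does not set the property."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


if __name__ == "__main__":
    engine = get_property(SAMPLE_HIVE_SITE, "hive.execution.engine")
    print("hive.execution.engine =", engine)  # prints: hive.execution.engine = tez
```

To check a real file, pass its text, for example from `pathlib.Path("hive-site.xml").read_text()`; the same helper works for mapreduce.framework.name in mapred-site.xml.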
Created 07-09-2020 05:02 AM
This is decided by the optimiser.
Until the mappers for that query have finished, the reducers will not be started.
Created 07-09-2020 05:14 AM
I'm now looking for the correct framework setting: in mapred-site.xml I found yarn, NOT yarn-tez...
I'm totally new to this architecture, so I have to experiment; I did not find documentation applicable to our installation (HDP 3.0.1 on PowerPC) with Ambari.
But thanks a lot; at least I now understand that we are NOT using Tez...
Created 07-09-2020 05:18 AM
Thank you for the update. Tez fixes this kind of issue.
Created 07-09-2020 05:45 AM
Unfortunately, I'm still stuck on activating Tez under Hive.
I set the following properties:
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn-tez</value>
</property>
hive-site.xml
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
and added this to my Beeline query (now it is faster!):
set hive.execution.engine=tez;
Still, only ONE of the two queries runs at a time, and the log still says Starting task [Stage-1:MAPRED] in parallel.
😞
