Member since
12-15-2015
12-15-2015
09:35 PM
I found that property here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html. I used the bootstrap script provided by Amazon. After logging on to the cluster, I realized I am using one of the oldest versions of Tez, something like 0.4. I still have to try the latest version, but I was wondering whether that alone would give that much of a performance boost.
12-15-2015
09:04 PM
We have started testing the Tez query engine. In our initial results, Tez gives about a 30% performance boost over Hive on smaller data sets (1-10 GB), but Hive starts to outperform Tez as the data size increases. For example, when we run a Hive query with Tez on about 2.3 TB of data, it performs about 20% worse than Hive alone. Details are in the post linked below. On a cluster with 1.3 TB RAM, I set the following properties:
set tez.task.resource.memory.mb=10000;
set tez.am.resource.memory.mb=59205;
set tez.am.launch.cmd-opts=-Xmx47364m;
set hive.tez.container.size=59205;
set hive.tez.java.opts=-Xmx47364m;
set tez.am.grouping.max-size=36700160000;
Is this normal, or am I missing a property or configuring one incorrectly? Also, I am currently using an older version of Tez. Could that be the issue too? I still need to bootstrap the latest version of Tez on EMR and test whether it does any better: http://www.jwplayer.com/blog/hive-with-tez-on-emr/
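As a rough sanity check on the numbers above, here is a minimal Python sketch of the arithmetic those settings imply. It is only back-of-the-envelope math, not a description of what the cluster actually did. It assumes binary units (1 TB = 1024^4 bytes), assumes all 1.3 TB of RAM is available to containers (in practice less is), and the helper names `heap_fraction`, `max_containers`, and `min_split_groups` are made up for illustration.

```python
import math

# Values copied from the settings in the post.
CONTAINER_MB = 59205                 # hive.tez.container.size
HEAP_MB = 47364                      # -Xmx value in hive.tez.java.opts
GROUPING_MAX_BYTES = 36700160000     # the grouping max-size value that was set

# Assumptions (not stated in the post): binary units, all RAM usable.
CLUSTER_RAM_MB = 1.3 * 1024 * 1024   # 1.3 TB of RAM expressed in MB
DATA_BYTES = 2.3 * 1024 ** 4         # 2.3 TB of input data in bytes


def heap_fraction(heap_mb, container_mb):
    """Share of the container memory handed to the JVM heap."""
    return heap_mb / container_mb


def max_containers(cluster_ram_mb, container_mb):
    """Upper bound on containers of this size that fit in the cluster RAM."""
    return int(cluster_ram_mb // container_mb)


def min_split_groups(data_bytes, max_group_bytes):
    """Lower bound on split groups if every group reached the maximum size."""
    return math.ceil(data_bytes / max_group_bytes)


if __name__ == "__main__":
    print("heap / container:", round(heap_fraction(HEAP_MB, CONTAINER_MB), 2))
    print("max concurrent containers:", max_containers(CLUSTER_RAM_MB, CONTAINER_MB))
    print("min split groups:", min_split_groups(DATA_BYTES, GROUPING_MAX_BYTES))
```

Under these assumptions the heap is about 80% of the container size, and both the container count and the split-group count come out in the tens. These are bounds only; the real values depend on how much memory YARN is given and on how Tez actually groups the splits.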
Labels:
- Apache Hadoop
- Apache Hive
- Apache Tez