Created 03-16-2017 08:45 AM
We are doing some analysis on MR vs. Tez. Tez does better than MR on small and medium data volumes, but MR beats Tez on large volumes. We have seen this multiple times on different test beds. Please suggest.
Created 03-17-2017 01:54 AM
1) Please state the actual data sizes and performance numbers that you encountered.
2) Clarify which test beds you are referring to and how you used them.
3) Clarify what type of test case you executed. This matters because some tests are disk I/O intensive, while others are memory intensive.
After clarifying all of the above, we can state that riding a bike is sometimes faster than driving a Ferrari. That may be because the bike is better suited to niche cases where there is little space for a car to get through (narrow roads, etc.). I would not generalize that easily. I am wary of anything stated as "is always better". There is always an exception. Anyhow, you can set the desired engine at the session level, whether you wish to use MR or Tez. Thus, for cases where MR performs better, use it. You do not have to change any code to execute a Hive query on a different engine.
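As a minimal sketch of the session-level switch, assuming a Hive session where both engines are installed: `hive.execution.engine` is the standard Hive property, and the comment placeholders stand in for your own queries.

```sql
-- Use MapReduce for the large-volume query in this session only
SET hive.execution.engine=mr;
-- ... run the large-volume query here ...

-- Switch back to Tez for the small and medium queries
SET hive.execution.engine=tez;
-- ... run the remaining queries here ...
```

The setting applies only to the current session, so other users and jobs keep whatever default the cluster is configured with.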
Created 03-17-2017 02:12 AM
Great analogy, but I only have a bike! 🙂 I'd like to be able to say "set my.transport.engine=ferrari;" and here it is, at my front door!
Created 03-30-2017 05:23 AM
1) Please define actual size and performance numbers that you encountered.
Ans.
[Statistics table covering Operation 1, Operation 2 and Operation 3; cell contents missing]
2) Clarify which test beds you are referring to and how you used them.
Ans. In the statistics table above:
Operation 1 creates a lateral view on a small data set.
Operation 2 joins 3 tables of intermediate data volume.
Operation 3 joins 4 tables of large data volume in an inner query, with an aggregation on top of that.
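For readers unfamiliar with that last query shape, here is a rough sketch of what Operation 3 might look like. All table and column names (`orders`, `customers`, `products`, `stores`, `region`, `amount`) are hypothetical; the actual queries were not posted.

```sql
-- Hypothetical shape of Operation 3: a 4-table join in an inner query,
-- with an aggregation applied on top of the joined result
SELECT t.region,
       COUNT(*)      AS order_count,
       SUM(t.amount) AS total_amount
FROM (
  SELECT s.region, o.amount
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  JOIN products  p ON o.product_id  = p.id
  JOIN stores    s ON o.store_id    = s.id
) t
GROUP BY t.region;
```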
3) Clarify what type of test case you executed. This matters because some tests are disk I/O intensive, while others are memory intensive.
Created 04-01-2017 04:23 PM
@Constantin Stanca Any thoughts on this?