
vipulksaath · 03-16-2017

We are doing some analysis on MR vs TEZ. TEZ is doing better than MR on small and medium data volumes, but MR is beating TEZ on large volumes. We have seen this multiple times on different test beds. Please suggest.

vipulksaath · 03-30-2017

1) Please define actual size and performance numbers that you encountered.

Ans.

Data Volume	Time elapsed for TEZ	Average Time for TEZ	Time elapsed for MR	Average Time for MR
1900 records	46.350 secs	41.626 secs	63.666 secs	56.176 secs
	40.341 secs		55.633 secs	
	38.189 secs		49.230 secs	
91914 records	32.049 secs	32.097 secs	52.920 secs	51.236 secs
	32.088 secs		49.030 secs	
	32.156 secs		51.760 secs	
993168 records	850.01 secs	861.781 secs	611.625 secs	635.781 secs
	865.230 secs		691.751 secs	
	872.110 secs		672.285 secs	
	868.995 secs		567.466 secs	
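As a quick sanity check on the averages, here is a minimal Python sketch that recomputes the per-engine means and the Tez/MR ratio from the individual run times in the table. The dictionary layout and the `summarize` helper are illustrative assumptions, not part of the original test setup.

```python
from statistics import mean

# Run times in seconds, copied from the individual runs in the table.
# The structure (records -> engine -> list of times) is an assumption
# made for this sketch.
runs = {
    1900: {"tez": [46.350, 40.341, 38.189],
           "mr": [63.666, 55.633, 49.230]},
    91914: {"tez": [32.049, 32.088, 32.156],
            "mr": [52.920, 49.030, 51.760]},
    993168: {"tez": [850.01, 865.230, 872.110, 868.995],
             "mr": [611.625, 691.751, 672.285, 567.466]},
}


def summarize(runs):
    """Return {records: (tez_mean, mr_mean, tez_mean / mr_mean)}."""
    summary = {}
    for records, by_engine in runs.items():
        tez_mean = mean(by_engine["tez"])
        mr_mean = mean(by_engine["mr"])
        summary[records] = (tez_mean, mr_mean, tez_mean / mr_mean)
    return summary


for records, (tez_mean, mr_mean, ratio) in summarize(runs).items():
    print(f"{records:>7} records: Tez {tez_mean:.3f}s, "
          f"MR {mr_mean:.3f}s, Tez/MR {ratio:.2f}")
```

A ratio below 1 means Tez was faster on average; it is below 1 for the two smaller volumes and above 1 for the largest one. Recomputed this way, the Tez mean for the 993168-record runs comes out at roughly 864.09 secs rather than the 861.781 secs listed; the other means agree with the table to within rounding.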

2) Clarify which test beds you are referring to and how you used them.

Ans. In the statistics table above:

Operation 1 creates a lateral view on a small data set.

Operation 2 joins 3 tables of intermediate data volume.

Operation 3 joins 4 tables of large data volume in an inner query, with an aggregation on top of it.

3) Clarify what type of test case you execute. It is important to clarify because some tests can be disk I/O intensive, others can be memory intensive.

Ans.

1. The above jobs ran in parallel, i.e. 10 jobs in parallel in TEZ mode and 10 jobs in parallel in MR mode.

2. The above results are the output of multiple test iterations performed on different test beds.


cstanca · 03-17-2017

@Vipul Choudhary

1) Please define actual size and performance numbers that you encountered.

2) Clarify which test beds you are referring to and how you used them.

3) Clarify what type of test case you execute. It is important to clarify because some tests can be disk I/O intensive, others can be memory intensive.

After clarifying all of the above, we can state that riding a bike is sometimes faster than driving a Ferrari. That may be because the bike is better suited to niche cases where there is little space for a car to get through (narrow roads, etc.). I would not generalize that easily. I am not sure about anything stated as "is always better". There is always an exception. Anyhow, you can set the desired engine at the session level, if you wish to use MR or Tez. Thus, for cases where MR performs better, use it. It is not as if you have to code anything when you execute a Hive query.
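To make the session-level switch concrete, here is a small Python sketch that prefixes a query with the `hive.execution.engine` setting, so the same statement can be submitted once per engine. The `with_engine` helper and the `sales` table are hypothetical names used only for illustration; how the resulting script is submitted (for example through a Hive CLI or Beeline script file) is left open.

```python
def with_engine(query: str, engine: str) -> str:
    """Return a Hive script that sets the engine for the session, then runs query.

    Hypothetical helper for illustration; only the two engines discussed
    in this thread are accepted.
    """
    if engine not in ("mr", "tez"):
        raise ValueError(f"unsupported engine: {engine!r}")
    # Normalize the trailing semicolon so the script always ends with one.
    statement = query.rstrip().rstrip(";")
    return f"SET hive.execution.engine={engine};\n{statement};\n"


# Hypothetical query; replace with the statement being compared.
query = "SELECT COUNT(*) FROM sales"

for engine in ("mr", "tez"):
    print(with_engine(query, engine))
```

The setting applies only to the session that runs the script, so queries that do better on MR can use it without changing the cluster default.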

pminovic · 03-17-2017

Great analogy, but I only have a bike! 🙂 I'd like to be able to say "set my.transport.engine=ferrari;" and here it is, at my front door!

vipulksaath · 03-30-2017