Archives of Support Questions (Read Only)

balavignesh_nag · ‎02-14-2017

What are the differences between MR mode and Tez?

Why does Tez run faster than MR?

Why do some queries fail in Tez but run successfully in MR mode?

I know these questions have multiple answers, but I would like to know all the possible scenarios. Thanks.

mqureshi · ‎02-14-2017

@Bala Vignesh N V

Tez is a DAG (directed acyclic graph) architecture. A typical MapReduce job has the following steps:

1. Read data from file --> first disk access

2. Run mappers

3. Write map output --> second disk access

4. Run shuffle and sort --> read map output, third disk access

5. Write shuffle and sort output --> write sorted data for reducers, fourth disk access

6. Run reducers, which read the sorted data --> fifth disk access

7. Write reducer output --> sixth disk access

Tez works much like Spark (Tez originated at Hortonworks):

1. Build the execution plan; no data needs to be read from disk at this point.

2. Once it is ready to do some calculations (similar to actions in Spark), get the data from disk, perform all the steps and produce the output.

That is only one read and one write.

Notice the efficiency gained by not going to disk multiple times. Intermediate results are kept in memory where possible rather than written to disk. On top of that there is vectorization (processing a batch of rows at a time instead of one row at a time). All of this reduces query time.
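To put numbers on the comparison above, here is a rough sketch in Python. It only counts the disk accesses described in this answer and is not a benchmark. The assumption that a query compiles into a chain of several MapReduce jobs, and the function and constant names, are illustrative and not from the original post.

```python
# Toy model of the disk-access counts described above; not a benchmark.

# Disk accesses per MapReduce job, following the seven steps listed above
# (every step except "run mappers" touches disk).
MR_DISK_ACCESSES_PER_JOB = 6

# Disk accesses for a whole Tez DAG under the simplification used above:
# one read of the input and one write of the final output.
TEZ_DISK_ACCESSES_PER_DAG = 2


def mr_disk_accesses(num_jobs: int) -> int:
    """Disk accesses when a query runs as a chain of MapReduce jobs (assumption)."""
    if num_jobs < 1:
        raise ValueError("a query needs at least one job")
    return num_jobs * MR_DISK_ACCESSES_PER_JOB


def tez_disk_accesses(num_stages: int) -> int:
    """Disk accesses when the same stages run as a single Tez DAG."""
    if num_stages < 1:
        raise ValueError("a query needs at least one stage")
    return TEZ_DISK_ACCESSES_PER_DAG


if __name__ == "__main__":
    for stages in (1, 2, 3):
        print(stages, mr_disk_accesses(stages), tez_disk_accesses(stages))
```

Under this simplified model the MR count grows with every extra job in the chain, while the Tez count stays flat. Real clusters will differ, since Tez can still spill to local disk.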

Now, to your question about why queries fail in Tez but run in MR: this should not happen. It may be down to bugs, or to people who have used MapReduce with Hive for a while and know how to make things work there, but are not as familiar with Tez. I think Tez queries should not fail any more often than MapReduce ones.

I highly recommend skimming the following slides, especially from slide 7 onwards.

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey

dkozlowski · ‎02-14-2017

@Bala Vignesh N V

Please have a look at the links below. Hopefully, they will give you some more details:

snukavarapu · ‎02-14-2017

Hi @Bala Vignesh N V

Here are some links that you can go through to understand the differences between MR and Tez.

Difference between MR and Tez & Why Tez is faster?

http://hortonworks.com/apache/tez/#section_2

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey (See slides 9 to 16)

Why does a job fail in Tez but run in MR?

There could be several reasons for a job to fail. One is OOM exceptions when memory is not tuned correctly (refer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/determi....). I have also seen cases where a job fails in MR but runs fine when the Tez execution engine is used.
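One way to check whether a failure is engine-specific is to run the same query under both engines in one session. The sketch below builds the statements for that in Python. `hive.execution.engine` is the Hive property that selects the engine; the helper name `with_engine` and the example table `sales` are hypothetical.

```python
from typing import List

# Restricted to the two engines discussed in this thread.
VALID_ENGINES = ("mr", "tez")


def with_engine(query: str, engine: str) -> List[str]:
    """Return the Hive statements that run `query` under the given engine."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    # Normalise the query so it ends with exactly one semicolon.
    statement = query.rstrip().rstrip(";") + ";"
    return [f"SET hive.execution.engine={engine};", statement]


if __name__ == "__main__":
    # `sales` is a placeholder table name used only for illustration.
    for engine in VALID_ENGINES:
        print("\n".join(with_engine("SELECT COUNT(*) FROM sales", engine)))
```

If the query fails under only one engine, the logs from that run are the place to look, for example for the OOM errors mentioned above.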

balavignesh_nag · ‎02-15-2017

Spot on, @mqureshi. Thanks, it helps!

mqureshi · ‎02-16-2017

@Bala Vignesh N V

Glad it was helpful. If you think it is the complete answer you were looking for, please accept the answer.

BharatTiwari9 · ‎07-28-2020

"I highly recommend skimming quickly over following slides, specially starting from slide 7.

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey"

These slides are no longer available at that link.

kerra · ‎06-21-2017

If the table is partitioned and there are delta files (from updates, for example), I think MR works but Tez does not. You may have to run compaction to convert the delta files into base files, and then Tez will work.
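For reference, compaction on a transactional table is requested with Hive's ALTER TABLE ... COMPACT statement. Below is a small Python sketch that builds that statement. The helper name `compaction_statement`, the table `orders` and the partition key `ds` are made up for illustration, and the helper does no escaping, so treat it as a sketch only.

```python
from typing import Dict, Optional


def compaction_statement(
    table: str,
    partition: Optional[Dict[str, str]] = None,
    kind: str = "major",
) -> str:
    """Build a Hive ALTER TABLE ... COMPACT statement (no escaping is done)."""
    if kind not in ("major", "minor"):
        raise ValueError(f"unknown compaction type: {kind!r}")
    clause = ""
    if partition:
        # Render the partition spec as key='value' pairs.
        pairs = ", ".join(f"{key}='{value}'" for key, value in partition.items())
        clause = f" PARTITION ({pairs})"
    return f"ALTER TABLE {table}{clause} COMPACT '{kind}';"


if __name__ == "__main__":
    # `orders` and `ds` are placeholder names used only for illustration.
    print(compaction_statement("orders", {"ds": "2017-06-21"}))
```

A major compaction is the one that rewrites the delta files together with the existing base into a new base, which matches the suggestion above.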

Difference between mr and Tez?