Difference between mr and Tez?

What are the differences between MR mode and Tez?

Why does Tez run faster than MR?

Why do a few queries fail in Tez but execute successfully in MR mode?

I know these questions have multiple answers, but I would like to know all the possible scenarios. Thanks.

1 ACCEPTED SOLUTION

Re: Difference between mr and Tez?

Super Guru

@Bala Vignesh N V

Tez uses a DAG (directed acyclic graph) architecture. A typical MapReduce job has the following steps:

1. Read data from file --> first disk access

2. Run mappers

3. Write map output --> second disk access

4. Run shuffle and sort --> read map output, third disk access

5. Write shuffle and sort output (sorted data for the reducers) --> fourth disk access

6. Run reducers, which read the sorted data --> fifth disk access

7. Write reducer output --> sixth disk access

Tez works much like Spark (Tez was created at Hortonworks):

1. Build the execution plan; no data needs to be read from disk at this point.

2. Once it is ready to do some calculations (similar to actions in Spark), read the data from disk, perform all the steps, and produce the output.

Only one read and one write.
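To make the comparison concrete, here is a toy Python model of the counts above. It is only a sketch: the six accesses per MapReduce job and the single read plus single write for Tez come from the lists above, while the idea that a query compiles into several chained MapReduce jobs is my assumption for illustration. The function names are hypothetical, and a real Tez job may still spill shuffle data to local disk.

```python
# Toy model of the disk-access counts described in the lists above.
# These are illustrative numbers, not measurements.
MR_ACCESSES_PER_JOB = 6   # steps 1, 3, 4, 5, 6 and 7 of the MapReduce list
TEZ_ACCESSES_PER_DAG = 2  # "only one read and one write"


def mapreduce_disk_accesses(num_jobs: int) -> int:
    """Disk accesses for a query that runs as num_jobs chained MR jobs."""
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    return MR_ACCESSES_PER_JOB * num_jobs


def tez_disk_accesses(num_stages: int) -> int:
    """Disk accesses for the same query run as one Tez DAG.

    In this simplified model the count does not grow with the number
    of stages, because intermediate results are not written out.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    return TEZ_ACCESSES_PER_DAG


if __name__ == "__main__":
    for stages in (1, 2, 3):
        print(stages, mapreduce_disk_accesses(stages), tez_disk_accesses(stages))
```

In this model the gap widens with every extra stage, which is why multi-stage queries (joins followed by aggregations, for example) tend to benefit the most.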

Notice the efficiency gained by not going to disk multiple times. Intermediate results are kept in memory where possible instead of being written out to disk between steps. On top of that there is vectorization (processing a batch of rows instead of one row at a time). All of this improves query time.
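The vectorization idea can be sketched in plain Python. This is not how Hive implements it internally; it only contrasts handling one row per step with handling a batch per step. The function names are hypothetical, and the batch size of 1024 is chosen here as an assumption for the example.

```python
from typing import Iterable, Iterator, List


def sum_row_at_a_time(rows: Iterable[int]) -> int:
    """Process one row per step: one unit of work for every row."""
    total = 0
    for value in rows:
        total += value
    return total


def batches(rows: List[int], batch_size: int) -> Iterator[List[int]]:
    """Split the rows into consecutive batches of at most batch_size."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def sum_vectorized(rows: List[int], batch_size: int = 1024) -> int:
    """Process one batch per step: the per-step overhead is paid once
    per batch instead of once per row."""
    total = 0
    for batch in batches(rows, batch_size):
        total += sum(batch)  # the whole batch is handled in one call
    return total


if __name__ == "__main__":
    data = list(range(5000))
    print(sum_row_at_a_time(data) == sum_vectorized(data))
```

Both functions return the same result; the batched version simply amortizes the per-step cost over many rows.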

Now, to answer your question about why queries fail in Tez but run in MR: this should not happen. It may be due to bugs, or it may be that people working with Hive have used MapReduce for a while and know how to make things work there but are not as familiar with Tez. I think Tez queries should not fail any more often than MapReduce queries.
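If you want to check whether a failure is specific to one engine, one option is to run the same query under each engine in turn. The small Python helper below is a hypothetical sketch that only builds the script text; it relies on the Hive session property `hive.execution.engine`, and the table name `sales` is made up for the example.

```python
# Engine names accepted by the hive.execution.engine property.
VALID_ENGINES = ("mr", "tez", "spark")


def with_engine(query: str, engine: str) -> str:
    """Return a HiveQL script that selects the engine, then runs the query."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    # Normalize the query so the script always ends with exactly one ';'.
    statement = query.strip().rstrip(";")
    return f"SET hive.execution.engine={engine};\n{statement};"


if __name__ == "__main__":
    # 'sales' is a hypothetical table used only for illustration.
    for engine in ("tez", "mr"):
        print(with_engine("SELECT COUNT(*) FROM sales", engine))
        print()
```

Running both generated scripts (through beeline or the Hive CLI, for example) and comparing the outcomes shows quickly whether the problem follows the engine or the query itself.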

I highly recommend skimming the following slides, especially from slide 7 onward.

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey

6 REPLIES

Re: Difference between mr and Tez?

New Contributor

Hi @Bala Vignesh N V

Here are some links that you can go through to understand the differences between MR and Tez.

Difference between MR and Tez & Why Tez is faster?

http://hortonworks.com/apache/tez/#section_2

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey (See slides 9 to 16)

Why does a job fail in Tez but run in MR?

There could be several reasons for a job failure. It could be because of OOM exceptions when memory is not tuned correctly (refer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/determi....). I have also seen cases where a job fails in MR but runs fine when the Tez execution engine is used.
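As a rough illustration of the memory tuning mentioned above, the Python sketch below builds the two Hive session settings that usually matter for Tez container OOMs: `hive.tez.container.size` (in MB) and `hive.tez.java.opts`. The helper name is hypothetical, and the 0.8 heap fraction is a commonly quoted rule of thumb that I am assuming here, not a value taken from this thread. The right numbers depend on your cluster.

```python
from typing import List


def tez_memory_settings(container_mb: int, heap_fraction: float = 0.8) -> List[str]:
    """Build SET statements for the Tez container size and JVM heap.

    heap_fraction is an assumed rule of thumb: the JVM heap is kept
    below the container size to leave room for off-heap memory.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)
    return [
        f"SET hive.tez.container.size={container_mb};",
        f"SET hive.tez.java.opts=-Xmx{heap_mb}m;",
    ]


if __name__ == "__main__":
    # 4096 MB is an example value only.
    for line in tez_memory_settings(4096):
        print(line)
```

If a query only fails under Tez with an out-of-memory error, trying a larger container size in the session before rerunning it is a cheap first experiment.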

Re: Difference between mr and Tez?

Spot on, @mqureshi. Thanks, it helps!

Re: Difference between mr and Tez?

Super Guru

@Bala Vignesh N V

Glad it was helpful. If you think it's the complete answer you were looking for, please accept the answer.

Re: Difference between mr and Tez?

New Contributor

If the table is partitioned and there are delta files (from updates, for example), I think MR works but Tez does not. You may have to run compaction to convert the delta files into base files, and then Tez will work.
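For reference, compaction on a transactional table is requested with an `ALTER TABLE ... COMPACT` statement; a major compaction is the kind that rewrites the delta files into a new base. The Python helper below is a hypothetical sketch that only builds the statement text. The table and partition names are made up, and it does no SQL escaping, so treat it as an illustration rather than something to feed untrusted input.

```python
from typing import Dict, Optional


def compaction_statement(
    table: str,
    partition: Optional[Dict[str, str]] = None,
    kind: str = "major",
) -> str:
    """Build an ALTER TABLE ... COMPACT statement for a Hive ACID table."""
    if kind not in ("major", "minor"):
        raise ValueError("kind must be 'major' or 'minor'")
    clause = ""
    if partition:
        # Render the partition spec as key='value' pairs.
        pairs = ", ".join(f"{key}='{value}'" for key, value in partition.items())
        clause = f" PARTITION ({pairs})"
    return f"ALTER TABLE {table}{clause} COMPACT '{kind}';"


if __name__ == "__main__":
    # 'sales' and 'ds' are hypothetical names used only for illustration.
    print(compaction_statement("sales", {"ds": "2017-01-01"}))
```

The statement only queues the compaction; `SHOW COMPACTIONS;` can be used to check whether it has finished before retrying the query on Tez.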
