Created 02-14-2017 09:25 AM
What are the differences between MR mode and Tez?
Why does Tez run faster than MR?
Why do a few queries fail in Tez but execute successfully in MR mode?
I know these questions have multiple answers, but I just wanted to know all the possible scenarios. Thanks.
Created 02-14-2017 03:41 PM
Tez is a DAG (directed acyclic graph) architecture. A typical MapReduce job has the following steps:
1. Read data from file --> first disk access
2. Run mappers
3. Write map output --> second disk access
4. Run shuffle and sort --> read map output, third disk access
5. Write shuffle and sort output --> write sorted data for reducers, fourth disk access
6. Run reducers, which read the sorted data --> fifth disk access
7. Write reducer output --> sixth disk access
Tez works much like Spark (Tez was originally developed at Hortonworks):
1. Build the execution plan; no data needs to be read from disk at this point.
2. Once it is ready to do some calculations (similar to actions in Spark), get the data from disk, perform all the steps, and produce the output.
Only one read and one write.
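As a rough sketch of the arithmetic behind the two lists above (a toy model, not a measurement): the MapReduce list counts six disk accesses per job, and the Tez description counts two. The assumption added here, which is not stated in the answer, is that a multi-stage query run on MR is split into several chained jobs, each paying the full six accesses. The function names are made up for illustration.

```python
# Toy model of the disk-access counts described above.
MR_ACCESSES_PER_JOB = 6   # the six disk accesses in the MapReduce step list
TEZ_ACCESSES_PER_DAG = 2  # "only one read and one write"


def mr_disk_accesses(num_jobs: int) -> int:
    """Disk accesses for a query split into `num_jobs` chained MR jobs (assumption)."""
    return MR_ACCESSES_PER_JOB * num_jobs


def tez_disk_accesses() -> int:
    """Disk accesses for the same query run as a single Tez DAG, under this model."""
    return TEZ_ACCESSES_PER_DAG


for jobs in (1, 2, 3):
    print(f"{jobs} MR job(s): {mr_disk_accesses(jobs)} accesses, "
          f"Tez DAG: {tez_disk_accesses()} accesses")
```

Under this model the gap widens with every extra stage, which is one way to read the "only one read and one write" point.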
Notice the efficiency gained by not going to disk multiple times. Intermediate results are stored in memory (not written to disk). On top of that, there is vectorization (processing a batch of rows at a time instead of one row at a time). All of this reduces query time.
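To make the vectorization idea concrete, here is a minimal sketch in plain Python. It only illustrates the difference between handling one row per step and handling a batch per step; it is not how Hive implements it, and the batch size of 1024 and the function names are assumptions chosen for the example.

```python
from typing import Iterator, List

BATCH_SIZE = 1024  # assumed batch size, for illustration only


def sum_row_at_a_time(rows: List[int]) -> int:
    """Process one row per step."""
    total = 0
    for value in rows:
        total += value
    return total


def batches(rows: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def sum_batched(rows: List[int], size: int = BATCH_SIZE) -> int:
    """Process one batch per step; the per-step overhead is paid once per batch."""
    total = 0
    for batch in batches(rows, size):
        total += sum(batch)
    return total


rows = list(range(5000))
print(sum_row_at_a_time(rows) == sum_batched(rows))
```

Both functions return the same result; the batched version simply does far fewer outer-loop steps for the same data.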
Now, to answer your question about why queries fail in Tez but run in MR: this should not happen. It could be a bug, or it could be that people working with Hive have used MapReduce for a while and know how to make things work, but are not as familiar with Tez. I think Tez queries should not fail any more often than MapReduce queries.
I highly recommend skimming the following slides, especially from slide 7 onward.
Created 02-14-2017 02:57 PM
Please have a look at the links below. Hopefully, they will give you some more details:
Created 02-14-2017 03:31 PM
Here are some links that you can go through to understand the differences between MR and Tez.
What is the difference between MR and Tez, and why is Tez faster?
http://hortonworks.com/apache/tez/#section_2
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez
http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey (See slides 9 to 16)
Why does a job fail in Tez but run in MR?
There could be several reasons for the job failure. It could be because of OOM exceptions when memory is not tuned correctly (refer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/determi...). I have also seen cases where a job fails in MR but runs fine when the Tez execution engine is used.
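As a hedged illustration of what "tuning memory" can look like in a Hive session, the sketch below builds the `SET` statements for the Tez container size and JVM heap. `hive.execution.engine`, `hive.tez.container.size`, and `hive.tez.java.opts` are Hive settings; the 80% heap-to-container ratio is a commonly cited rule of thumb and an assumption here, and the 4096 MB value and the function name are made up for the example. The right numbers depend on your cluster.

```python
from typing import List


def tez_memory_settings(container_mb: int, heap_fraction: float = 0.8) -> List[str]:
    """Build Hive session settings for Tez memory.

    heap_fraction is an assumed rule of thumb, not a value from this thread.
    """
    heap_mb = int(container_mb * heap_fraction)
    return [
        "SET hive.execution.engine=tez;",
        f"SET hive.tez.container.size={container_mb};",
        f"SET hive.tez.java.opts=-Xmx{heap_mb}m;",
    ]


# Example with a hypothetical 4096 MB container.
for statement in tez_memory_settings(4096):
    print(statement)
```

Running the printed statements at the top of a Hive session, before the failing query, is one way to test whether the failure is memory related.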
Created 02-15-2017 08:18 AM
Spot on, @mqureshi. Thanks, it helps!
Created 02-16-2017 02:55 PM
Glad it was helpful. If you think it's the complete answer you were looking for, please accept the answer.
Created 07-28-2020 11:31 PM
"I highly recommend skimming quickly over following slides, specially starting from slide 7.
http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey"
This slide deck is no longer available at that path.
Created 06-21-2017 10:07 PM
If the table is partitioned and there are delta files (from updates, for example), I think MR works but Tez does not. You may have to run a compaction to convert the delta files into base files, and then Tez will work.
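For reference, a compaction on a partition is requested with Hive's `ALTER TABLE ... COMPACT` statement. The small sketch below only builds that statement as a string; the table name `sales` and the partition key `ds` are hypothetical, and whether a major compaction resolves a given Tez failure is an assumption to verify on your own table.

```python
from typing import Dict


def major_compaction_sql(table: str, partition: Dict[str, str]) -> str:
    """Build a HiveQL statement requesting a major compaction of one partition."""
    spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
    return f"ALTER TABLE {table} PARTITION ({spec}) COMPACT 'major';"


# Hypothetical table and partition, for illustration only.
print(major_compaction_sql("sales", {"ds": "2017-06-21"}))
```

A major compaction rewrites the base and delta files of the partition into a new base file, which is the conversion described above.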