
[TEZ] Where are intermediate results stored?


Hi,

I know that Tez avoids writing intermediate results to HDFS (unlike a chain of MapReduce jobs, where each job writes its output to HDFS), but I was wondering: where are they stored instead?

I have read "in memory" and "on local disk"...

But what if the task that emits the intermediate results is not on the same node as the task that will receive them?

So is it just network I/O, streaming data from memory and/or local disk, instead of HDFS reads and writes?

Thanks for your help 🙂

Accepted solution

Contributor

There is some information here about data movement: https://hortonworks.com/blog/writing-a-tez-inputprocessoroutput-2/

Tez is pluggable and supports different data transfer paradigms, but in general intermediate data is kept in memory until size constraints force it to be flushed to local disk. When the tasks are not on the same node, the data is transferred over the network: out-of-band data movement events go through the ApplicationMaster (AM), and the data itself is transferred directly between the nodes.
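To make that flow concrete, here is a toy, single-process Python model. It is not Tez code: `DataMovementEvent`, `ProducerOutput`, `ToyAM`, `fetch`, and `demo` are hypothetical stand-ins, and the record-count limit stands in for whatever memory limits a real runtime uses. The producer buffers records in memory and spills them to local files once the limit is reached; when it finishes, it sends a small event through the "AM" that only says where the data lives, and the consumer then reads the data directly from the producer's side.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class DataMovementEvent:
    # Hypothetical stand-in for the small out-of-band event routed via the AM:
    # it describes where the producer's output lives, not the data itself.
    producer_id: str
    host: str
    spill_files: list
    in_memory_records: list  # shortcut for this single-process toy only


class ProducerOutput:
    """Keeps records in memory and spills them to local disk (not HDFS)
    whenever the buffer reaches max_buffered_records (hypothetical limit)."""

    def __init__(self, producer_id, host, max_buffered_records, spill_dir):
        self.producer_id = producer_id
        self.host = host
        self.max_buffered = max_buffered_records
        self.spill_dir = spill_dir
        self.buffer = []
        self.spill_files = []

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.max_buffered:
            self._spill()

    def _spill(self):
        # Size limit reached: flush the buffer to a file on the local disk.
        name = f"{self.producer_id}-spill{len(self.spill_files)}.bin"
        path = os.path.join(self.spill_dir, name)
        with open(path, "wb") as f:
            pickle.dump(self.buffer, f)
        self.spill_files.append(path)
        self.buffer = []

    def close(self):
        # Records below the limit are still in memory; the event points at both.
        return DataMovementEvent(self.producer_id, self.host,
                                 list(self.spill_files), list(self.buffer))


class ToyAM:
    """Routes events to consumers; never touches the records themselves."""

    def __init__(self):
        self.pending = {}

    def route(self, consumer_id, event):
        self.pending.setdefault(consumer_id, []).append(event)

    def events_for(self, consumer_id):
        return self.pending.get(consumer_id, [])


def fetch(event, consumer_host):
    # The consumer pulls the data directly from the producer's side. Here both
    # cases read the same local files; the label only records which path a
    # real cluster would take (local read vs. transfer over the network).
    records = []
    for path in event.spill_files:
        with open(path, "rb") as f:
            records.extend(pickle.load(f))
    records.extend(event.in_memory_records)
    transport = "local read" if event.host == consumer_host else "network fetch"
    return transport, records


def demo(consumer_host):
    with tempfile.TemporaryDirectory() as spill_dir:
        am = ToyAM()
        out = ProducerOutput("map-0", "node-a", max_buffered_records=3,
                             spill_dir=spill_dir)
        for i in range(7):
            out.write(i)
        am.route("reduce-0", out.close())
        event = am.events_for("reduce-0")[0]
        transport, records = fetch(event, consumer_host)
        return transport, len(event.spill_files), records


if __name__ == "__main__":
    print(demo("node-b"))  # producer on node-a, consumer on node-b
```

With a limit of three, `demo("node-b")` returns `('network fetch', 2, [0, 1, 2, 3, 4, 5, 6])`: seven records produce two spill files plus one record still in memory. The point the sketch tries to show is that the AM only carries metadata; the bulk data never passes through the AM or through HDFS.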


Replies



Hi @Gunther Hagleitner, thanks, your explanation makes it very clear.