[TEZ] Where are intermediate results stored?

New Contributor

Hi,

I know that Tez avoids storing intermediate results in HDFS (unlike a chain of MapReduce jobs, which writes them to HDFS between jobs), but I was wondering: where are they stored instead?

I have read "in memory" and "on local disk"...

But what if the task that emits the intermediate results is not on the same node as the task that consumes them?

So, is it just network I/O, streaming data from memory and/or local disk, instead of HDFS reads and writes?

Thanks for your help :)

Accepted Solution

Re: [TEZ] Where are intermediate results stored?

New Contributor

There is some information here about data movement: https://hortonworks.com/blog/writing-a-tez-inputprocessoroutput-2/

Tez is pluggable and supports different data transfer paradigms, but in general, intermediate data is kept in memory until size constraints force it to be flushed (spilled) to local disk. When the producing and consuming tasks are not on the same node, the data is transferred over the network: out-of-band data movement events are routed through the ApplicationMaster (AM), and the data itself is transferred directly between the nodes.
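To make the two ideas in this answer concrete, here is a toy, single-process Python simulation. It is a sketch under stated assumptions, not how Tez is implemented: `SpillingOutput`, `AppMaster`, `run_producer`, `run_consumer` and the `cluster` dictionary are hypothetical names invented for illustration, and the real Tez runtime is written in Java and is considerably more involved. It shows (1) a producer's output staying in memory until a size limit forces a spill to local disk, and (2) the "AM" relaying only small events that say *where* the data is, while the consumer fetches the data itself from the producer's node.

```python
import os
import pickle
import tempfile


class SpillingOutput:
    """Hypothetical producer output: buffers records in memory, spills to local disk past a limit."""

    def __init__(self, memory_limit_bytes, spill_dir):
        self.memory_limit_bytes = memory_limit_bytes
        self.spill_dir = spill_dir
        self.buffer = []
        self.buffered_bytes = 0
        self.spill_files = []

    def write(self, record: bytes):
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        # Size constraint exceeded: flush the in-memory buffer to local disk.
        if self.buffered_bytes > self.memory_limit_bytes:
            self._spill()

    def _spill(self):
        fd, path = tempfile.mkstemp(dir=self.spill_dir, suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.buffer, f)
        self.spill_files.append(path)
        self.buffer = []
        self.buffered_bytes = 0

    def read_all(self):
        # Serve spilled records first (in spill order), then whatever is still in memory.
        records = []
        for path in self.spill_files:
            with open(path, "rb") as f:
                records.extend(pickle.load(f))
        records.extend(self.buffer)
        return records


class AppMaster:
    """Hypothetical stand-in for the AM: routes small location events, never the data itself."""

    def __init__(self):
        self.pending = {}  # consumer task id -> list of events

    def route(self, consumer_id, event):
        self.pending.setdefault(consumer_id, []).append(event)

    def events_for(self, consumer_id):
        return self.pending.get(consumer_id, [])


def run_producer(am, cluster, node, task_id, consumer_id, records, limit, spill_dir):
    out = SpillingOutput(limit, spill_dir)
    for r in records:
        out.write(r)
    # The output stays on the producer's node (modelled as cluster[node][task_id]).
    cluster.setdefault(node, {})[task_id] = out
    # Tell the consumer, via the AM, where the output lives, not what it contains.
    am.route(consumer_id, {"node": node, "task": task_id})


def run_consumer(am, cluster, consumer_id):
    data = []
    for ev in am.events_for(consumer_id):
        # "Direct" fetch from the producer's node, bypassing the AM.
        data.extend(cluster[ev["node"]][ev["task"]].read_all())
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        am, cluster = AppMaster(), {}
        run_producer(am, cluster, "node-1", "map-0", "reduce-0", [b"aa", b"bb", b"cc"], 3, d)
        run_producer(am, cluster, "node-2", "map-1", "reduce-0", [b"dd"], 3, d)
        print(run_consumer(am, cluster, "reduce-0"))  # [b'aa', b'bb', b'cc', b'dd']
```

In this toy run, `map-0` exceeds its 3-byte limit after the second record and spills once, while `map-1` never leaves memory; the consumer sees the same records either way. Keeping the AM out of the data path is the point of the design: the AM only handles small events, so it does not become a bandwidth bottleneck.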

Re: [TEZ] Where are intermediate results stored?

New Contributor

Hi @Gunther Hagleitner, thanks, your explanation makes it very clear.