For example, I have a Hive query that involves a Map phase and a Reduce phase.
Is there a way to get the output file from the Map phase before it is processed by the Reduce phase?
That would let me understand which phase does what, and then optimize the query.
I'm not aware of anything like that. However, you might have more luck using EXPLAIN to understand the individual vertices of the query plan. With hive.tez.exec.print.summary=true set, you can see a summary of the number of records that flow between vertices once the query finishes. The Tez View has some visualizations of this data.
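As a rough sketch of what that could look like in a Hive session running on Tez: the table `employees` and the column `dept` below are made up for illustration, so substitute your own query. The first statement turns on the per-vertex summary, EXPLAIN prints the plan without running anything, and the last statement actually runs the query so the summary can be printed.

```sql
-- Print a per-vertex execution summary after each query
-- (assumes the session uses the Tez execution engine).
SET hive.tez.exec.print.summary=true;

-- Show the plan only: which operators end up in the map-side
-- and reduce-side vertices. Nothing is executed here.
-- `employees` and `dept` are hypothetical names.
EXPLAIN
SELECT dept, COUNT(*) AS n
FROM employees
GROUP BY dept;

-- Run the query itself; the summary with record counts per vertex
-- should be printed once it completes.
SELECT dept, COUNT(*) AS n
FROM employees
GROUP BY dept;
```

This does not give you the intermediate Map output as a file, but comparing the operators listed under each vertex with the record counts in the summary usually shows where most of the data is read, filtered and shuffled.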