Automatic reassignment of query execution when one of the Impala nodes fails

I am running some tests. 


I submitted a query to node 2 in the cluster and then brought down node 5. The query just hangs on node 2 for quite some time. I understand that if node 2 goes down (the node where the query was submitted), I need to resubmit the query. But if one of the other nodes goes down, do I still need to resubmit the query?


If so, are there any plans to enhance this so that Impala automatically detects the failed node and uses another node to run that portion of the query?


I'm pasting the relevant snippets from this thread below, including my reply: !topic/impala-user/ccHWO07WWEU

We sometimes get impalad failure errors like this: "Couldn't open transport for xxx". We then have to retry the query from the user side, which wastes computation, and users don't like it.

I remember that Dremel supports fault tolerance: it can retry a failed subtask on another machine.
I'm not sure whether fault tolerance is on the roadmap.
Do you have any information about this feature? And would it be hard to implement?
Thanks very much.

Hi Zhe,

Your suggested improvement certainly makes sense. We do plan to support fine-grained intra-query fault tolerance eventually, but we don't have a concrete release date in mind yet.
I don't expect the feature to be trivial; it's a rather fundamental change internally.
Please understand that we have to weigh this feature against the countless others that our customers and users request.