Automatic reassignment of query execution when one of the Impala nodes fails

New Contributor

Hi,

I am running some tests.

I submitted a query to node 2 in the cluster and then brought down node 5. The query just hangs on node 2 for quite some time. I understand that if node 2 (the node where the query was submitted) goes down, I need to resubmit the query. But if one of the other nodes goes down, do I still need to resubmit the query?

If so, are there any plans to enhance this so that the failure is detected automatically and another node is used to run that portion of the query?

Thanks for your reply.

-lm

1 REPLY

Master Collaborator

Pasting the relevant snippets from this thread below, including my reply:

https://groups.google.com/a/cloudera.org/forum/#!topic/impala-user/ccHWO07WWEU

User:

Hi,

We sometimes get an impalad failure with an error like this: "Couldn't open transport for xxx". We then have to retry the query from the user side. This wastes computation, and users don't like it.

I remember that Dremel supports fault tolerance and can retry a failed subtask on another machine.
I'm not sure whether fault tolerance is on the roadmap. ( http://theory.so/impala/2013/09/29/whither-impala-fault-tolerance/ )

Do you have any information about this feature? And is it too hard to implement?

Thanks very much.

Response:

Hi Zhe,

Your suggested improvement certainly makes sense. We do plan to support fine-grained intra-query fault tolerance eventually, but we don't have a concrete release date in mind yet.
I don't expect the feature to be trivial; it's a rather fundamental change internally.

Please understand that we have to weigh this feature against the countless others that our customers and users request.

Alex
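
Until intra-query fault tolerance exists, the retry has to happen on the client side, as described in the thread. Below is a minimal sketch of that idea in Python, not an Impala API: `submit` is a hypothetical callable that sends the whole query to one node using whatever client you have, `QueryFailed` is a hypothetical exception standing in for errors such as "Couldn't open transport for xxx", and the host names are placeholders. It resubmits the entire query (not the failed fragment) and rotates through a list of nodes with a simple backoff.

```python
import time


class QueryFailed(Exception):
    """Hypothetical error type: raised by `submit` when a query attempt fails."""


def run_with_retry(submit, nodes, max_attempts=3, backoff_s=1.0, sleep=time.sleep):
    """Resubmit the whole query until it succeeds or attempts run out.

    submit       -- hypothetical callable taking a host name and returning results
    nodes        -- hosts to submit to, tried in round-robin order
    max_attempts -- total number of submissions, including the first one
    backoff_s    -- base delay in seconds, doubled after every failure
    sleep        -- injectable sleep function (handy for tests)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if not nodes:
        raise ValueError("at least one node is required")

    last_error = None
    for attempt in range(max_attempts):
        host = nodes[attempt % len(nodes)]
        try:
            return submit(host)
        except QueryFailed as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt + 1 < max_attempts:
                sleep(backoff_s * (2 ** attempt))
    raise last_error
```

A caller would wrap its own query function, for example `run_with_retry(lambda host: my_query(host, sql), ["node2", "node3"])`, where `my_query` is again a placeholder. Note that this repeats all of the work done so far on every retry, which is exactly the wasted computation the thread complains about; it only hides the failure from the end user.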