Created on 02-26-2018 07:35 AM - edited 09-16-2022 05:54 AM
I am using Impala with Kudu tables. I don't know whether Impala is just an interface for viewing those tables in Hue or the shell, or whether it works as the processing engine when I run a query (SELECT, UPDATE). Is Impala itself used when I run a query? Also, do UDFs work with Impala+Kudu, or only with Impala without Kudu storage?
Thanks in advance.
Created 02-27-2018 08:06 AM
Kudu can evaluate simple filters natively, e.g. predicates on a table's primary key, so Impala pushes such filters down to Kudu.
More complex filters (e.g. those involving UDFs) are evaluated by Impala after receiving rows from Kudu.
Impala's explain plan clearly distinguishes the filters evaluated by Kudu from those evaluated by Impala.
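To make the split concrete, here is a toy Python model of the idea. It is only a sketch and not Impala's planner or Kudu's client API: the `Predicate` class, `split_predicates`, `storage_scan`, `run_query`, the sample rows, and the `validate_card` function (standing in for a UDF) are all hypothetical names made up for illustration. Simple comparisons are handed to the "storage" step, and the UDF filter runs afterwards on the rows that storage returned.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy model only: NOT Impala's planner and NOT the Kudu client API.

@dataclass
class Predicate:
    column: str
    op: str                                         # "=", "<", ">" or "udf"
    value: object = None
    udf: Optional[Callable[[object], bool]] = None  # hypothetical UDF

    def matches(self, row: dict) -> bool:
        cell = row[self.column]
        if self.op == "=":
            return cell == self.value
        if self.op == "<":
            return cell < self.value
        if self.op == ">":
            return cell > self.value
        return self.udf(cell)

# Assumption: only these simple comparisons can be pushed to storage.
SIMPLE_OPS = {"=", "<", ">"}

def split_predicates(predicates):
    """Separate filters the storage layer can evaluate from the rest."""
    pushed = [p for p in predicates if p.op in SIMPLE_OPS]
    remaining = [p for p in predicates if p.op not in SIMPLE_OPS]
    return pushed, remaining

def storage_scan(rows, pushed):
    """Storage layer: return only rows that pass the pushed-down filters."""
    return [r for r in rows if all(p.matches(r) for p in pushed)]

def run_query(rows, predicates):
    """Query engine: push what it can, then filter the returned rows itself."""
    pushed, remaining = split_predicates(predicates)
    scanned = storage_scan(rows, pushed)
    result = [r for r in scanned if all(p.matches(r) for p in remaining)]
    return result, len(scanned)

def validate_card(number: str) -> bool:
    # Hypothetical stand-in for a UDF: accepts any 16-digit string.
    return number.isdigit() and len(number) == 16

rows = [
    {"id": 1, "card": "4111111111111111"},
    {"id": 2, "card": "not-a-card"},
    {"id": 3, "card": "5500000000000004"},
    {"id": 4, "card": "1234"},
]
predicates = [
    Predicate("id", ">", 1),                     # simple: pushed to storage
    Predicate("card", "udf", udf=validate_card)  # complex: stays in the engine
]

result, rows_from_storage = run_query(rows, predicates)
print(rows_from_storage, [r["id"] for r in result])  # prints: 3 [3]
```

In this sketch the storage step returns three rows instead of four, and the UDF then runs only on those three. That is the practical benefit of pushing simple filters down: fewer rows have to travel from storage to the query engine.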
Created on 02-27-2018 04:26 AM - edited 02-27-2018 04:29 AM
Hi @PedroGaVal
In effect, Impala is a query engine: you pass your queries through it to interrogate data stored in HDFS or in Kudu.
And with Kudu you don't need UDFs to modify data, because Impala supports UPDATE and DELETE statements on Kudu tables.
Created 02-27-2018 05:32 AM
OK, thanks @AcharkiMed. I understand that Impala not only shows Kudu tables (as external tables) but also processes the data. If you create a UDF (say 'validateCard' as an Impala function), I guess you can use it, so Kudu is just storage and does not process anything. And data stored in Kudu format does not use HDFS. Am I right?
Created 02-27-2018 07:53 AM
You are welcome @PedroGaVal
Yes, you are absolutely right, man.