Created on 05-23-2016 12:48 PM - edited 09-16-2022 03:21 AM
Let's suppose we have a table:
Owner | Pets
------------------------------
Jack | "dog, cat, crocodile"
Mary | "bear, pig"
I want to get as a result:
Owner | Pets
------------------------------
Jack | "dog"
Jack | "cat"
Jack | "crocodile"
Mary | "bear"
Mary | "pig"
I found some solutions to similar problems by googling, but Impala SQL does not offer any of these capabilities to apply the suggested solutions.
Any help would be greatly appreciated!
Created 05-23-2016 11:00 PM
What you are looking for are table functions (or user-defined table functions) like split(). Unfortunately, Impala does not yet support table functions. You should consider using Hive which has the split() function to transform the data into a more read-friendly format. I understand that some data is produced in such a nested-delimited format, but you depending on your use case it could be wasteful to split and unnest the string data for every read query.
Created 05-27-2016 12:19 AM
Thank you very much for your help and your prompt reply.
So perhaps creating a UDF that returns an array and then split the array can be a workaround.
According to the documentation a UDF can't return a complex type but is this still true for CDH 5.5+ ? Because according to this complex types are support from cloudera now.
So to summarize my question, is it possible to write a UDF that takes the comma delimited string and returns an array of strings using Impala UDFs?
Created 05-27-2016 12:32 AM
It is not possible to do that today, because Impala UDFs can only produce scalar values (and can only take scalar arguments). We do intend to extend the UDF support to allow complex argument and return types, but there is no concrete timeline yet.