Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Comma delimited string to individual rows

avatar
New Contributor

Let's suppose we have a table:

Owner | Pets
------------------------------
Jack  | "dog, cat, crocodile"
Mary | "bear, pig"


I want to get as a result:

Owner | Pets
------------------------------
Jack | "dog"
Jack | "cat"
Jack | "crocodile"
Mary | "bear"
Mary | "pig"


I found some solutions to similar problems by googling, but Impala SQL does not offer any of these capabilities to apply the suggested solutions.

Any help would be greatly appreciated!

3 REPLIES 3

avatar

What you are looking for are table functions (or user-defined table functions) like split(). Unfortunately, Impala does not yet support table functions. You should consider using Hive which has the split() function to transform the data into a more read-friendly format. I understand that some data is produced in such a nested-delimited format, but you depending on your use case it could be wasteful to split and unnest the string data for every read query.

avatar
New Contributor

Thank you very much for your help and your prompt reply.

 

So perhaps creating a UDF that returns an array and then split the array can be a workaround.

According to the documentation a UDF can't return a complex type but is this still true for CDH 5.5+ ? Because according to this complex types are support from cloudera now.

 

So to summarize my question, is it possible to write a UDF that takes the comma delimited string and returns an array of strings using Impala UDFs?

avatar

It is not possible to do that today, because Impala UDFs can only produce scalar values (and can only take scalar arguments). We do intend to extend the UDF support to allow complex argument and return types, but there is no concrete timeline yet.