- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Comma delimited string to individual rows
- Labels:
-
Apache Impala
-
Apache Pig
Created on 05-23-2016 12:48 PM - edited 09-16-2022 03:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Let's suppose we have a table:
Owner | Pets
------------------------------
Jack | "dog, cat, crocodile"
Mary | "bear, pig"
I want to get as a result:
Owner | Pets
------------------------------
Jack | "dog"
Jack | "cat"
Jack | "crocodile"
Mary | "bear"
Mary | "pig"
I found some solutions to similar problems by googling, but Impala SQL does not offer any of these capabilities to apply the suggested solutions.
Any help would be greatly appreciated!
Created 05-23-2016 11:00 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What you are looking for are table functions (or user-defined table functions) like split(). Unfortunately, Impala does not yet support table functions. You should consider using Hive which has the split() function to transform the data into a more read-friendly format. I understand that some data is produced in such a nested-delimited format, but you depending on your use case it could be wasteful to split and unnest the string data for every read query.
Created 05-27-2016 12:19 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much for your help and your prompt reply.
So perhaps creating a UDF that returns an array and then split the array can be a workaround.
According to the documentation a UDF can't return a complex type but is this still true for CDH 5.5+ ? Because according to this complex types are support from cloudera now.
So to summarize my question, is it possible to write a UDF that takes the comma delimited string and returns an array of strings using Impala UDFs?
Created 05-27-2016 12:32 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It is not possible to do that today, because Impala UDFs can only produce scalar values (and can only take scalar arguments). We do intend to extend the UDF support to allow complex argument and return types, but there is no concrete timeline yet.
