Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

impala split udf

avatar
Contributor

I am on CDH 5.2 using Impala 2.0

Does anybody know when the UDTF split function be available?  

 

I have an array that is delimited by Pipe that has a max of 6 entries; min of 4

split[0], split[3] would have worked perfectly

 

Ideas are welcome on how to work around my challenge 

example 

aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff

 

a is variable length but always there

b may not be present

c may not be present

d, e and f is always there fixed length

 

I am think a really convoluted usage of substr

 

 

1 ACCEPTED SOLUTION

avatar

 

> select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1);
Query: select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1)
aaaaaaaa
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3);
Query: select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3)
ccc
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6);
Query: select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6)
ffffff
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1);
Query: select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1)
aaaaaaaa
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3);
Query: select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3)
eeeeeeee
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6);
Query: select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6)
""

 

View solution in original post

2 REPLIES 2

avatar

 

> select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1);
Query: select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1)
aaaaaaaa
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3);
Query: select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3)
ccc
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6);
Query: select regexp_extract('aaaaaaaa|bbb|ccc|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6)
ffffff
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1);
Query: select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 1)
aaaaaaaa
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3);
Query: select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 3)
eeeeeeee
Fetched 1 row(s) in 0.01s
 
> select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6);
Query: select regexp_extract('aaaaaaaa|ddddddddddd|eeeeeeee|ffffff', '([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|([a-z]+)\\|?([a-z]*)\\|?([a-z]*)', 6)
""

 

avatar
Contributor

somewhat better than me leveraging nested substr and reverses

 

but still wouldn't mind seeing the split function be brought in