- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive Split for columns
- Labels:
-
Apache Hive
Created 09-25-2018 05:11 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
i have a column with following data
Store | Items |
22 | 1001 abc, 1002 pqr, 1003 tuv |
33 | 1004 def, 1005 xyz |
And i want to split the column and have the data as follows.
Store | Item_Id | Item_name |
22 | 1001 | abc |
22 | 1002 | pqr |
22 | 1003 | tuv |
33 | 1004 , | def |
33 | 1005 | xyz |
Created 09-26-2018 07:05 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Srikanth
You can try using the below approach :
create table test1(store id, items STRING); insert into table test1 values(22, '1001 abc, 1002 pqr, 1003 tuv'); insert into table test1 values(33, '1004 def, 1005 xyz');
I have created a sample table in Hive and executed below query to get the expected result.
select store, split(item,' ')[0] as item_id,split(item,' ')[1] as item_name from test1 lateral view explode(split(items,', ')) vExplodeTbl as item;
Created 09-26-2018 08:14 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The easiest approach is to use lateral views.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
It allows you to split an array into multiple line.
1. Let's create an array from the items in your column "items"
select key, split(items, ',') as valArray from test result +------+---------------------------------------+--+ | key | _c1 | +------+---------------------------------------+--+ | 22 | ["1001 abc"," 1002 pqr"," 1003 tuv"] | | 33 | ["1004 def"," 1005 xyz"] | +------+---------------------------------------+--+
2. Now let's use lateral view to split these items into lines (using "trim" to clean up the space)
select key, trim(uniqueVal) from( select key, split(items, ',') as valArray from test ) a lateral view explode(a.valArray) exploded as uniqueVal ; +------+-----------+--+ | key | _c1 | +------+-----------+--+ | 22 | 1001 abc | | 22 | 1002 pqr | | 22 | 1003 tuv | | 33 | 1004 def | | 33 | 1005 xyz | +------+-----------+--+
3. Finally let's use split again to get separate values.
select key, split(trim(uniqueVal), ' ')[0], split(trim(uniqueVal), ' ')[1] from( select key, split(items, ',') as valArray from test ) a lateral view explode(a.valArray) exploded as uniqueVal ; +------+-------+------+--+ | key | _c1 | _c2 | +------+-------+------+--+ | 22 | 1001 | abc | | 22 | 1002 | pqr | | 22 | 1003 | tuv | | 33 | 1004 | def | | 33 | 1005 | xyz | +------+-------+------+--+
Note :
I used the following to create the table
create table test ( key string, value string ) STORED AS ORC ; INSERT INTO test (key, value ) VALUES (22, '1001 abc, 1002 pqr, 1003 tuv'), (33, '1004 def, 1005 xyz');
