Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive Split for columns

Highlighted

Hive Split for columns

New Contributor

Hi,

i have a column with following data

Store Items
22 1001 abc, 1002 pqr, 1003 tuv
33 1004 def, 1005 xyz

And i want to split the column and have the data as follows.

Store Item_Id Item_name
22 1001 abc
22 1002 pqr
22 1003 tuv
33 1004 , def
33 1005 xyz
2 REPLIES 2
Highlighted

Re: Hive Split for columns

Hi @Srikanth

You can try using the below approach :

create table test1(store id, items STRING); insert into table test1 values(22, '1001 abc, 1002 pqr, 1003 tuv'); insert into table test1 values(33, '1004 def, 1005 xyz');

I have created a sample table in Hive and executed below query to get the expected result.

select store, split(item,' ')[0] as item_id,split(item,' ')[1] as item_name from test1 lateral view explode(split(items,', ')) vExplodeTbl as item;

Highlighted

Re: Hive Split for columns

Cloudera Employee

@Srikanth t

The easiest approach is to use lateral views.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView

It allows you to split an array into multiple line.

1. Let's create an array from the items in your column "items"

select key, split(items, ',') as valArray
	from test

result 

+------+---------------------------------------+--+
| key  |                  _c1                  |
+------+---------------------------------------+--+
| 22   | ["1001 abc"," 1002 pqr"," 1003 tuv"]  |
| 33   | ["1004 def"," 1005 xyz"]              |
+------+---------------------------------------+--+

2. Now let's use lateral view to split these items into lines (using "trim" to clean up the space)

select key, trim(uniqueVal)
from(
 select key, split(items, ',') as valArray
 from test ) a lateral view explode(a.valArray) exploded as uniqueVal ;

+------+-----------+--+
| key  |    _c1    |
+------+-----------+--+
| 22   | 1001 abc  |
| 22   | 1002 pqr  |
| 22   | 1003 tuv  |
| 33   | 1004 def  |
| 33   | 1005 xyz  |
+------+-----------+--+

3. Finally let's use split again to get separate values.

select key, split(trim(uniqueVal), ' ')[0], split(trim(uniqueVal), ' ')[1]
from(
	select key, split(items, ',') as valArray
	from test 
	) a lateral view explode(a.valArray) exploded as uniqueVal ;

+------+-------+------+--+
| key  |  _c1  | _c2  |
+------+-------+------+--+
| 22   | 1001  | abc  |
| 22   | 1002  | pqr  |
| 22   | 1003  | tuv  |
| 33   | 1004  | def  |
| 33   | 1005  | xyz  |
+------+-------+------+--+ 

Note :

I used the following to create the table

create table test (
key string, 
value string )
STORED AS ORC ;

INSERT INTO test (key, value )
VALUES (22, '1001 abc, 1002 pqr, 1003 tuv'),
(33, '1004 def, 1005 xyz');
Don't have an account?
Coming from Hortonworks? Activate your account here