Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to create table and parse structured textfile in hive that have extra column

How to create table and parse structured textfile in hive that have extra column

New Contributor

Need help to find to create table in HIVE

Below sample txt file. the n_count field will identify how many occurrence of count_value - the count value can be range from [1 to 1000]

Sample textfile Delimited by ~

h1~n_count~count_value ----- Header

a~3~1~3~3

b~2~1~2

c~4~a~b~c~d

d~3-x~x~x

My goal is to get this result so that i can use the lateral view overide

a,3,{1,3,3}

b,2,{1,2}

c,4,{a,b,c,d}

d,3,{x,x,x}

Final Result.

h1,count,each_count_value

a,3,1

a,3,3

a,3,3

b,2,1

b,2,2

c,4,a

c,4,b

c,4,c

c,4,d

...

...

Do i have to use HIVE RegexSerDe? or any best solution

Thanks You

Os

2 REPLIES 2

Re: How to create table and parse structured textfile in hive that have extra column

Expert Contributor

@Osmaro Gariando

For the above usecase, you can try to use array datatype.

eg., create table test1(h1 string, n_count int, count_value array<int>) row format delimited fields terminated by '~' collection items terminated by '#' stored as textfile;

Content in the text file should be delimited as follows

a~3~1#3#3

b~2~2#5

...

h1(string)~n_count(int)~count_value(array<int> delimieted with #)

0: jdbc:hive2://abcd:10000> select * from test1;

+-----------+----------------+--------------------+--+

| test1.h1 | test1.n_count | test1.count_value |

+-----------+----------------+--------------------+--+

| a | 3 | [1,3,3] |

| b | 2 | [2,5] |

+-----------+----------------+--------------------+--+

0: jdbc:hive2://abcd:10000> select h1, n_count, exploded_count from test1 LATERAL VIEW explode(count_value) t1 as exploded_count;

+-----+----------+-----------------+--+

| h1 | n_count | exploded_count |

+-----+----------+-----------------+--+

| a | 3 | 1 |

| a | 3 | 3 |

| a | 3 | 3 |

| b | 2 | 2 |

| b | 2 | 5 |

+-----+----------+-----------------+--+

Highlighted

Re: How to create table and parse structured textfile in hive that have extra column

New Contributor

Thank you .I have tried this and it work. The only issue is we dont have a way to change the delimiter for n_count_value.

I've created a POC for this requirement by creating a script that will dynamically drop/create hive table by specifying the number of n count as parameter.

example

lvalue = 'c1,c2,c3,c4,....' <- this are the dynamic column

I also use split(CONCAT_WS(',' ,${lvalue}),',') as arr_value to get the count_value in array and then i used the LATERAL VIEW explode

Thanks

Os