Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

How to create table and parse structured textfile in hive that have extra column

Explorer

Need help to find to create table in HIVE

Below sample txt file. the n_count field will identify how many occurrence of count_value - the count value can be range from [1 to 1000]

Sample textfile Delimited by ~

h1~n_count~count_value ----- Header

a~3~1~3~3

b~2~1~2

c~4~a~b~c~d

d~3-x~x~x

My goal is to get this result so that i can use the lateral view overide

a,3,{1,3,3}

b,2,{1,2}

c,4,{a,b,c,d}

d,3,{x,x,x}

Final Result.

h1,count,each_count_value

a,3,1

a,3,3

a,3,3

b,2,1

b,2,2

c,4,a

c,4,b

c,4,c

c,4,d

...

...

Do i have to use HIVE RegexSerDe? or any best solution

Thanks You

Os

2 REPLIES 2

Expert Contributor

@Osmaro Gariando

For the above usecase, you can try to use array datatype.

eg., create table test1(h1 string, n_count int, count_value array<int>) row format delimited fields terminated by '~' collection items terminated by '#' stored as textfile;

Content in the text file should be delimited as follows

a~3~1#3#3

b~2~2#5

...

h1(string)~n_count(int)~count_value(array<int> delimieted with #)

0: jdbc:hive2://abcd:10000> select * from test1;

+-----------+----------------+--------------------+--+

| test1.h1 | test1.n_count | test1.count_value |

+-----------+----------------+--------------------+--+

| a | 3 | [1,3,3] |

| b | 2 | [2,5] |

+-----------+----------------+--------------------+--+

0: jdbc:hive2://abcd:10000> select h1, n_count, exploded_count from test1 LATERAL VIEW explode(count_value) t1 as exploded_count;

+-----+----------+-----------------+--+

| h1 | n_count | exploded_count |

+-----+----------+-----------------+--+

| a | 3 | 1 |

| a | 3 | 3 |

| a | 3 | 3 |

| b | 2 | 2 |

| b | 2 | 5 |

+-----+----------+-----------------+--+

Explorer

Thank you .I have tried this and it work. The only issue is we dont have a way to change the delimiter for n_count_value.

I've created a POC for this requirement by creating a script that will dynamically drop/create hive table by specifying the number of n count as parameter.

example

lvalue = 'c1,c2,c3,c4,....' <- this are the dynamic column

I also use split(CONCAT_WS(',' ,${lvalue}),',') as arr_value to get the count_value in array and then i used the LATERAL VIEW explode

Thanks

Os

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.