Created 11-21-2017 10:15 PM
Need help to find to create table in HIVE
Below sample txt file. the n_count field will identify how many occurrence of count_value - the count value can be range from [1 to 1000]
Sample textfile Delimited by ~
h1~n_count~count_value ----- Header
a~3~1~3~3
b~2~1~2
c~4~a~b~c~d
d~3-x~x~x
My goal is to get this result so that i can use the lateral view overide
a,3,{1,3,3}
b,2,{1,2}
c,4,{a,b,c,d}
d,3,{x,x,x}
Final Result.
h1,count,each_count_value
a,3,1
a,3,3
a,3,3
b,2,1
b,2,2
c,4,a
c,4,b
c,4,c
c,4,d
...
...
Do i have to use HIVE RegexSerDe? or any best solution
Thanks You
Os
Created 11-23-2017 05:48 AM
For the above usecase, you can try to use array datatype.
eg., create table test1(h1 string, n_count int, count_value array<int>) row format delimited fields terminated by '~' collection items terminated by '#' stored as textfile;
Content in the text file should be delimited as follows
a~3~1#3#3
b~2~2#5
...
h1(string)~n_count(int)~count_value(array<int> delimieted with #)
0: jdbc:hive2://abcd:10000> select * from test1;
+-----------+----------------+--------------------+--+
| test1.h1 | test1.n_count | test1.count_value |
+-----------+----------------+--------------------+--+
| a | 3 | [1,3,3] |
| b | 2 | [2,5] |
+-----------+----------------+--------------------+--+
0: jdbc:hive2://abcd:10000> select h1, n_count, exploded_count from test1 LATERAL VIEW explode(count_value) t1 as exploded_count;
+-----+----------+-----------------+--+
| h1 | n_count | exploded_count |
+-----+----------+-----------------+--+
| a | 3 | 1 |
| a | 3 | 3 |
| a | 3 | 3 |
| b | 2 | 2 |
| b | 2 | 5 |
+-----+----------+-----------------+--+
Created 11-28-2017 03:17 PM
Thank you .I have tried this and it work. The only issue is we dont have a way to change the delimiter for n_count_value.
I've created a POC for this requirement by creating a script that will dynamically drop/create hive table by specifying the number of n count as parameter.
example
lvalue = 'c1,c2,c3,c4,....' <- this are the dynamic column
I also use split(CONCAT_WS(',' ,${lvalue}),',') as arr_value to get the count_value in array and then i used the LATERAL VIEW explode
Thanks
Os