- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive Explode with array_index with no Cartesian product
- Labels:
-
Apache Hive
Created ‎05-24-2016 12:26 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi i have a Hive table
Hi i have a Hive table select a,b,c,d from riskfactor_table In the above table B, C and D columns are array columns. Below is my Hive DDL Create external table riskfactor_table (a string, b array<string>, c array<double>, d array<double> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' stored as textfile location 'user/riskfactor/data'; Here is my table data: ID400S,["jms","jndi","jaxb","jaxn"],[100,200,300,400],[1,2,3,4] ID200N,["one","two","three"],[212,352,418],[6,10,8] If i want to split array columns how can i split? If i use explode function i can split array values for only one column select explode(b) as b from riskfactor_table; Output: jms jndi jaxb jxn one two three But i want all the columns to be populated using one select statement below- Query - select a,b,c,d from risk_factor; Output: row1- ID400S jms 100 1 row2- ID400S jndi 200 2 row3- ID400S jaxb 300 3 row4- ID400S jaxn 400 4 How can i populate all the data?
Created ‎05-24-2016 05:36 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Sai Satish you can achieve it in this way
create table aTable(a int,b array<String>);insert into table aTable select 1,array('a','b') from dummyTable;hive> select * from aTable; OK 1["a","b"] 1["a","b"] select a,expl_tbl from aTable LATERAL VIEW explode(b) exploded_table as expl_tbl; Query ID = hive_20160524053317_27cb538e-43e1-436f-8744-ec53a0c9d3b2 Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1463989024283_0004) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 1 1 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 3.25 s -------------------------------------------------------------------------------- OK 1a 1b 1a 1b Time taken: 4.765 seconds, Fetched: 4 row(s)
Created ‎05-24-2016 05:36 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Sai Satish you can achieve it in this way
create table aTable(a int,b array<String>);insert into table aTable select 1,array('a','b') from dummyTable;hive> select * from aTable; OK 1["a","b"] 1["a","b"] select a,expl_tbl from aTable LATERAL VIEW explode(b) exploded_table as expl_tbl; Query ID = hive_20160524053317_27cb538e-43e1-436f-8744-ec53a0c9d3b2 Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1463989024283_0004) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 1 1 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 3.25 s -------------------------------------------------------------------------------- OK 1a 1b 1a 1b Time taken: 4.765 seconds, Fetched: 4 row(s)
Created ‎05-24-2016 03:05 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thank you for your reply.
I am looking for multiple array columns solution.What if my table contains more than one array column if i use Lateral view explode in my Hive query it results Cartesian product.
If i have
10 array values in my Column1,
10 array values in Column2,
10 array values in Column3
then the result will be 1000 rows
which is wrong.I want the
result output also 10 rows.
I was checking in stack overflow (http://stackoverflow.com/questions/20667473/hive-explode-lateral-view-multiple-arrays)
i got some info using array_index and brickhouse jar we can do it but when i run the similar query in my Hive its not working. I am using Hive version is - 1.2.1
Any help ?
Created ‎12-09-2016 11:54 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thanks for reply
Created ‎07-05-2018 12:31 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@akshay singh @Sai Satish I am also facing same issue. Were you able to solve this?
