- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to insert individual rows into hive based on subqueries
- Labels:
-
Apache Hive
Created ‎11-16-2016 06:56 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to accomplish something like this:
insert into table daily_counts
values
(
select count(*) from table_a,
select count(*) from table_b
)
...etc.
I know the syntax works for known values. How can I enter the result of a sub-query?
I currently accomplish this in java via jdbc, by running all the queries individually, and then parsing together the final insert statement.
But it MUST be possible with HiveQL, right?
Created ‎11-16-2016 07:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.
insert into table daily_counts select count(*), 'table_a' from table_a UNION select count(*), 'table_b' from table_b UNION select count(*), 'table_c' from table_c ... ;
Created ‎11-16-2016 07:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.
insert into table daily_counts select count(*), 'table_a' from table_a UNION select count(*), 'table_b' from table_b UNION select count(*), 'table_c' from table_c ... ;
Created ‎11-16-2016 10:42 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Edited the post to fix syntax error. Now it runs just fine. 😉
Created ‎11-18-2016 12:04 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, @bpreachuk
Interestingly, I think we found a major Hive bug.
When I run a query like the one above, it appears to enter an infinite loop.
I even pared my data set down to 3 columns - counting 3 tables.
And it appears to spawn MR jobs indefinitely.
I worked around it. But someone should look at the UNION operation.
Happy Friday!
Created ‎11-18-2016 12:11 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yikes! Have never seen that happen before, but I certainly have no reason to doubt you. Does it happen with hive.execution.engine=tez? If you could grab & sanitize your query/config details & post that as a Hive bug in Jira it would be greatly appreciated... we don't want that problem to bite anyone else.
