Created 11-16-2016 06:56 PM
I am trying to accomplish something like this:
insert into table daily_counts
values
(
select count(*) from table_a,
select count(*) from table_b
)
...etc.
I know the syntax works for known values. How can I enter the result of a sub-query?
I currently accomplish this in java via jdbc, by running all the queries individually, and then parsing together the final insert statement.
But it MUST be possible with HiveQL, right?
Created 11-16-2016 07:03 PM
Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.
insert into table daily_counts select count(*), 'table_a' from table_a UNION select count(*), 'table_b' from table_b UNION select count(*), 'table_c' from table_c ... ;
Created 11-16-2016 07:03 PM
Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.
insert into table daily_counts select count(*), 'table_a' from table_a UNION select count(*), 'table_b' from table_b UNION select count(*), 'table_c' from table_c ... ;
Created 11-16-2016 10:42 PM
Edited the post to fix syntax error. Now it runs just fine. 😉
Created 11-18-2016 12:04 PM
Thanks, @bpreachuk
Interestingly, I think we found a major Hive bug.
When I run a query like the one above, it appears to enter an infinite loop.
I even pared my data set down to 3 columns - counting 3 tables.
And it appears to spawn MR jobs indefinitely.
I worked around it. But someone should look at the UNION operation.
Happy Friday!
Created 11-18-2016 12:11 PM
Yikes! Have never seen that happen before, but I certainly have no reason to doubt you. Does it happen with hive.execution.engine=tez? If you could grab & sanitize your query/config details & post that as a Hive bug in Jira it would be greatly appreciated... we don't want that problem to bite anyone else.