Support Questions

Find answers, ask questions, and share your expertise

How to insert individual rows into hive based on subqueries

avatar
Super Collaborator

I am trying to accomplish something like this:

insert into table daily_counts

values

(

select count(*) from table_a,

select count(*) from table_b

)

...etc.

I know the syntax works for known values. How can I enter the result of a sub-query?

I currently accomplish this in java via jdbc, by running all the queries individually, and then parsing together the final insert statement.

But it MUST be possible with HiveQL, right?

1 ACCEPTED SOLUTION

avatar

Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.

insert into table daily_counts
select count(*), 'table_a' from table_a
   UNION 
select count(*), 'table_b' from table_b
   UNION
select count(*), 'table_c' from table_c
...
;

View solution in original post

4 REPLIES 4

avatar

Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.

insert into table daily_counts
select count(*), 'table_a' from table_a
   UNION 
select count(*), 'table_b' from table_b
   UNION
select count(*), 'table_c' from table_c
...
;

avatar

Edited the post to fix syntax error. Now it runs just fine. 😉

avatar
Super Collaborator

Thanks, @bpreachuk

Interestingly, I think we found a major Hive bug.

When I run a query like the one above, it appears to enter an infinite loop.

I even pared my data set down to 3 columns - counting 3 tables.

And it appears to spawn MR jobs indefinitely.

I worked around it. But someone should look at the UNION operation.

Happy Friday!

avatar

Yikes! Have never seen that happen before, but I certainly have no reason to doubt you. Does it happen with hive.execution.engine=tez? If you could grab & sanitize your query/config details & post that as a Hive bug in Jira it would be greatly appreciated... we don't want that problem to bite anyone else.