Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to insert individual rows into hive based on subqueries

Solved Go to solution
Highlighted

How to insert individual rows into hive based on subqueries

Super Collaborator

I am trying to accomplish something like this:

insert into table daily_counts

values

(

select count(*) from table_a,

select count(*) from table_b

)

...etc.

I know the syntax works for known values. How can I enter the result of a sub-query?

I currently accomplish this in java via jdbc, by running all the queries individually, and then parsing together the final insert statement.

But it MUST be possible with HiveQL, right?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to insert individual rows into hive based on subqueries

Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.

insert into table daily_counts
select count(*), 'table_a' from table_a
   UNION 
select count(*), 'table_b' from table_b
   UNION
select count(*), 'table_c' from table_c
...
;

View solution in original post

4 REPLIES 4

Re: How to insert individual rows into hive based on subqueries

Hi @Zack Riesland. Here's how I normally do this. It's not specifically a subquery but accomplishes what you're looking for.

insert into table daily_counts
select count(*), 'table_a' from table_a
   UNION 
select count(*), 'table_b' from table_b
   UNION
select count(*), 'table_c' from table_c
...
;

View solution in original post

Highlighted

Re: How to insert individual rows into hive based on subqueries

Edited the post to fix syntax error. Now it runs just fine. ;-)

Highlighted

Re: How to insert individual rows into hive based on subqueries

Super Collaborator

Thanks, @bpreachuk

Interestingly, I think we found a major Hive bug.

When I run a query like the one above, it appears to enter an infinite loop.

I even pared my data set down to 3 columns - counting 3 tables.

And it appears to spawn MR jobs indefinitely.

I worked around it. But someone should look at the UNION operation.

Happy Friday!

Highlighted

Re: How to insert individual rows into hive based on subqueries

Yikes! Have never seen that happen before, but I certainly have no reason to doubt you. Does it happen with hive.execution.engine=tez? If you could grab & sanitize your query/config details & post that as a Hive bug in Jira it would be greatly appreciated... we don't want that problem to bite anyone else.

Don't have an account?
Coming from Hortonworks? Activate your account here