Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Importing data from two tables to a single table in Hive

Solved Go to solution

Importing data from two tables to a single table in Hive

New Contributor

I've two external tables in Hive that have exactly the same schema and I want to import the data from these two tables into a single table. I can do a UNION ALL from two tables but it seems to be very slow. Example show below. Is there a better way to do this? Thanks.

CREATE TABLE MYTABLE
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFILE
AS
SELECT row_number() OVER(), *
FROM
(
SELECT * FROM x
UNION ALL
SELECT * FROM Y
) T2
1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Importing data from two tables to a single table in Hive

Mentor

@john doe use STORED AS ORC instead of RCFILE

also, use CTAS to create a table from one table first, then do on the second table

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

see if that improves performance, I know it's 2 steps but interesting use case.

View solution in original post

1 REPLY 1
Highlighted

Re: Importing data from two tables to a single table in Hive

Mentor

@john doe use STORED AS ORC instead of RCFILE

also, use CTAS to create a table from one table first, then do on the second table

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

see if that improves performance, I know it's 2 steps but interesting use case.

View solution in original post

Don't have an account?
Coming from Hortonworks? Activate your account here