Support Questions

Find answers, ask questions, and share your expertise

Importing data from two tables to a single table in Hive

avatar
New Contributor

I've two external tables in Hive that have exactly the same schema and I want to import the data from these two tables into a single table. I can do a UNION ALL from two tables but it seems to be very slow. Example show below. Is there a better way to do this? Thanks.

CREATE TABLE MYTABLE
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFILE
AS
SELECT row_number() OVER(), *
FROM
(
SELECT * FROM x
UNION ALL
SELECT * FROM Y
) T2
1 ACCEPTED SOLUTION

avatar
Master Mentor

@john doe use STORED AS ORC instead of RCFILE

also, use CTAS to create a table from one table first, then do on the second table

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

see if that improves performance, I know it's 2 steps but interesting use case.

View solution in original post

1 REPLY 1

avatar
Master Mentor

@john doe use STORED AS ORC instead of RCFILE

also, use CTAS to create a table from one table first, then do on the second table

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

see if that improves performance, I know it's 2 steps but interesting use case.