Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Importing data from two tables to a single table in Hive

avatar
Visitor

I've two external tables in Hive that have exactly the same schema and I want to import the data from these two tables into a single table. I can do a UNION ALL from two tables but it seems to be very slow. Example show below. Is there a better way to do this? Thanks.

CREATE TABLE MYTABLE
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFILE
AS
SELECT row_number() OVER(), *
FROM
(
SELECT * FROM x
UNION ALL
SELECT * FROM Y
) T2
1 ACCEPTED SOLUTION

avatar
Master Mentor

@john doe use STORED AS ORC instead of RCFILE

also, use CTAS to create a table from one table first, then do on the second table

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

see if that improves performance, I know it's 2 steps but interesting use case.

View solution in original post

1 REPLY 1

avatar
Master Mentor

@john doe use STORED AS ORC instead of RCFILE

also, use CTAS to create a table from one table first, then do on the second table

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

see if that improves performance, I know it's 2 steps but interesting use case.