We are trying to build an external table by joining two other tables A and B.
Table A has around 432,65807 number of rows and having around 45 columns.
Table B has around 379405 number of rows and having around 17 columns.
Both tables are being LEFT JOIN based on 3 columns and producing around 65618892 rows count on hive query editor. that is perfectly fine.
But when we insert the result set directly to an External Table/ Avro Table / Parquet Table then in each case it wrote ONLY 11520062 number of rows. and missing around 54098830 number of rows.
Note : we are using single node Hortonworks HDP virtual machine having 12GB's of RAM.
Queries we were trying are
1) DROP TABLE IF EXISTS my_avro_tbl;
CREATE TABLE my_avro_tbl(colA string,
STORED AS AVRO;
INSERT OVERWRITE TABLE my_avro_tbl SELECT colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ, colK, colL
FROM Table_A LEFT OUTER JOIN Table_B
ON X = Y;