Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive Table loads less rows than the actual rows in the query

avatar
New Contributor

I am using hortonworks vm for hive computations.I am joining two hive tables and want to store the result in another hive table. When I run my query it returns me 65618892 rows but when i save the same query in the hive external table it only stores 11520062 rows and ends the query successfully without any error. I need help to store all of my data set in the hive external table.

2 REPLIES 2

avatar

@Umair Majeed

Could you explain in brief how did you load the external table?

Also did you took count(*) to check the record count or it by through the stats available in the table properties?

Whats is the delimited , rows seperator used in the external table?

There are chances that the rows from the source query may be splitted into multiple records in the external tables. Check for the column delimiter and rows seperator. It might solve your issue.

avatar
New Contributor

@Bala Vignesh N V

I loaded the external table using the following query

_____________________________________________________________________________________

drop table if exists table1;

CREATE EXTERNAL TABLE table1 (col1 String,col2 String,col3 String,col4 String,col5 String,col6 String,col7 String)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' lines terminated by '\n' STORED AS TEXTFILE

LOCATION '/user/maria_dev/';

select * from table1; The count of the rows for this table is 4,32,65,807.

_____________________________________________________________________________

Then I loaded another table with the same query as above. This table has 3,79,405 rows.

Now , i joined the two queries using the following query

SELECT distinct tb1.col1,tb1.col2,tb1.col3,tb1.col4,tb1.col5,tb1.col6, tb2.col1,tb2.col2,tb2.col3,tb2.col4,tb2.col5,tb2.col6

FROM table1 tb1 LEFT OUTER JOIN table2 tb2 ON (tb1.col1 = tb2.col1);

The count for this comes out to be 65618892.

But when i save this query in a external table as follows the count of rows come out to be 11520062.

____________________________________________________________________________________

CREATE EXTERNAL TABLE resultant (col1 STRING,col2 STRING,col3 STRING,col4 STRING,col5 STRING, col6 STRING,col7 STRING,col8 STRING,col9 STRING,col10 STRING,col11 STRING, col12 STRING);

INSERT OVERWRITE TABLE resultant

SELECT distinct tb1.col1,tb1.col2,tb1.col3,tb1.col4,tb1.col5,tb1.col6, tb2.col1,tb2.col2,tb2.col3,tb2.col4,tb2.col5,tb2.col6

FROM table1 tb1 LEFT OUTER JOIN table2 tb2 ON (tb1.col1 = tb2.col1);

_____________________________________________________________________________________

This is how i loaded the tables. And i used count(*) and show tblproperties yourTableName to calculate the rows.