<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive Table loads less rows than the actual rows in the query in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-Table-loads-less-rows-than-the-actual-rows-in-the-query/m-p/202352#M164358</link>
    <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/116755/hive-table-loads-less-rows-than-the-actual-rows-in.html#"&gt;@Bala Vignesh N V&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;I loaded the external table using the following query &lt;/P&gt;&lt;P&gt;_____________________________________________________________________________________&lt;/P&gt;&lt;P&gt;drop table if exists table1; &lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE table1 (col1 String,col2 String,col3 String,col4 String,col5 String,col6 String,col7 String) &lt;/P&gt;&lt;P&gt;ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' lines terminated by '\n' STORED AS TEXTFILE &lt;/P&gt;&lt;P&gt;LOCATION '/user/maria_dev/'; &lt;/P&gt;&lt;P&gt;select * from table1; The count of the rows for this table is 4,32,65,807.&lt;/P&gt;&lt;P&gt;_____________________________________________________________________________&lt;/P&gt;&lt;P&gt;Then I loaded another table with the same query as above. This table has 3,79,405 rows.&lt;/P&gt;&lt;P&gt;Now , i joined the two queries using the following query &lt;/P&gt;&lt;P&gt;SELECT distinct tb1.col1,tb1.col2,tb1.col3,tb1.col4,tb1.col5,tb1.col6,
tb2.col1,tb2.col2,tb2.col3,tb2.col4,tb2.col5,tb2.col6 &lt;/P&gt;&lt;P&gt;FROM table1 tb1 LEFT OUTER JOIN table2 tb2
ON (tb1.col1 = tb2.col1);&lt;/P&gt;&lt;P&gt;The count for this comes out to be 65618892.&lt;/P&gt;&lt;P&gt;But when i save this query in a external table as follows the count of rows come out to be  11520062.&lt;/P&gt;&lt;P&gt;____________________________________________________________________________________&lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE resultant (col1 STRING,col2 STRING,col3 STRING,col4 STRING,col5 STRING, col6 STRING,col7 STRING,col8 STRING,col9 STRING,col10 STRING,col11 STRING, col12 STRING); &lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE resultant&lt;/P&gt;&lt;P&gt;SELECT distinct tb1.col1,tb1.col2,tb1.col3,tb1.col4,tb1.col5,tb1.col6, tb2.col1,tb2.col2,tb2.col3,tb2.col4,tb2.col5,tb2.col6 &lt;/P&gt;&lt;P&gt;FROM table1 tb1 LEFT OUTER JOIN table2 tb2 ON (tb1.col1 = tb2.col1);&lt;/P&gt;&lt;P&gt;_____________________________________________________________________________________&lt;/P&gt;&lt;P&gt;This is how i loaded the tables. And i used count(*) and show tblproperties yourTableName to calculate the rows.&lt;/P&gt;</description>
    <pubDate>Mon, 31 Jul 2017 21:31:43 GMT</pubDate>
    <dc:creator>umairmajeed61</dc:creator>
    <dc:date>2017-07-31T21:31:43Z</dc:date>
  </channel>
</rss>

