Support Questions
Find answers, ask questions, and share your expertise

[Please Help] Count query in Hive returning wrong results.

[Please Help] Count query in Hive returning wrong results.

New Contributor

Hi Team,

 

I am facing the following issue on a hive (Multi-Node cluster) when executing the following query:

select count(1) from dbname.personal_data_rc tablesample(1000 rows);

Environment:

Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive

 

When I started investigating this issue I observed that the count result is being affected by the following property:

hive.tez.input.format

 

Following are the cases:

 

Case 1.1:

 

Total rows: 60,000

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

Queryselect count(1) from dbname.personal_data_rc tablesample(1000 rows);

Output - 3000

 

Case 1.2: 

Total rows: 60,000

hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

Queryselect count(1) from dbname.personal_data_rc tablesample(1000 rows);

Output - 1000

 

Case 2.1:

 

Total rows: 1 million

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

Queryselect count(1) from dbname.personal_data_rc tablesample(1000 rows);

Output - 3000

 

Case 2.2: 

Total rows: 1 Million

hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

Queryselect count(1) from dbname.personal_data_rc tablesample(1000 rows);

Output - 7000

 

Can you help me understand why if I have TABLESAMPLE set to 1000 rows, it is giving me more rows? Is there any other property that must be used with CombineHiveInputFormat/HiveInputFormat or is it an issue with CombineHiveInputFormat/HiveInputFormat only? 

If it's not the above case, then what could be the reason for this behaviour? 

Attaching image for reference

Screenshot from 2021-03-08 19-20-14.png

 

Any help will be much appreciated. 

 

Thanks & Regards,

Jitender