Inconsistent TPCH Data Generators


I was using Hive-testbench to generate TPC-H data sets. I started by generating a 10 GB dataset into Hive (./ 10). Running select count(*) on the generated Hive table "part" returns 2000000 rows. I then downloaded the official tpch_tool 2.17, generated the 10 GB .tbl files, and built a Hive database from them. For the same 10 GB data size, the same count query on the table built from the .tbl files returns 86586082 rows.

How is this possible? The number of rows should be the same. Can anyone give me an idea of what's going on?



@mÁRIO Rodrigues

Are you running TPC-H against the same Hive version in both cases? Please share the output of select count(*) from both tables.
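As a sanity check while comparing the two counts, here is a rough sketch of the row counts one would expect at a given scale factor. The per-table figures are the TPC-H cardinalities as I recall them from the specification, so treat them as assumptions and verify against the spec. The helper name `expected_rows` is made up for illustration. Note that the total across all eight tables at scale factor 10 comes out to 86586082, so one possible explanation (not confirmed in this thread) is that the second "part" table points at a directory containing all of the .tbl files rather than only part.tbl.

```python
# Expected TPC-H row counts per table (values assumed from the TPC-H spec).
# Most tables scale linearly with the scale factor (SF).
LINEAR_BASE = {
    "supplier": 10_000,
    "customer": 150_000,
    "part": 200_000,
    "partsupp": 800_000,
    "orders": 1_500_000,
}
# nation and region have a fixed size regardless of SF.
FIXED = {"nation": 25, "region": 5}
# lineitem is not exactly linear, so only known values are listed here.
LINEITEM = {1: 6_001_215, 10: 59_986_052}


def expected_rows(sf):
    """Return a dict of table name -> expected row count for SF 1 or 10."""
    rows = {name: base * sf for name, base in LINEAR_BASE.items()}
    rows.update(FIXED)
    rows["lineitem"] = LINEITEM[sf]
    return rows


rows = expected_rows(10)
print(rows["part"])        # 2000000
print(sum(rows.values()))  # 86586082
```

If that is what happened, checking the LOCATION of the second table (for example with describe formatted part) should show whether it covers more than part.tbl.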
