Inconsistent TPCH Data Generators

I was using hive-testbench (https://github.com/hortonworks/hive-testbench) to generate TPC-H data sets. I started by generating a 10 GB dataset into Hive (./tpch-build.sh 10). Running a SELECT COUNT(*) on the generated Hive table "part" returns a total of 2,000,000 rows. Meanwhile, I decided to download the official TPC-H tools (v2.17), generate the 10 GB .tbl files, and build a Hive database from them. For the same 10 GB data size, the same count query on the table built from the .tbl files returns a total of 86,586,082 rows.

How is this possible? The number of rows should be the same. Can anyone explain what's going on?
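For reference, the expected row counts can be worked out from the base table cardinalities in the TPC-H specification. The sketch below is a small Python check, assuming scale factor 10 and the per-table base sizes from the spec. LINEITEM is not an exact multiple of the scale factor, so its SF10 count (59,986,052, the figure dbgen produces at SF10) is hard-coded. It shows that PART alone should have 2,000,000 rows, which matches hive-testbench, while 86,586,082 is exactly the total across all eight tables. One plausible (unverified) explanation is that the Hive table built from the .tbl files reads every .tbl file in its directory instead of only part.tbl.

```python
# Rows per unit of scale factor (SF), per the TPC-H specification.
BASE_ROWS_PER_SF = {
    "part": 200_000,
    "partsupp": 800_000,
    "supplier": 10_000,
    "customer": 150_000,
    "orders": 1_500_000,
}
# NATION and REGION have fixed sizes regardless of SF.
FIXED_ROWS = {"nation": 25, "region": 5}
# LINEITEM is only approximately 6,000,000 * SF; this is the SF10 count dbgen emits.
LINEITEM_SF10 = 59_986_052


def expected_rows_sf10() -> dict:
    """Return the expected row count of each TPC-H table at SF = 10."""
    sf = 10
    rows = {name: base * sf for name, base in BASE_ROWS_PER_SF.items()}
    rows.update(FIXED_ROWS)
    rows["lineitem"] = LINEITEM_SF10
    return rows


if __name__ == "__main__":
    rows = expected_rows_sf10()
    for name, count in sorted(rows.items()):
        print(f"{name:10s} {count:>12,d}")
    # Sum of all tables, to compare against the unexpected count.
    print(f"{'total':10s} {sum(rows.values()):>12,d}")
```

If the table's storage location turns out to contain all the .tbl files, giving each table its own directory should bring the count for "part" back to 2,000,000.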

Thanks

Re: Inconsistent TPCH Data Generators

@mÁRIO Rodrigues

Are you running TPC-H against the same Hive version? Please share the output of SELECT COUNT(*) from both tables.
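To get comparable numbers on the dbgen side as well, one option is to count the lines in each generated .tbl file, since dbgen writes one pipe-delimited row per line. The Python sketch below assumes the .tbl files sit together in a single directory; the directory path used in the main block (".") is a hypothetical placeholder. The per-file counts can then be compared with SELECT COUNT(*) on each Hive table.

```python
from pathlib import Path


def count_tbl_rows(directory: str) -> dict:
    """Count rows in each .tbl file in `directory` (one row per line)."""
    counts = {}
    for path in sorted(Path(directory).glob("*.tbl")):
        # Read in binary mode so large files are streamed without decoding.
        with path.open("rb") as f:
            counts[path.stem] = sum(1 for _ in f)
    return counts


if __name__ == "__main__":
    # Hypothetical location of the dbgen output; adjust to your setup.
    dbgen_dir = "."
    for name, n in count_tbl_rows(dbgen_dir).items():
        print(f"{name:10s} {n:>12,d}")
```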