
Inconsistent TPCH Data Generators


I was using hive-testbench (https://github.com/hortonworks/hive-testbench) to generate TPC-H data sets. I started by generating a 10 GB dataset in Hive (./tpch-build.sh 10). Running select count(*) on the generated Hive table "part" returns 2000000 rows. Meanwhile, I decided to download the official tpch_tool 2.17, generate the 10 GB .tbl files, and build a Hive database from them. For the same 10 GB data size, the same count query on the table built from the .tbl files returns 86586082 rows.

How is this possible? The number of rows should be the same. Can anyone give me an idea of what's going on?

Thanks

1 REPLY

Re: Inconsistent TPCH Data Generators

@mÁRIO Rodrigues

Are you running TPC-H against the same Hive version in both cases? Please share the output of select count(*) from both tables.
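
For reference while comparing the two counts, here is a minimal Python sketch of the row counts the TPC-H specification defines for scale factor 10 (the 10 GB size). The dictionary and function names are my own, for illustration only. The LINEITEM figures are the commonly reported dbgen counts, since that table does not scale exactly linearly with the scale factor.

```python
# Expected TPC-H base-table cardinalities, following the TPC-H specification.
# These tables scale linearly with the scale factor (SF).
ROWS_PER_SF = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
}

# NATION and REGION have a fixed size regardless of SF.
FIXED_ROWS = {"nation": 25, "region": 5}

# LINEITEM is only approximately 6,000,000 * SF, so the commonly
# reported dbgen counts are listed explicitly (assumption: only
# SF 1 and SF 10 are covered here).
LINEITEM_ROWS = {1: 6_001_215, 10: 59_986_052}


def expected_rows(scale_factor):
    """Return a dict of expected row counts per table for a scale factor."""
    counts = {table: n * scale_factor for table, n in ROWS_PER_SF.items()}
    counts.update(FIXED_ROWS)
    if scale_factor in LINEITEM_ROWS:
        counts["lineitem"] = LINEITEM_ROWS[scale_factor]
    return counts


counts = expected_rows(10)
print(counts["part"])        # 2000000
print(sum(counts.values()))  # 86586082
```

With these figures, 2000000 is the expected size of "part" at scale factor 10, while 86586082 equals the sum of all eight tables. One possible explanation, which I have not verified against your setup, is that the "part" table built from the .tbl files points at a directory containing every .tbl file rather than only part.tbl.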
