Tools to generate Hive test data

Can anyone suggest a tool to generate large test data sets (at least 1 GB) for Hive? I'm looking for something simpler than hive-testbench.

@Mrinmoy Choudhury The TPC-H scripts are relatively simple. You can use https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh to build and set up the data.

Also, https://catalog.data.gov/dataset/college-scorecard has a good amount of data that you can use to populate Hive tables.
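
If none of the existing tools fit, a short script can also write a delimited file of whatever size you need. Below is a minimal Python sketch, not taken from any of the tools linked in this thread. The function name generate_rows, the four-column layout (id, name, amount, day), and the value ranges are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta


def generate_rows(path, target_bytes, seed=42):
    """Write comma-delimited rows to `path` until at least `target_bytes` are written.

    Hypothetical layout: id, name, amount, day. Returns the number of rows written.
    """
    rng = random.Random(seed)      # fixed seed so repeated runs give the same data
    start = date(2015, 1, 1)       # arbitrary base date, an assumption
    written = 0
    rows = 0
    # ASCII output with "\n" line endings, so len(line) equals the bytes written.
    with open(path, "w", encoding="ascii", newline="\n") as out:
        while written < target_bytes:
            row_id = rows + 1
            name = "user_%06d" % rng.randint(0, 999999)
            amount = "%.2f" % rng.uniform(1.0, 10000.0)
            day = (start + timedelta(days=rng.randint(0, 364))).isoformat()
            line = "%d,%s,%s,%s\n" % (row_id, name, amount, day)
            out.write(line)
            written += len(line)
            rows += 1
    return rows
```

Calling generate_rows("test_data.csv", 1024 ** 3) would produce roughly 1 GB. The file could then be loaded into a Hive table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' and matching column types, for example with LOAD DATA LOCAL INPATH.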

Hi @Mrinmoy Choudhury,

You can use the scripts here to generate the data: https://github.com/cartershanklin/sandbox-datagen

Extract the tar file and run datagen.sh with different data scales.

Thanks,

Aditya
