tools to generate hive test data

Contributor

Can anyone suggest a tool to generate large test data sets (at least 1 GB) for Hive? I'm looking for something simpler than testbench.

2 REPLIES

Super Collaborator

@Mrinmoy Choudhury The TPC-H scripts are relatively simple. You can use https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh to build and set up the data.

Also, https://catalog.data.gov/dataset/college-scorecard has a good amount of data that you can use to populate Hive tables.

Super Guru

Hi @Mrinmoy Choudhury,

You can use the scripts here to generate the data: https://github.com/cartershanklin/sandbox-datagen

Extract the tar file and run datagen.sh at different data scales.

Thanks,

Aditya
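
If none of the tools above fit, a small script can also produce a delimited file of a chosen size. The following is only a minimal sketch, not taken from any of the linked projects: the function name write_test_data, the four columns (id, name, amount, event_date), and the output file name are all hypothetical and should be adapted to your own table.

```python
import os
import random
import string
from datetime import date, timedelta


def write_test_data(path, target_bytes, seed=42):
    """Write comma-delimited rows to `path` until at least `target_bytes` are written.

    Returns the number of rows written. The columns (id, name, amount,
    event_date) are hypothetical; change them to match your Hive table.
    """
    rng = random.Random(seed)  # fixed seed so reruns produce identical data
    start = date(2000, 1, 1)
    written = 0
    rows = 0
    # ASCII-only content, so one character is one byte on disk.
    with open(path, "w", encoding="ascii", newline="") as f:
        while written < target_bytes:
            name = "".join(rng.choices(string.ascii_lowercase, k=10))
            amount = f"{rng.uniform(0, 10000):.2f}"
            event_date = (start + timedelta(days=rng.randrange(0, 7300))).isoformat()
            line = f"{rows},{name},{amount},{event_date}\n"
            f.write(line)
            written += len(line)
            rows += 1
    return rows
```

Calling write_test_data("hive_test_data.csv", 1024 ** 3) should give a file of roughly 1 GB. Assuming a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',', the file can then be loaded with a LOAD DATA LOCAL INPATH ... INTO TABLE ... statement.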