Created 09-19-2017 08:30 AM
Can anyone suggest a tool to generate large test data sets (at least 1 GB) for Hive? I am looking for something simpler than testbench.
Created 09-19-2017 09:40 AM
@Mrinmoy Choudhury The TPC-H scripts are relatively simple. You can use https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh to build and set up the data.
Also, https://catalog.data.gov/dataset/college-scorecard has a good amount of data that you can use to populate Hive tables.
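If a public dataset is not the right shape, another simple option (not one of the tools linked above) is a small script that writes delimited rows until the file reaches a target size. Here is a minimal Python sketch. The function name, file name, and four-column layout are assumptions made up for illustration, so adjust them to match your table.

```python
import random


def generate_rows(path, target_bytes, seed=0):
    """Write comma-delimited rows to `path` until at least `target_bytes` are written.

    Hypothetical column layout: id, user name, amount, date.
    Returns the number of rows written.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    written = 0
    row_id = 0
    # newline="\n" keeps line endings as plain LF on every platform
    with open(path, "w", newline="\n", encoding="ascii") as out:
        while written < target_bytes:
            line = "%d,user_%d,%.2f,2017-%02d-%02d\n" % (
                row_id,
                rng.randint(1, 100000),
                rng.uniform(0, 1000),
                rng.randint(1, 12),
                rng.randint(1, 28),
            )
            out.write(line)
            written += len(line)  # ASCII only, so characters equal bytes
            row_id += 1
    return row_id


# Example call for roughly 1 GiB of data (try a smaller target first):
# generate_rows("hive_test_data.csv", 1024 ** 3)
```

The resulting file can be loaded with LOAD DATA into a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.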
Created 09-19-2017 09:53 AM
Hi @Mrinmoy Choudhury,
You can use the scripts here to generate the data: https://github.com/cartershanklin/sandbox-datagen
Extract the tar file and run datagen.sh with different data scales.
Thanks,
Aditya