Tools to generate Hive test data
Labels: Apache Hive
Created 09-19-2017 08:30 AM
Can anyone suggest a tool to generate large test data sets (at least 1 GB) for Hive? I'm looking for something simpler than testbench.
Created 09-19-2017 09:40 AM
@Mrinmoy Choudhury The TPC-H scripts are relatively simple. You can use https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh to build and set up the data.
Also, https://catalog.data.gov/dataset/college-scorecard has a good amount of data that you can use to populate Hive tables.
Created 09-19-2017 09:53 AM
You can use the scripts here to generate the data: https://github.com/cartershanklin/sandbox-datagen
Extract the tar file and run datagen.sh with different data scales.
Thanks,
Aditya
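If none of the tools above fit, a small script may be enough for simple cases. The following is only a minimal sketch, not part of any of the projects linked in this thread: the function name `write_test_rows` and the columns (id, name, amount, event_date) are made up for illustration. It writes comma-delimited rows until roughly the requested number of bytes has been produced.

```python
import random
import string


def write_test_rows(out, target_bytes, seed=0):
    """Write comma-delimited rows to `out` until about `target_bytes` are written.

    The columns (id, name, amount, event_date) are hypothetical; adjust them
    to match your own table. Returns the number of rows written.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same data
    written = 0
    rows = 0
    while written < target_bytes:
        name = "".join(rng.choices(string.ascii_lowercase, k=12))
        amount = f"{rng.uniform(0, 10000):.2f}"
        event_date = f"2017-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
        line = f"{rows},{name},{amount},{event_date}\n"
        out.write(line)
        # Every character is ASCII, so characters written equals bytes written.
        written += len(line)
        rows += 1
    return rows
```

For example, opening a file for writing and calling `write_test_rows(f, 1024 ** 3)` should produce a file of about 1 GB, which could then be loaded into a Hive table declared with comma-delimited fields. Pure Python will be slow at that size, so the tools mentioned above are likely a better choice for much larger volumes.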
