How can I control the size of the blocks that Hive writes to S3?
- Labels: Apache Hive
Created 11-02-2016 01:55 PM
I'm using INSERT INTO to write data up to S3, but it's writing very large files, 0.8 GB to 1.8 GB, plus one of just a few KB. I've tried tez.grouping.max-size and tez.grouping.min-size, but neither seems to limit the minimum or the maximum size of the generated files. I've also tried controlling the number of mappers and reducers, but to no avail.
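For reference, a minimal sketch of the settings described above, assuming Hive on Tez writing through the S3A connector; the byte values and table names are illustrative placeholders, not the ones actually used:

```sql
-- Illustrative values only (both properties take sizes in bytes):
-- group input splits to roughly 128 MB - 256 MB per task.
SET tez.grouping.min-size=134217728;
SET tez.grouping.max-size=268435456;

-- These settings group *input* splits, so they mainly control how many
-- tasks run; each final task still writes its own output file to S3.
INSERT INTO TABLE my_s3_table
SELECT * FROM source_table;
```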
Created 11-02-2016 02:00 PM
@Peter Coates - look for the parameters fs.s3a.multipart.threshold and fs.s3a.multipart.size.
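A minimal sketch of how those two properties might be set per-session, assuming the S3A connector is in use and the properties are not blocked by Hive's configuration whitelist; the values are illustrative:

```sql
-- Illustrative values only (both take sizes in bytes):
-- start a multipart upload once a file exceeds 128 MB...
SET fs.s3a.multipart.threshold=134217728;
-- ...and upload it in 64 MB parts.
SET fs.s3a.multipart.size=67108864;
```

They can also be set cluster-wide in core-site.xml. Note that these govern how the S3A client uploads each file in parts; the parts are reassembled into a single object on S3.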