Support Questions

Find answers, ask questions, and share your expertise

How can I control the size of the blocks that hive writes to s3?

Contributor

I'm using INSERT INTO to write data to S3, but it's producing very large files (0.8 GB to 1.8 GB), plus one of just a few KB. I've tried tez.grouping.max-size and tez.grouping.min-size, but neither seems to limit the minimum or maximum size of the generated files. I've also tried controlling the number of mappers and reducers, to no avail.

1 ACCEPTED SOLUTION

Expert Contributor

@Peter Coates - Look at the parameters fs.s3a.multipart.threshold and fs.s3a.multipart.size.
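A minimal sketch of applying these parameters in a Hive session before the write. The table names and the values below are illustrative placeholders, not recommendations; these are Hadoop/S3A client settings, so they can also be set cluster-wide in core-site.xml.

```sql
-- Example values only: tune to your workload.
SET fs.s3a.multipart.threshold=134217728;  -- 128 MB: uploads above this size use multipart
SET fs.s3a.multipart.size=67108864;        -- 64 MB: size of each multipart part

-- Hypothetical table names for illustration.
INSERT INTO s3_backed_table
SELECT * FROM source_table;
```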

