
How can I control the size of the blocks that Hive writes to S3?


New Contributor

I'm using INSERT INTO to write data to S3, but it's producing very large files, 0.8 GB to 1.8 GB each, plus one file of just a few KB. I've tried tez.grouping.max-size and tez.grouping.min-size, but neither seems to bound the minimum or maximum size of the generated files. I've also tried controlling the number of mappers and reducers, but to no avail.
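For illustration, a minimal sketch of the kind of settings described above, set in the Hive session before the insert (the byte values are arbitrary examples, not values from the original post):

    -- Illustrative only: Tez input-split grouping bounds, in bytes.
    -- These control how input splits are grouped into mapper tasks,
    -- which influences output file sizes only indirectly.
    SET tez.grouping.min-size=134217728;    -- 128 MB
    SET tez.grouping.max-size=1073741824;   -- 1 GB
    -- Capping the data volume per reducer is another common attempt:
    SET hive.exec.reducers.bytes.per.reducer=268435456;  -- 256 MB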

1 ACCEPTED SOLUTION


Re: How can I control the size of the blocks that Hive writes to S3?

Contributor

@Peter Coates - look for the parameters fs.s3a.multipart.threshold and fs.s3a.multipart.size
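A minimal sketch of how these S3A properties might be set for a Hive session (the 128 MB values here are illustrative assumptions; both properties can also be set cluster-wide in core-site.xml):

    -- Illustrative values only, in bytes.
    -- fs.s3a.multipart.threshold: file size above which the S3A client
    -- switches to multipart upload.
    -- fs.s3a.multipart.size: size of each part in a multipart upload.
    SET fs.s3a.multipart.threshold=134217728;  -- 128 MB
    SET fs.s3a.multipart.size=134217728;       -- 128 MB

These properties govern when and how the S3A client splits a single file upload into parts; the parts are reassembled into one S3 object on completion.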
