
Partitioned table in S3: best bucket/path strategy for Hive/Impala


Explorer

Partitioned Hive tables use a directory hierarchy of the form table-name/partitionKey1=value1/partitionKey2=value2.
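As a small illustration of that layout, here is a minimal sketch that builds such a path from an ordered list of partition keys and values. The helper name `partition_path` is hypothetical and only for illustration; it is not part of Hive or Impala.

```python
def partition_path(table, partitions):
    """Build a Hive-style partition directory path.

    `partitions` is an ordered list of (key, value) pairs; order matters
    because each pair becomes one directory level under the table.
    """
    parts = [f"{key}={value}" for key, value in partitions]
    return "/".join([table] + parts)


# Reproduces the hierarchy described above.
print(partition_path("table-name", [("partitionKey1", "value1"),
                                    ("partitionKey2", "value2")]))
# prints: table-name/partitionKey1=value1/partitionKey2=value2
```

Note that every object in the table shares the same leading characters (the table name, then the first partition key), which is what prompts the question about key variance.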

 

The following article recommends giving object keys more variance early in their names. Can anyone comment on best practice for partitioned Hive tables in S3?

 

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
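To make the idea of "more variance early in the name" concrete, here is a rough sketch of one way to do it: putting a short hash of the key in front of it. The function name `hashed_key`, the choice of MD5, and the 4-character prefix length are all assumptions for illustration, not something taken from Hive or Impala.

```python
import hashlib


def hashed_key(key, prefix_len=4):
    """Prepend a short, deterministic hash so keys differ in their first characters.

    Assumption: MD5 is used only to spread keys out, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Two partitions of the same table, each given its own hash prefix.
for k in ("table-name/partitionKey1=a/partitionKey2=x",
          "table-name/partitionKey1=b/partitionKey2=y"):
    print(hashed_key(k))
```

A layout like this no longer matches the default table-name/key=value hierarchy, so each partition would presumably have to be registered with an explicit location, which is part of why I am asking whether it is worth doing at all.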

 

Or is this not an issue?

1 REPLY

Re: Partitioned table in S3: best bucket/path strategy for Hive/Impala

Cloudera Employee

This should not be a problem for Impala, because Impala caches table metadata and the objects being queried are typically large, so relatively few S3 requests are made for the amount of data read.
