Partitioned Hive tables use a directory hierarchy of the form table-name/partitionKey1=value1/partitionKey2=value2.
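To make the layout concrete, here is a minimal sketch of how such keys come out. The helper `partition_key`, the table name `sales`, the partition keys `year` and `month`, and the file name are all made up for illustration; they are not from any Hive or S3 API.

```python
from os.path import commonprefix


def partition_key(table, partitions, filename):
    """Build a Hive-style key: table/key1=value1/key2=value2/filename.

    Hypothetical helper for illustration only, not part of Hive.
    """
    parts = [f"{key}={value}" for key, value in partitions]
    return "/".join([table, *parts, filename])


# Hypothetical table and partition values, chosen only for the example.
keys = [
    partition_key("sales", [("year", 2016), ("month", month)], "part-00000.parquet")
    for month in (1, 2, 3)
]

for key in keys:
    print(key)

# The keys share a long leading prefix; they only start to differ
# at the last partition value, i.e. late in the name.
print(commonprefix(keys))
```

The last line prints the shared leading characters of the three keys, which is the "low variance early in the name" pattern the question is about.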
The following article recommends ensuring that files have more variance early in their names. Can anyone comment on best practices for partitioned Hive tables in S3?
Or is this not an issue?
This should not be a problem for Impala, because Impala caches metadata and the objects being queried are typically large, so the request rate against any one key prefix tends to stay low.