New Contributor
Posts: 2
Registered: ‎11-23-2015

Impala query parquet files from S3



I am curious: when Impala queries Parquet files stored in S3, does it seek within the file and download only the needed columns, or does it download the whole file first? As I recall, S3 files are objects, so S3 doesn't allow seeking to specific bytes, which is needed to use Parquet files efficiently.




Cloudera Employee
Posts: 7
Registered: ‎08-26-2014

Re: Impala query parquet files from S3

Impala issues ranged GET requests through the S3A connector, so it downloads only the column chunks it needs rather than the whole file.
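
To make the idea of a range get concrete, here is a rough sketch in Python. It is not Impala's code (Impala goes through the S3A connector), and the function names are made up for illustration. It relies only on two things: an HTTP `Range` header of the form `bytes=first-last` with inclusive offsets, and the Parquet layout, where a file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`. A reader can fetch that 8-byte tail, then the footer, then only the column chunks whose offsets the footer lists.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte footer length + 4-byte magic at the end of the file


def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def footer_length(tail):
    """Parse the last 8 bytes of a Parquet file and return the footer size."""
    if len(tail) != TAIL_SIZE or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    return struct.unpack("<I", tail[:4])[0]


def footer_range(file_size, tail):
    """Range header that fetches just the footer metadata of the file."""
    size = footer_length(tail)
    return range_header(file_size - TAIL_SIZE - size, size)


if __name__ == "__main__":
    # Hypothetical 10,000-byte object whose footer is 1,234 bytes long.
    tail = struct.pack("<I", 1234) + PARQUET_MAGIC
    print(footer_range(10_000, tail))
    # A column chunk at a hypothetical offset 4 with 100 compressed bytes:
    print(range_header(4, 100))
```

Each header value would be sent as `Range: <value>` on a GET for the object; the column chunk offsets and sizes come from the decoded footer metadata, which this sketch does not parse.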