
Impala query parquet files from S3


New Contributor



I am curious: when Impala queries Parquet files on S3, does it download only the needed columns, or does it download the whole file first? As I understand it, S3 files are objects, so you can't seek to specific bytes, which is what's needed to use Parquet files efficiently.





Re: Impala query parquet files from S3

Cloudera Employee

Impala issues ranged GET requests through the S3A connector, so it downloads only the column chunks the query needs.
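To illustrate why ranged reads are enough, here is a rough Python sketch of the access pattern. It is not Impala's or S3A's code. `FakeObjectStore` is a hypothetical in-memory stand-in for S3 that serves inclusive byte ranges, the way an HTTP `Range: bytes=start-end` request does. The file keeps the real Parquet framing (`PAR1` magic at both ends, and a 4-byte little-endian footer length just before the trailing magic), but the footer here is a toy JSON index, whereas a real Parquet footer is Thrift-encoded metadata.

```python
import json
import struct

MAGIC = b"PAR1"  # 4-byte marker at both ends of a Parquet file


class FakeObjectStore:
    """Hypothetical stand-in for S3: one immutable blob, read by byte range."""

    def __init__(self, blob: bytes):
        self._blob = blob
        self.bytes_served = 0  # total bytes handed out, to show what was "downloaded"

    def size(self) -> int:
        return len(self._blob)

    def get_range(self, start: int, end: int) -> bytes:
        # Inclusive on both ends, like HTTP "Range: bytes=start-end".
        chunk = self._blob[start:end + 1]
        self.bytes_served += len(chunk)
        return chunk


def build_toy_file(columns: dict) -> bytes:
    """Lay out: MAGIC | column chunks | footer | footer length | MAGIC.

    The footer is a toy JSON map of column name -> [offset, length].
    """
    index = {}
    body = b""
    offset = len(MAGIC)
    for name, data in columns.items():
        index[name] = [offset, len(data)]
        body += data
        offset += len(data)
    footer = json.dumps(index).encode("utf-8")
    return MAGIC + body + footer + struct.pack("<I", len(footer)) + MAGIC


def read_footer(store: FakeObjectStore) -> dict:
    """Fetch the last 8 bytes, then fetch exactly the footer they point to."""
    size = store.size()
    tail = store.get_range(size - 8, size - 1)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    footer_end = size - 9  # last footer byte sits just before the 8-byte tail
    footer_start = footer_end - footer_len + 1
    return json.loads(store.get_range(footer_start, footer_end))


def read_column(store: FakeObjectStore, name: str) -> bytes:
    """Read one column chunk using only ranged reads."""
    offset, length = read_footer(store)[name]
    return store.get_range(offset, offset + length - 1)


store = FakeObjectStore(build_toy_file({
    "id": b"I" * 1000,
    "name": b"N" * 5000,
    "price": b"P" * 2000,
}))
price = read_column(store, "price")
print(len(price), store.bytes_served, store.size())
```

Reading `price` touches only the 8-byte tail, the footer, and that one chunk, so `bytes_served` stays well below the object size. The object is never modified or seeked in place; each read is an independent request for a byte range.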