
External Parquet table


If I have a process writing Parquet files to a location in HDFS, how can I create an external Impala table that uses these files? How do I reference the schema that is contained within these files?


For example, if my Parquet file contains 'State' and 'Population' columns, would I need to create columns in Impala called 'State' and 'Population', or could I use any column names and have the data matched by position? Ex:

create external table parquet_table_name (x STRING, y INT) STORED AS PARQUET LOCATION '/user/testuser/data';

Re: External Parquet table


Yes, you will need to create the table with a schema identical to the one stored in the Parquet files.
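
For the 'State'/'Population' example in the question, a minimal sketch of such a matching table definition might look like this (table name and path are taken from the question's example; verify the types against your actual Parquet schema):

```sql
-- Column names and types must match the schema stored in the Parquet files
CREATE EXTERNAL TABLE parquet_table_name (
  State STRING,
  Population INT
)
STORED AS PARQUET
LOCATION '/user/testuser/data';
```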


In the upcoming release, we augmented the CREATE TABLE statement to populate the schema from an existing Parquet file.
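
If the release referred to above is the one that added the CREATE TABLE ... LIKE PARQUET form, a sketch of that syntax might look like the following; the file name 'somefile.parquet' is a hypothetical placeholder for one of your actual data files:

```sql
-- Infer column names and types from an existing Parquet data file
-- ('somefile.parquet' is illustrative; point this at a real file)
CREATE EXTERNAL TABLE parquet_table_name
LIKE PARQUET '/user/testuser/data/somefile.parquet'
STORED AS PARQUET
LOCATION '/user/testuser/data';
```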