External Parquet table

If I have a process writing Parquet files to a location in HDFS, how can I create an external Impala table that uses these files? How do I reference the schema that is contained within these files?


For example, if my Parquet file contains 'State' and 'Population', would I need to create columns in Impala called 'State' and 'Population', or could I use any column names, with the data simply read in the same order? For example:

CREATE EXTERNAL TABLE parquet_table_name (x STRING, y INT) STORED AS PARQUET LOCATION '/user/testuser/data';
Cloudera Employee

Re: External Parquet table

Yes, you will need to create the table with a schema identical to the one stored in the Parquet files.
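
As a rough sketch of what that could look like for the example in the question: the column names below mirror the 'State' and 'Population' fields, and the types (STRING, INT) are assumed from the question's own example rather than read from the actual files.

```sql
-- Declare columns that match the schema written into the Parquet files.
-- Names come from the question; the types are an assumption.
CREATE EXTERNAL TABLE parquet_table_name (
  state      STRING,
  population INT
)
STORED AS PARQUET                 -- tell Impala the files are Parquet, not text
LOCATION '/user/testuser/data';   -- HDFS directory the writer process fills
```

Because the table is external, dropping it later removes only the table definition and leaves the files in HDFS untouched.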


In the upcoming release, we have extended the CREATE TABLE statement so that it can populate the schema from an existing Parquet file.
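
In Impala versions that include this feature, the form is CREATE TABLE ... LIKE PARQUET. A minimal sketch, assuming one of the data files is named datafile.parq (a hypothetical name; substitute any real file from the directory):

```sql
-- Derive the column definitions from an existing Parquet data file
-- instead of listing them by hand.
CREATE EXTERNAL TABLE parquet_table_name
  LIKE PARQUET '/user/testuser/data/datafile.parq'  -- hypothetical file name
  STORED AS PARQUET
  LOCATION '/user/testuser/data';
```

Running DESCRIBE parquet_table_name afterwards shows the columns that were inferred from the file.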