If I have a process writing Parquet files to a location in HDFS, how can I create an external Impala table that uses these files? How do I reference the schema that is contained within these files?
For example, if my Parquet file contains 'State' and 'Population', would I need to create columns in Impala called 'State' and 'Population', or could I use any column names, with the data simply mapped in the same order? Ex:
create external table parquet_table_name (x STRING, y INT) STORED AS PARQUET LOCATION '/user/testuser/data';
Yes, you will need to create the table with a schema identical to the one stored in the Parquet files.
In the upcoming release, we have augmented the CREATE TABLE statement to populate the schema from an
existing Parquet file.
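For reference, the schema-inference syntax that shipped in later Impala releases is the CREATE TABLE ... LIKE PARQUET clause. A sketch of how it can be combined with an external table (the file name here is a placeholder for any one of your existing data files):

```sql
-- Infer the column names and types from one existing Parquet data file,
-- then point the external table at the directory containing all the files.
CREATE EXTERNAL TABLE parquet_table_name
  LIKE PARQUET '/user/testuser/data/somefile.parquet'
  STORED AS PARQUET
  LOCATION '/user/testuser/data';
```

This avoids typing out the column list by hand and keeps the table definition in sync with what the writing process actually produced.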