Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to create an "external" table on partitioned Parquet data?

avatar
New Contributor

Hi,

 

I'm trying to read data that's exported into Parquet partitions via an external table.

ex: a partition of the data would be in /user/root/parquet/TIMESLOT=201603162200

I can create an external table on the data IN EACH partition (using CREATE EXTERNAL TABLE xxx LIKE PARQUET syntax).

ex: CREATE EXTERNAL TABLE ingest_parquet_201505141200 LIKE PARQUET '/user/root/parquet/TIMESLOT=201505141200/part-r-00000-db2fb64b-0bb1-4fbe-a851-e71ad13f10a6.gz.parquet' LOCATION '/user/root/parquet/TIMESLOT=201505141200';

But I need to see ALL of the partitions. That means that I can't use "LIKE PARQUET" syntax anymore (because the partition key -i.e. TIMESLOT would be missing from the table in that case), and more importantly I need to change the LOCATION clause, but when I use the " location '/user/root/parquet/' " querying the external table returns no rows!

Is there a way to address the issue?

Regards,

Babak.

2 REPLIES 2

avatar
Cloudera Employee

There is a documented example here: http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_tutorial.html#tut_exter...

 

The missing steps are:

  • LOCATION should be '/user/root/parquet/' in CREATE statement
  • CREATE statement likely missing partition and file_format information (e.g. depending on TIMESLOT type):

    PARTITIONED BY (TIMESLOT BIGINT) STORED AS PARQUET

  • Add the partitions to Impala so it knows they are there.  e.g. ALTER TABLE ingest_parquet ADD PARTITION (TIMESLOT=201505141200)

Best,

 

Harrison

avatar
Explorer

What is the maximum number of partitions an external impala table can have?