Explorer
Posts: 20
Registered: ‎03-31-2017
Accepted Solution

Impala Failing to Recognize Partitioning

Hi,

 

I currently have data sitting in an HDFS location at, say, /location. The data is partitioned by YEAR/MONTH/DAY, and the subfolder structure looks like YEAR=2017/MONTH=8/DAY=2. I am attempting to create an external table on this data, but the partitions are not being recognized. The two sets of commands I've tried are:

drop table if exists db.table;
create external table db.table like parquet '/location/file.parquet' partitioned by (YEAR int, MONTH int, DAY int) stored as parquet location '/location';
alter table db.table recover partitions;
compute incremental stats db.table;

 

And...

 

drop table if exists db.table;
create external table db.table(
field1 string,
field2 string,
...
) partitioned by (YEAR int, MONTH int, DAY int) stored as parquet location '/location/';
alter table db.table recover partitions;
compute incremental stats db.table;

 

In both cases, I end up with an empty table that has the correct partition columns. Calling invalidate metadata; afterwards did not resolve the issue. I've verified that the impala user is on the facl lists for these areas. Does anyone know why it would not be finding the data?

 

I should point out that if I ignore partitioning and instead just try to build a table on top of the data from one day (i.e. YEAR=2017/MONTH=8/DAY=2), the data shows up.

 

Cloudera Employee
Posts: 290
Registered: ‎10-16-2013

Re: Impala Failing to Recognize Partitioning

Not sure if this is the problem, but you might try using lower case names in the HDFS path, i.e.:

 

year=2017/month=8/day=2

instead of

YEAR=2017/MONTH=8/DAY=2

Explorer
Posts: 20
Registered: ‎03-31-2017

Re: Impala Failing to Recognize Partitioning

Setting them to lowercase didn't work immediately. What did work was going back, setting each HDFS file name to lowercase, and refreshing the partitioning.

 

Lesson learned: always set partition column names to lowercase in the HDFS paths when you need to build an external table on them.
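For anyone who hits the same thing, here is a minimal sketch of how the lowercase target paths could be computed before renaming. It is plain Python and does not touch HDFS; the function name lowercase_partition_keys is made up for illustration, and it assumes only the key part of each KEY=value directory needs to change. The actual moves (for example with hdfs dfs -mv, one directory level at a time) are left out.

```python
def lowercase_partition_keys(path: str) -> str:
    """Lowercase the key of every KEY=value path segment.

    Values and segments without '=' (such as file names) are left untouched.
    Hypothetical helper for illustration; it only builds strings.
    """
    segments = []
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # this segment looks like a partition directory
            segment = key.lower() + sep + value
        segments.append(segment)
    return "/".join(segments)


if __name__ == "__main__":
    old = "/location/YEAR=2017/MONTH=8/DAY=2"
    print(old, "->", lowercase_partition_keys(old))
```

After the directories are renamed, alter table db.table recover partitions; should be run again so the table picks up the new paths.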

Cloudera Employee
Posts: 290
Registered: ‎10-16-2013

Re: Impala Failing to Recognize Partitioning

Thanks for following up with the solution.

 

Sorry for the pain, I understand it's somewhat user-unfriendly. The explanation for the current behavior goes like this:

 

Column names are generally case insensitive from the Impala SQL perspective, but HDFS file paths are case sensitive. So it could cause confusion if you had paths like this in HDFS:

 

YEAR=2000/MONTH=1

year=2000/month=1

Year=2000/Month=1

 

Are they different partitions? All the same partition? Can one partition point to multiple directories... You see where I am going :). It's just easier to accept one canonical casing.
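To make the ambiguity concrete, here is a rough sketch (plain Python, not how Impala handles it internally) that groups directory paths by their case-insensitive partition keys and reports the ones that would collapse into the same partition. The function names canonical_partition and find_case_collisions are hypothetical.

```python
from collections import defaultdict


def canonical_partition(path):
    """Return the partition as a tuple of (lowercased key, value) pairs."""
    pairs = []
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # only KEY=value segments describe a partition
            pairs.append((key.lower(), value))
    return tuple(pairs)


def find_case_collisions(paths):
    """Map each canonical partition to its directories when there is more than one."""
    groups = defaultdict(list)
    for path in paths:
        groups[canonical_partition(path)].append(path)
    return {part: dirs for part, dirs in groups.items() if len(dirs) > 1}


if __name__ == "__main__":
    dirs = ["YEAR=2000/MONTH=1", "year=2000/month=1", "Year=2000/Month=1"]
    for part, matches in find_case_collisions(dirs).items():
        print(part, "is claimed by", matches)
```

All three directories from the example map to the same (year, month) pair, which is exactly the case a single canonical casing avoids.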

 
