
Impala refresh took over 36385 seconds to complete - too slow


We are on Impala Shell v2.8.0-cdh5.11.1 (community edition).

SSL is enabled, but Sentry is not.

I executed a refresh schema.table command:


17:15:14    Query: refresh hbasestage.raw_transactions
17:15:14    Query submitted at: 2018-02-06 01:15:14 (Coordinator:
03:21:40    Query progress can be monitored at:
03:21:40    Fetched 0 row(s) in 36385.13s

There has to be a better way.


The table is normally populated by Hive, so a refresh is required for Impala to recognize new partitions.

The table has 1861 partitions (partitioned by date) totaling 1.28 TB of data; no partition is bigger than 3 GB.
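Back-of-the-envelope averages from the figures above (assuming the refresh time is spread evenly across partitions and treating 1 TB as 1000 GB):

```latex
\frac{36385.13\ \text{s}}{1861\ \text{partitions}} \approx 19.6\ \text{s per partition},
\qquad
\frac{1280\ \text{GB}}{1861\ \text{partitions}} \approx 0.69\ \text{GB per partition}
```

That is roughly 20 seconds per partition for well under 1 GB of data each. Since a refresh reloads file metadata rather than reading rows, this hints that the time is going into per-partition metadata work, not data volume.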

The files are Avro, but that shouldn't matter (should it?).

YARN does NOT manage resources.


Re: Impala refresh took over 36385 seconds to complete - too slow

10 hours is a long time for a refresh. Since this is a partitioned table, if you know which partitions are being added, you could use the new "refresh table ... partition ..." syntax to look only at those partitions.
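A minimal Python sketch of the partition-by-partition approach, assuming the table is partitioned by a single string column holding ISO dates such as '2018-02-06'. The column name `txn_date` and the helper `partition_refresh_statements` are hypothetical; substitute the real partition column and value format. The sketch only builds the statements and does not connect to Impala.

```python
from datetime import date, timedelta


def partition_refresh_statements(table, partition_col, start, end):
    """Return one REFRESH ... PARTITION statement per day from start to end, inclusive.

    Assumes a single string partition column with values like '2018-02-06'.
    """
    statements = []
    day = start
    while day <= end:
        # Each statement names exactly one partition, so Impala only
        # reloads metadata for that partition instead of all of them.
        statements.append(
            f"REFRESH {table} PARTITION ({partition_col}='{day.isoformat()}')"
        )
        day += timedelta(days=1)
    return statements


if __name__ == "__main__":
    # Hypothetical example: refresh only the last two daily partitions.
    for stmt in partition_refresh_statements(
        "hbasestage.raw_transactions", "txn_date", date(2018, 2, 5), date(2018, 2, 6)
    ):
        print(stmt + ";")
```

Saved to a file, the printed statements can be run with `impala-shell -f <file>`, so each nightly load touches a handful of partitions rather than all 1861.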


Alternatively, if you're only adding new partitions (rather than updating existing ones), recover partitions is faster than refresh for partitioned tables:
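One way to combine the two suggestions, sketched in Python under the assumption that you can tell which partitions a Hive load added and which existing ones it modified. The helper `metadata_statements` and its arguments are hypothetical. `ALTER TABLE ... RECOVER PARTITIONS` is Impala's statement for registering partition directories under the table location that are not yet known; how much faster it is than a full refresh on a given cluster is worth measuring.

```python
def metadata_statements(table, partition_col, new_values, changed_values):
    """Choose metadata statements to run in Impala after a Hive load.

    new_values: partition values that the load added.
    changed_values: values of existing partitions whose files changed.
    """
    statements = []
    if new_values:
        # A single statement picks up every new partition directory
        # instead of reloading metadata for all partitions.
        statements.append(f"ALTER TABLE {table} RECOVER PARTITIONS")
    # Existing partitions with new or rewritten files still need a
    # per-partition refresh; sorted() keeps the output deterministic.
    for value in sorted(changed_values):
        statements.append(f"REFRESH {table} PARTITION ({partition_col}='{value}')")
    return statements


if __name__ == "__main__":
    # Hypothetical load: one new daily partition, one rewritten partition.
    for stmt in metadata_statements(
        "hbasestage.raw_transactions", "txn_date", ["2018-02-06"], ["2018-02-05"]
    ):
        print(stmt + ";")
```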