Impala refresh took over 36385 seconds to complete - too slow


We are on Impala Shell v2.8.0-cdh5.11.1, community edition.

SSL is enabled, but Sentry is not.

We ran the command refresh schema.table:

 

17:15:14    Query: refresh hbasestage.raw_transactions
17:15:14    Query submitted at: 2018-02-06 01:15:14 (Coordinator: http://hadoop4-private.wdc01.infra.ripple.com:25000)
03:21:40    Query progress can be monitored at: http://hadoop4-private.wdc01.infra.ripple.com:25000/query_plan?query_id=984250bf5d880d33:fddc42d100000000
03:21:40    
03:21:40    Fetched 0 row(s) in 36385.13s

There has to be a better way.

 

The table is normally populated by Hive, so a refresh is required for Impala to recognize new partitions.

The table has 1861 partitions totaling 1.28 TB of data; each partition is no bigger than 3 GB (the table is partitioned by date).

The files are Avro, but that shouldn't affect it (should it?).

YARN does NOT manage resources.

1 REPLY

10 hours is a long time for a refresh. Since this is a partitioned table, if you know which partitions are being added, you could use the new "refresh table ... partition ..." syntax to look at only those partitions.
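As a rough illustration of the per-partition refresh, here is a minimal Python sketch that generates one REFRESH ... PARTITION statement per day. The partition column name dt and its ISO date format are assumptions (the post only says the table is partitioned by date), and refresh_partition_statements is a hypothetical helper, not part of Impala.

```python
from datetime import date, timedelta


def refresh_partition_statements(table, first_day, last_day, partition_col="dt"):
    """Build one REFRESH ... PARTITION statement per day in [first_day, last_day].

    partition_col is an assumed column name; replace it with the table's
    real partition key and adjust the value format if it is not an ISO date.
    """
    statements = []
    day = first_day
    while day <= last_day:
        statements.append(
            f"REFRESH {table} PARTITION ({partition_col}='{day.isoformat()}');"
        )
        day += timedelta(days=1)
    return statements


if __name__ == "__main__":
    # Example: refresh only the two most recently loaded daily partitions.
    for stmt in refresh_partition_statements(
        "hbasestage.raw_transactions", date(2018, 2, 5), date(2018, 2, 6)
    ):
        print(stmt)
```

The printed statements could be saved to a file and run with impala-shell -f, so only the listed partitions are reloaded instead of all 1861.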

 

Alternatively, "alter table ... recover partitions" is faster than refresh for partitioned tables if you're only adding new partitions (vs. updating existing ones):

https://www.cloudera.com/documentation/enterprise/5-11-x/topics/impala_refresh.html
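For the recover partitions route, a minimal sketch of how the impala-shell invocation might be assembled from Python. The host name impalad-host:21000 is a placeholder, and recover_partitions_command is a hypothetical helper; the --ssl flag is included because the original post says SSL is enabled.

```python
import shlex


def recover_partitions_command(table, impalad, use_ssl=True):
    """Return the impala-shell argv that runs ALTER TABLE ... RECOVER PARTITIONS.

    impalad is the host:port of an Impala daemon (placeholder in the example).
    """
    query = f"ALTER TABLE {table} RECOVER PARTITIONS"
    argv = ["impala-shell", "-i", impalad, "-q", query]
    if use_ssl:
        argv.append("--ssl")
    return argv


if __name__ == "__main__":
    argv = recover_partitions_command(
        "hbasestage.raw_transactions", "impalad-host:21000"
    )
    # Print the command for review; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Printing the command first makes it easy to check the table name and host before running it against the cluster.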

 

Regards,

Mark