Impala refresh took over 36385 seconds to complete - too slow

We are on Impala Shell v2.8.0-cdh5.11.1 (community edition).

SSL is enabled, but Sentry is not.

We executed the command refresh schema.table:

17:15:14    Query: refresh hbasestage.raw_transactions
17:15:14    Query submitted at: 2018-02-06 01:15:14 (Coordinator: http://hadoop4-private.wdc01.infra.ripple.com:25000)
03:21:40    Query progress can be monitored at: http://hadoop4-private.wdc01.infra.ripple.com:25000/query_plan?query_id=984250bf5d880d33:fddc42d100000000
03:21:40    
03:21:40    Fetched 0 row(s) in 36385.13s

There has to be a better way.

The table is normally populated by Hive, so a refresh is required for Impala to recognize new partitions.

The table has 1861 partitions (partitioned by date) holding 1.28 TB of data in total; no single partition is larger than 3 GB.

The files are Avro, but that shouldn't matter (should it?).

YARN does NOT manage resources.
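For scale, the figures in this post work out as follows. This is a back-of-the-envelope Python sketch that assumes decimal units (1 TB = 1000 GB); averaging the elapsed time over partitions is plain arithmetic and is not a claim about where Impala actually spent the time.

```python
# Figures copied from the post. Dividing by the partition count is plain
# arithmetic; it does not say how Impala distributed the work internally.
elapsed_s = 36385.13   # "Fetched 0 row(s) in 36385.13s"
partitions = 1861
total_tb = 1.28

hours = elapsed_s / 3600
seconds_per_partition = elapsed_s / partitions
avg_partition_gb = total_tb * 1000 / partitions  # assumes 1 TB = 1000 GB

print(f"{hours:.1f} h total, {seconds_per_partition:.1f} s per partition, "
      f"~{avg_partition_gb:.2f} GB per partition on average")
```

With these inputs it prints roughly 10.1 h total, 19.6 s per partition, and about 0.69 GB per partition on average.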

Re: Impala refresh took over 36385 seconds to complete - too slow

Ten hours is a long time for a refresh. Since this is a partitioned table, if you know which partitions are being added, you could use the new "refresh table ... partition ..." syntax to look at only those partitions.
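As a rough illustration of that targeted approach, the Python sketch below generates one REFRESH statement per new date partition instead of refreshing the whole table. The partition column name "dt", and the assumption that it is a STRING column holding ISO dates, are hypothetical; the thread does not name the actual partition column.

```python
from __future__ import annotations

from datetime import date, timedelta


def refresh_partition_statements(table: str, column: str, days: list[date]) -> list[str]:
    """Build one Impala REFRESH ... PARTITION statement per date partition.

    `column` is the partition key name. The thread does not give it, so any
    value passed here (e.g. "dt") is hypothetical. Values are rendered as
    ISO-8601 string literals, which assumes a STRING partition column.
    """
    return [
        f"REFRESH {table} PARTITION ({column}='{day.isoformat()}')"
        for day in days
    ]


if __name__ == "__main__":
    # Hypothetical example: refresh only the two most recent daily partitions.
    today = date(2018, 2, 6)
    new_days = [today - timedelta(days=1), today]
    for stmt in refresh_partition_statements("hbasestage.raw_transactions", "dt", new_days):
        print(stmt + ";")
```

Each printed statement touches a single partition, so the work scales with the number of new partitions rather than with all 1861.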

Alternatively, if you are only adding new partitions (rather than updating existing ones), recover partitions (ALTER TABLE ... RECOVER PARTITIONS) is faster than refresh for partitioned tables:

https://www.cloudera.com/documentation/enterprise/5-11-x/topics/impala_refresh.html
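The decision described in this reply (recover partitions for brand-new partitions, a targeted refresh for rewritten ones) can be sketched in Python as follows. This is a hypothetical helper, not a Cloudera tool: it assumes you already have the set of partitions Impala knows about (for example from SHOW PARTITIONS) and the set present on storage, and partition specs such as dt='2018-02-06' use a made-up column name.

```python
from __future__ import annotations


def plan_metadata_update(
    table: str,
    known: set[str],
    on_storage: set[str],
    rewritten: set[str],
) -> list[str]:
    """Return Impala statements that avoid a full-table REFRESH.

    known: partition specs already in Impala's catalog
    on_storage: partition specs whose directories exist on storage
    rewritten: partition specs whose data files changed since the last load
    Specs are strings such as "dt='2018-02-06'" (hypothetical column name).
    """
    statements: list[str] = []
    # Directories Impala has never seen: a single RECOVER PARTITIONS covers them.
    if on_storage - known:
        statements.append(f"ALTER TABLE {table} RECOVER PARTITIONS")
    # Partitions Impala already knows but whose files changed get a targeted REFRESH.
    for spec in sorted(rewritten & known):
        statements.append(f"REFRESH {table} PARTITION ({spec})")
    return statements


if __name__ == "__main__":
    known = {"dt='2018-02-04'", "dt='2018-02-05'"}
    on_storage = known | {"dt='2018-02-06'"}
    for stmt in plan_metadata_update(
        "hbasestage.raw_transactions", known, on_storage, rewritten={"dt='2018-02-05'"}
    ):
        print(stmt + ";")
```

If nothing is new and nothing was rewritten, the helper returns an empty list, so no metadata work is issued at all.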

Regards,

Mark
