Impala Performance Issue Diagnosis Help

New Contributor

We are running into an issue where a number of Impala ETL processes execute insert overwrite statements in parallel into one set of partitioned tables while, at the same time, Impala is querying a different set of tables. Query performance against the tables that are not being written to degrades substantially while those loads are in progress, yet as far as we can tell no server resource is being maxed out. The only other thing worth noting is that Hive Metastore CPU utilization does spike around the same time, though it stays well within the available resources.
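To be concrete, each ETL process runs something along these lines (the database, table, and column names here are made up for illustration, not our real schema):

-- Hypothetical example of one of the parallel loads described above
insert overwrite table etl_db.target_table partition (load_date='2015-06-16')
select col_a, col_b, col_c
from staging_db.source_table
where load_date = '2015-06-16';

Many statements of this shape run concurrently against different tables and partitions, while other queries hit an entirely separate set of tables.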
 
I've pasted the Impala profile of a simple "select * from table_name limit 1" below to illustrate the issue. Any help diagnosing it would be much appreciated.

 
Query (id=741e57f6de03b7f:de2f010d8cccd0a4)
Summary
Session ID: 16410073743b952f:6d1959a3798bf2b8
Session Type: BEESWAX
Start Time: 2015-06-16 01:51:44.165482000
End Time: 2015-06-16 01:53:14.792052000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.1.4-cdh5 RELEASE (build c3368fed88531330e44169e0c62e2c98d7f4215d)
User: ubuntu
Connected User: ubuntu
Delegated User:
Network Address: ::ffff:
Default Db: default
Sql Statement: select * from table_name limit 1
Coordinator: worker-host:22000
Plan:
----------------
Estimated Per-Host Requirements: Memory=0B VCores=0
F00:PLAN FRAGMENT [UNPARTITIONED]
00:SCAN HDFS [detail.table_name]
partitions=1260/1260 files=4846 size=1001.18GB
table stats: 14552131210 rows total
column stats: all
limit: 1
hosts=14 per-host-mem=unavailable
tuple-ids=0 row-size=485B cardinality=1
----------------
Estimated Per-Host Mem: 0
Estimated Per-Host VCores: 0
Request Pool: root.ubuntu
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
------------------------------------------------------------------------------------------------------------------------
00:SCAN HDFS 1 346.160ms 346.160ms 1 1 115.82 MB -1.00 B table_name 
Query Timeline
Start execution: 36252
Planning finished: 90143020524
Ready to start remote fragments: 90184945881
Remote fragments started: 90184947570
Rows available: 90187890093
First row fetched: 90289660820
Unregister query: 90626569890
ImpalaServer
- AsyncTotalTime: 0
- ClientFetchWaitTimer: 104547181
- InactiveTotalTime: 0
- RowMaterializationTimer: 34804
- TotalTime: 0
Execution Profile 741e57f6de03b7f:de2f010d8cccd0a4
Fragment start latencies: count: 0
- AsyncTotalTime: 0
- FinalizationTimer: 0
- InactiveTotalTime: 0
- TotalTime: 353937602
Coordinator Fragment F00
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 4:805/167.02 GB 1:823/168.21 GB 3:781/160.48 GB 0:849/176.82 GB 5:799/161.88 GB 2:789/166.76 GB
- AsyncTotalTime: 0
- AverageThreadTokens: 1.0
- InactiveTotalTime: 0
- PeakMemoryUsage: 121728848
- PerHostPeakMemUsage: 0
- PrepareTime: 12131698
- RowsProduced: 1
- TotalCpuTime: 149434187
- TotalNetworkReceiveTime: 0
- TotalNetworkSendTime: 0
- TotalStorageWaitTime: 305588082
- TotalTime: 348533108
BlockMgr
- AsyncTotalTime: 0
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- InactiveTotalTime: 0
- MaxBlockSize: 8388608
- MemoryLimit: 7378697739434983424
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0
- TotalEncryptionTime: 0
- TotalIntegrityCheckTime: 0
- TotalReadBlockTime: 0
- TotalTime: 0
HDFS_SCAN_NODE (id=0)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 4:805/167.02 GB 1:823/168.21 GB 3:781/160.48 GB 0:849/176.82 GB 5:799/161.88 GB 2:789/166.76 GB
Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0%
ExecOption: Codegen enabled: 0 out of 1
- AsyncTotalTime: 0
- AverageHdfsReadThreadConcurrency: 0.0
- AverageScannerThreadConcurrency: 0.0
- BytesRead: 74399201
- BytesReadDataNodeCache: 0
- BytesReadLocal: 0
- BytesReadRemoteUnexpected: 57621985
- BytesReadShortCircuit: 0
- DecompressionTime: 562934
- InactiveTotalTime: 0
- MaxCompressedTextFileLength: 0
- NumColumns: 0
- NumDisksAccessed: 1
- NumScannerThreadsStarted: 1
- PeakMemoryUsage: 121450320
- PerReadThreadRawHdfsThroughput: 57675228
- RemoteScanRanges: 18
- RowsRead: 2048
- RowsReturned: 1
- RowsReturnedRate: 2
- ScanRangesComplete: 0
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsTotalWallClockTime: 0
- MaterializeTupleTime(*): 0
- ScannerThreadsSysTime: 0
- ScannerThreadsUserTime: 0
- ScannerThreadsVoluntaryContextSwitches: 0
- TotalRawHdfsReadTime(*): 1289968036
- TotalReadThroughput: 0
- TotalTime: 346160201

Looking at the profile, there is a big lag between "Start execution" and "Planning finished". How can we work out what is causing this lag?

Query Timeline
Start execution: 36252
Planning finished: 90143020524
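If I'm reading the timeline right, those values are nanoseconds from the start of the query, so the gap works out to roughly (90143020524 - 36252) ns ≈ 90.1 seconds, out of a total query time of about 90.6 seconds (Unregister query: 90626569890 ns). In other words, almost all of the time is spent before planning finishes.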

2 Replies

Master Collaborator

Hi!

A very long "planning time" often indicates that the query is bottlenecked on loading/refreshing the table metadata.

In Impala, every impalad has a local cache of metadata. A query accessing a table with stale/missing metadata will trigger a metadata load in the catalogd.

The query will wait until the metadata is loaded and has been returned to that impalad.

There are more complicated variations of this issue because the metadata is also disseminated to all impalads via the statestore, but I'm hoping that hint helps you dig into the problem further.
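If you want to confirm that this is what's happening, here is a rough sketch of a check you can run from impala-shell against the table from your profile (adjust the name as needed):

-- Sketch only: force the metadata for this one table to be reloaded
INVALIDATE METADATA detail.table_name;

-- This query now has to wait for catalogd to load and ship the table's metadata,
-- so most of its time should show up before "Planning finished" in the timeline.
select * from detail.table_name limit 1;

-- Run it again with warm metadata; planning should now take only milliseconds.
select * from detail.table_name limit 1;

Typing "profile;" in impala-shell after each run prints the Query Timeline, so you can compare where the time goes between the cold and warm runs.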

Cheers,

Alex

New Contributor

Yep, it was exactly this. We had a bunch of impala-shell commands using the -r argument, so we were invalidating metadata from many parallel processes. We've now removed the invalidate metadata and refresh statements in a lot of places, since they aren't needed for most of our Impala ETL processes. Has any thought been put into registering these metadata refreshes in the statestore so that, if similar requests are already running, they don't overwhelm the metastore? Or alternatively, into warning when too many metastore refreshes are running at the same time? We spent a lot of time digging into this, so anything that helps others who hit similar issues would be a good thing.
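In case it helps anyone else, the change amounted to roughly the following (the script and table names below are placeholders, not our real ones):

-- Before: every job connected with
--   impala-shell -r -f load_job.sql
-- where -r effectively runs a full invalidate metadata on every connection.
-- After: connect without -r
--   impala-shell -f load_job.sql
-- and keep a targeted refresh only for tables that are written outside of Impala, e.g.:
refresh etl_db.externally_loaded_table;

Since most of our loads go through Impala itself, the inserted data is already visible to queries without any explicit refresh.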

Thanks a lot for your reply, Alex.