Member since: 01-26-2016
Posts: 11
Kudos Received: 0
Solutions: 0
08-14-2016
11:59 PM
Hi,

We have a single query consisting of 253 plan fragments running on a 43-node cluster. In the middle of execution we hit the error "couldn't get a client for cdh-datanode-010.xxxxx.storage:22000". I'm wondering whether this is caused by the dedicated TCP connections required by each channel. The query runs 212 HDFS SCAN NODEs on each impalad node, and each of them broadcasts/shuffles data to the other 42 nodes, which I believe requires 42 channels per data stream sender per scan node per server. If each channel needs its own TCP connection, that would be 377,496 connections altogether. Is this correct? (A rough back-of-the-envelope check of that figure is sketched below.) If so, do you have any optimization suggestions for this query?

We only have a partial profile for this query because it stopped in the middle of execution: https://dl.dropboxusercontent.com/u/13650224/impala_sql_profile_2d42c9a80da6e983_faf86d52fb685b80.sql

Any comments and suggestions will be appreciated. Thanks. We are using Impala 2.3.
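A minimal sketch of the arithmetic behind that connection estimate, assuming one TCP connection per sender channel and that every host runs the same number of senders (the per-host sender count and the "one connection per channel" model are assumptions, not something confirmed by the profile):

```python
# Rough estimate of TCP connections for a broadcast/shuffle on an Impala cluster.
# Assumptions: every host runs the same number of data stream senders, and each
# sender keeps one dedicated connection per destination host.
hosts = 43                 # impalad nodes in the cluster
senders_per_host = 212     # scan-node fragments with a sender, per host (assumed)
peers = hosts - 1          # each sender broadcasts to every other host

connections_per_host = senders_per_host * peers
cluster_wide = connections_per_host * hosts

print(f"per host:     {connections_per_host:,}")   # 8,904
print(f"cluster-wide: {cluster_wide:,}")           # 382,872
# Same order of magnitude as the ~377k figure in the post; the exact number
# depends on how many sender fragments are actually scheduled per host.
```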
08-04-2016
10:17 PM
Hi,

For broadcast transmission, is it correct to say that total_network_send_timer - thrift_transmit_timer is the time spent on locking and waiting? The following snippet is obtained from our Impala cluster. The data stream sender broadcasts data to 38 impalad nodes. Since TransmitDataRPCTime (the aggregated network transmission time across all sender threads) is only 15s677ms, my understanding is that the rest of the time, i.e. 9m4s - 15s677ms, is spent waiting/locking/updating the network send time counter (a rough calculation is sketched after the snippet).

Fragment F23:
Instance e040e6c84d494ed6:3255ca5dbe354ce4 (host=cdh-datanode-104.lufax.storage:22000):(Total: 9m12s, non-child: 8m52s, % non-child: 96.43%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:4/140.02 MB 4:4/340.69 MB 0:10/972.22 MB 1:7/776.15 MB 6:5/265.73 MB 9:9/964.64 MB 5:6/431.62 MB 8:3/328.13 MB 3:4/445.24 MB 2:9/742.39 MB
- AverageThreadTokens: 47.69
- BloomFilterBytes: 0
- PeakMemoryUsage: 320.12 MB (335674080)
- PerHostPeakMemUsage: 14.78 GB (15868058032)
- PrepareTime: 176.046us
- RowsProduced: 32.75M (32749654)
- TotalCpuTime: 7h10m
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 9m4s
- TotalStorageWaitTime: 1s531ms
DataStreamSender (dst_id=55):(Total: 19s452ms, non-child: 19s452ms, % non-child: 100.00%)
- BytesSent: 18.81 GB (20198277370)
- NetworkThroughput(*): 1.20 GB/sec
- OverallThroughput: 990.22 MB/sec
- PeakMemoryUsage: 202.47 KB (207328)
- RowsReturned: 32.75M (32749654)
- SerializeBatchTime: 3s768ms
- TransmitDataRPCTime: 15s677ms
- UncompressedRowBatchSize: 32.46 GB (34850497072)
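A minimal sketch of that subtraction, using the durations quoted from the profile above (the duration parser is an assumption based on the h/m/s/ms/us/ns suffix style visible in the snippet, not an official Impala utility):

```python
import re

def parse_duration(s: str) -> float:
    """Convert an Impala-profile style duration like '9m4s' or '15s677ms' to seconds."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)(h|ms|us|ns|m|s)", s):
        total += float(value) * units[unit]
    return total

total_send = parse_duration("9m4s")       # TotalNetworkSendTime
rpc_time = parse_duration("15s677ms")     # TransmitDataRPCTime

# ~528s of the send time is spent outside the RPC itself
# (waiting, locking, updating counters), per the interpretation in the post.
print(f"time outside the RPC itself: {total_send - rpc_time:.1f}s")
```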
07-26-2016
08:08 AM
Here's the link to the profile https://dl.dropboxusercontent.com/u/13650224/profile.txt
07-25-2016
11:54 PM
Hi All,

We are tuning Impala 2.5 (CDH 5.7.1) on a large cluster. We got the following statistics and cannot figure out why the averaged non-child time of EXCHANGE_NODE (id=55) is significantly different from the actual non-child time of each instance. Can anyone help explain? (A quick comparison of the two figures is sketched after the listing.)

Averaged Fragment F18:(Total: 9m14s, non-child: 0.000ns, % non-child: 0.00%)
split sizes: min: 0, max: 0, avg: 0, stddev: 0
completion times: min:9m14s max:9m14s mean: 9m14s stddev:193.642ms
execution rates: min:0.00 /sec max:0.00 /sec mean:0.00 /sec stddev:0.00 /sec
num instances: 38

Line 657: EXCHANGE_NODE (id=55):(Total: 9m10s, non-child: 9m10s, % non-child: 100.00%)
Line 2109: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 721.501ms, % non-child: 0.13%)
Line 2263: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 757.946ms, % non-child: 0.14%)
Line 2430: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 735.455ms, % non-child: 0.13%)
Line 2597: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 719.390ms, % non-child: 0.13%)
Line 2764: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 739.788ms, % non-child: 0.13%)
Line 2931: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 746.737ms, % non-child: 0.14%)
Line 3098: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 785.411ms, % non-child: 0.14%)
Line 3265: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 729.887ms, % non-child: 0.13%)
Line 3432: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 760.815ms, % non-child: 0.14%)
Line 3599: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 706.177ms, % non-child: 0.13%)
Line 3766: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 761.745ms, % non-child: 0.14%)
Line 3933: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 712.175ms, % non-child: 0.13%)
Line 4100: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 708.088ms, % non-child: 0.13%)
Line 4267: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 756.727ms, % non-child: 0.14%)
Line 4434: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 705.065ms, % non-child: 0.13%)
Line 4601: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 763.063ms, % non-child: 0.14%)
Line 4768: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 728.033ms, % non-child: 0.13%)
Line 4935: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 717.250ms, % non-child: 0.13%)
Line 5102: EXCHANGE_NODE (id=55):(Total: 9m10s, non-child: 735.945ms, % non-child: 0.13%)
Line 5269: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 770.983ms, % non-child: 0.14%)
Line 5436: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 725.207ms, % non-child: 0.13%)
Line 5603: EXCHANGE_NODE (id=55):(Total: 9m10s, non-child: 767.848ms, % non-child: 0.14%)
Line 5770: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 741.356ms, % non-child: 0.13%)
Line 5937: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 766.130ms, % non-child: 0.14%)
Line 6104: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 680.675ms, % non-child: 0.12%)
Line 6271: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 772.464ms, % non-child: 0.14%)
Line 6438: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 776.331ms, % non-child: 0.14%)
Line 6605: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 689.149ms, % non-child: 0.13%)
Line 6772: EXCHANGE_NODE (id=55):(Total: 8m59s, non-child: 714.814ms, % non-child: 0.13%)
Line 6939: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 733.797ms, % non-child: 0.13%)
Line 7106: EXCHANGE_NODE (id=55):(Total: 9m10s, non-child: 732.212ms, % non-child: 0.13%)
Line 7273: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 723.093ms, % non-child: 0.13%)
Line 7440: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 780.243ms, % non-child: 0.14%)
Line 7607: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 721.486ms, % non-child: 0.13%)
Line 7774: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 752.089ms, % non-child: 0.14%)
Line 7941: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 713.284ms, % non-child: 0.13%)
Line 8108: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 727.432ms, % non-child: 0.13%)
Line 8275: EXCHANGE_NODE (id=55):(Total: 9m11s, non-child: 772.459ms, % non-child: 0.14%)
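A minimal sketch of the comparison described above, using only the per-instance non-child times copied from the profile excerpt (treating the averaged-fragment EXCHANGE_NODE line as the "averaged" figure is my reading of the excerpt, not something the profile states explicitly):

```python
# Per-instance EXCHANGE_NODE (id=55) non-child times from the profile excerpt, in ms.
instance_non_child_ms = [
    721.501, 757.946, 735.455, 719.390, 739.788, 746.737, 785.411, 729.887,
    760.815, 706.177, 761.745, 712.175, 708.088, 756.727, 705.065, 763.063,
    728.033, 717.250, 735.945, 770.983, 725.207, 767.848, 741.356, 766.130,
    680.675, 772.464, 776.331, 689.149, 714.814, 733.797, 732.212, 723.093,
    780.243, 721.486, 752.089, 713.284, 727.432, 772.459,
]

mean_ms = sum(instance_non_child_ms) / len(instance_non_child_ms)
print(f"instances:           {len(instance_non_child_ms)}")      # 38
print(f"mean non-child time: {mean_ms / 1000:.3f}s")             # ~0.74s
print("averaged line shows:  9m10s (~550s) non-child")
# The averaged line is roughly three orders of magnitude larger than the mean of
# the per-instance values, which is the discrepancy the question is about.
```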
Labels: Apache Impala
02-04-2016
11:25 AM
Thanks. Does this affect the final execution time listed in the Timeline part of the profile? Do I have to subtract the additional time recorded in the hash join part to get the correct query execution time?
02-03-2016
09:33 PM
Hi,

Can anyone help explain what's going on in this hash join node? The non-child time is 5s326ms. However, if I sum up all the breakdown costs (build time, probe time, build partition time, etc.), the total does not equal the execution time of this node, i.e. its non-child time. So which part of the execution time is missing from this profile? (The sum is worked out below.) The full profile is attached in the link below. We are using CDH4.5.8 Impala 2.2.

HASH_JOIN_NODE (id=2):(Total: 8s009ms, non-child: 5s326ms, % non-child: 66.51%)
ExecOption: Build Side Codegen Enabled, Probe Side Codegen Enabled, Join Build-Side Prepared Asynchronously
- BuildPartitionTime: 246.892ms
- BuildRows: 1.50M (1504350)
- BuildRowsPartitioned: 1.50M (1504350)
- BuildTime: 205.22ms
- GetNewBlockTime: 2.107ms
- HashBuckets: 4.19M (4194304)
- LargestPartitionPercent: 6 (6)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 16 (16)
- PeakMemoryUsage: 451.02 MB (472932352)
- PinTime: 0ns
- ProbeRows: 2.50M (2502844)
- ProbeRowsPartitioned: 0 (0)
- ProbeTime: 860.379ms
- RowsReturned: 417.52K (417520)
- RowsReturnedRate: 52.13 K/sec
- SpilledPartitions: 0 (0)
- UnpinTime: 997ns

https://my.syncplicity.com/share/knuknsvjyz1kzyu/profile password: 123456
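A minimal sketch of that sum, using only the timer counters shown under the node above (treating these as the complete breakdown is an assumption; which counters actually roll up into non-child time is exactly what the question asks):

```python
# Timer counters listed under HASH_JOIN_NODE (id=2), in milliseconds.
breakdown_ms = {
    "BuildPartitionTime": 246.892,
    "BuildTime": 205.22,
    "GetNewBlockTime": 2.107,
    "ProbeTime": 860.379,
    "PinTime": 0.0,
    "UnpinTime": 997e-6,   # 997ns
}

accounted = sum(breakdown_ms.values())
non_child_ms = 5326.0  # non-child time of the node

print(f"sum of listed timers: {accounted:.1f}ms")               # ~1314.6ms
print(f"non-child time:       {non_child_ms:.0f}ms")
print(f"unaccounted:          {non_child_ms - accounted:.0f}ms")  # ~4011ms
```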
Labels: Apache Impala
01-26-2016
07:51 PM
Hi,

We are using CDH4.5.8 Impala 2.2. We are confused by some spill-to-disk activity and would like to understand why it is happening. Here's a snippet of the profile. We had 128 GB of memory available for Impala (YARN and the Admission Controller are disabled) and stats are available for all tables. Scratch directories are set to one per disk (23 HDDs in total). No mem_limit is set via query option either. The 4-node cluster has 3 impalads running, and the data nodes are co-located with the impalads.

BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 279 (279)
- BlocksRecycled: 113 (113)
- BufferedPins: 2 (2)
- BytesWritten: 140.00 MB (146798191)
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 128.93 GB (138436362240)
- PeakMemoryUsage: 654.59 MB (686391904)
- TotalBufferWaitTime: 0ns
- TotalEncryptionTime: 0ns
- TotalIntegrityCheckTime: 0ns
- TotalReadBlockTime: 122.868ms

We are just wondering why Impala spills data to disk when PeakMemoryUsage is only 654.59 MB. We are looking for any potential explanations for this scenario (a small helper for locating spill-related counters in the profile is sketched below). Thanks.

Link to profile: https://my.syncplicity.com/share/bbkeaucyp3dnpsq/C2300_M1600_TPCDS_WFI_q98 password: 123456
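A minimal helper sketch for narrowing down which operator is behind the BytesWritten above. It simply scans a saved copy of the profile text for counters that indicate spilling; the counter names are the ones visible in the profiles quoted in this thread, and the local filename is a placeholder, not part of the linked profile:

```python
import re

# Counters that, when non-zero, point at spill activity in an Impala profile.
SPILL_COUNTERS = ("BytesWritten", "SpilledPartitions", "NumRepartitions", "UnpinTime")

def find_spill_lines(profile_path: str):
    """Yield (line number, line) for non-zero spill-related counters in a profile dump."""
    with open(profile_path) as f:
        for lineno, line in enumerate(f, start=1):
            if any(name in line for name in SPILL_COUNTERS):
                # Skip counters that are plainly zero, e.g. "- SpilledPartitions: 0 (0)".
                if re.search(r":\s*0(\.000ns|ns|\s*\(0\))?\s*$", line):
                    continue
                yield lineno, line.rstrip()

if __name__ == "__main__":
    # Placeholder filename for a locally saved copy of the linked profile.
    for lineno, line in find_spill_lines("C2300_M1600_TPCDS_WFI_q98.txt"):
        print(f"{lineno}: {line}")
```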
Labels: Apache Impala, Apache YARN