Support Questions

Find answers, ask questions, and share your expertise

Partition, file size and number of files question

Contributor

HI,

 

I have a big table, partitioned by year/month,total size 500+G, 19 partitions so far, each partion right now about 7-8 files each about 3-4G (total 138 files) i have five nodes running impala, it is very slow for a simple select, should i reduce the number of files in each partition and increase the size of each file?

 

Thanks

Shannon

14 REPLIES 14

Contributor

Also block size is 512M

Contributor

When i run a simple select with where id ='xxx', i see there are 1111 jobs/tasks(?), it is bcoz my block size is 512 and total is 500G, so 1000+ blocks to scan? If so i should increase my block size?

Champion

Somtimes I prefer bucketing over Partition due to large number of files getting created . 

Also in bucketing actually you have the control over the number of buckets.

I would suggest you test the bucketing over partition in your test env 

also it is a good practice to collect statistics for the table it will help in the performance side . 

Contributor

Thanks, i will take a look at bucketing.

 

Do you mean run compute stats, yes i did that.

Contributor

Took a quick look at bucketing, i dont have one column  that is used the most, and this table will join other tables later.

 

As far as controlling the number of files, since i have an idea of the data size, i can controll the file size when doing insert -select i can set hive.merge.size.per.task and hive.merge.smallfiles.avgsize.

 

My questions,

1, in this case, how big can i set the block size, 1G, 2G, will it hurt if set to big?

2, how big should i use for the file, right now i set to 4G, should i increase?

3, i have 5 nodes, with that for each partition it is better to have at least 5 files?

 

Thanks

Shannon

Expert Contributor

Hi Shannon,

 

Impala does not split up Parquet files over several readers when reading them. Instead, only one daemon will be assigned for each file and will read the whole file. Therefore it is recommended to have only one block per file. Otherwise some of the blocks can be on remote nodes and remote reads will slow down your queries. See this page for more information: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_perf_cookbook.html

 

Cheers, Lars

Contributor

Thanks Lars, trying to understand how it appies here, should i try to increate the blozks size to 4G, and keep each file under 4G (3-4G)?

 

Shannon

Expert Contributor

I'd try to reduce the file size to 256MB and make sure that the block size is at least that large, too. That way you should end up with 32GB * 4 = 128 files per partition. That should allow you to exploit parallelism across all your nodes. You can also try 512MB per file and see if that improves things, but I suspect it won't.

 

Btw, we're currently working on improving the ETL performance. You may want to look at the "SORT BY" clause that is included in Impala 2.9 and how it allows you to write data in a way that allows Impala to skip row groups much more effectively. You can find more information in the umbrella JIRA: https://issues.apache.org/jira/browse/IMPALA-2522

Contributor

But will the total files too many? i had smaller sized files before with 512M, was slow as well. we only have 5 nodes now so too many files will not help?

 

Thanks

Shannon

Expert Contributor

The files shouldn't be too many. Impala processes files in parallel locally, too, so you should see a higher utilization on each node. Can you post a profile of one of the slow queries?

Contributor

Posted here, thank you for helping out.

 

-----------------------------------------------------------

 

Query (id=d9451b2a703b10ea:7b8ec0b200000000):
Summary:
Session ID: 3c45d712aca91b57:cdd15e04d5ca5997
Session Type: HIVESERVER2
HiveServer2 Protocol Version: V6
Start Time: 2017-08-11 13:03:23.398808000
End Time: 2017-08-11 13:05:37.772410000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.8.0-cdh5.11.1 RELEASE (build 3382c1c488dff12d5ca8d049d2b59babee605b4e)
User: root
Connected User: root
Delegated User:
Network Address: 10.10.16.7:47208
Default Db: default
Sql Statement: select id from image_review_image_data_v3 v
where transactionid ='304641412'
Coordinator: host.local:22000
Query Options (non default): QUERY_TIMEOUT_S=600
Plan:
----------------
Estimated Per-Host Requirements: Memory=176.00MB VCores=1

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
| hosts=5 per-host-mem=unavailable
| tuple-ids=0 row-size=50B cardinality=28
|
00:SCAN HDFS [default.image_review_image_data_v3 v, RANDOM]
partitions=18/18 files=138 size=520.24GB
predicates: transactionid = '304641412'
table stats: 2961307241 rows total
column stats: all
hosts=5 per-host-mem=176.00MB
tuple-ids=0 row-size=50B cardinality=28
----------------
Estimated Per-Host Mem: 184549376
Estimated Per-Host VCores: 1
Request Pool: root.root
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE 1 0.000ns 0.000ns 24 28 0 -1.00 B UNPARTITIONED
00:SCAN HDFS 5 1m56s 2m4s 24 28 61.42 MB 176.00 MB default.image_review_image_...
Errors:
Planner Timeline: 3.336ms
- Analysis finished: 655.820us (655.820us)
- Equivalence classes computed: 735.594us (79.774us)
- Single node plan created: 1.969ms (1.234ms)
- Runtime filters computed: 2.068ms (98.686us)
- Distributed plan created: 2.155ms (87.078us)
- Lineage info computed: 2.201ms (46.370us)
- Planning finished: 3.336ms (1.134ms)
Query Timeline: 2m14s
- Query submitted: 0.000ns (0.000ns)
- Planning finished: 5s431ms (5s431ms)
- Submit for admission: 5s435ms (4.000ms)
- Completed admission: 5s435ms (0.000ns)
- Ready to start 6 fragment instances: 5s437ms (2.000ms)
- All 6 fragment instances started: 5s447ms (10.000ms)
- Rows available: 11s084ms (5s637ms)
- First row fetched: 12s195ms (1s111ms)
- Unregister query: 2m14s (2m2s)
- ComputeScanRangeAssignmentTimer: 2.000ms
ImpalaServer:
- ClientFetchWaitTimer: 1s658ms
- RowMaterializationTimer: 2m1s
Execution Profile d9451b2a703b10ea:7b8ec0b200000000:(Total: 2m7s, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------

Fragment instance start latencies: Count: 6, 25th %-ile: 2ms, 50th %-ile: 2ms, 75th %-ile: 4ms, 90th %-ile: 4ms, 95th %-ile: 8ms, 99.9th %-ile: 8ms
Per Node Peak Memory Usage: host2.local:22000(61.43 MB) host5.local.local:22000(62.37 MB) host4.local.local:22000(56.81 MB) host3.local:22000(60.35 MB) host.local:22000(64.13 MB)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F01:
Instance d9451b2a703b10ea:7b8ec0b200000000 (host=host.local:22000):(Total: 2m8s, non-child: 1.000ms, % non-child: 0.00%)
MemoryUsage(4s000ms): 8.00 KB, 10.58 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 13.60 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 16.19 KB (16576)
- PerHostPeakMemUsage: 64.13 MB (67250313)
- RowsProduced: 24 (24)
- TotalNetworkReceiveTime: 2m8s
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
- TotalThreadsInvoluntaryContextSwitches: 0 (0)
- TotalThreadsTotalWallClockTime: 2m8s
- TotalThreadsSysTime: 0.000ns
- TotalThreadsUserTime: 1.091ms
- TotalThreadsVoluntaryContextSwitches: 6 (6)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m2s
- ExecTreeExecTime: 2m2s
- OpenTime: 5s626ms
- ExecTreeOpenTime: 5s625ms
- PrepareTime: 19.000ms
- ExecTreePrepareTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
PLAN_ROOT_SINK:
- PeakMemoryUsage: 0
CodeGen:(Total: 19.000ms, non-child: 19.000ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 0.000ns
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 0 (0)
- NumInstructions: 0 (0)
- OptimizationTime: 0.000ns
- PeakMemoryUsage: 0
- PrepareTime: 19.000ms
EXCHANGE_NODE (id=1):(Total: 2m8s, non-child: 0.000ns, % non-child: 0.00%)
BytesReceived(4s000ms): 0, 190.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 366.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B
- BytesReceived: 470.00 B (470)
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 5s625ms
- PeakMemoryUsage: 0
- RowsReturned: 24 (24)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:(Total: 2m, non-child: 3s765ms, % non-child: 3.12%)
split sizes: min: 85.20 GB, max: 109.11 GB, avg: 104.05 GB, stddev: 9.43 GB
completion times: min:1m49s max:2m8s mean: 2m stddev:6s502ms
execution rates: min:798.37 MB/sec max:940.70 MB/sec mean:881.14 MB/sec stddev:48.68 MB/sec
num instances: 5
- AverageThreadTokens: 8.95
- BloomFilterBytes: 0
- PeakMemoryUsage: 61.02 MB (63980069)
- PerHostPeakMemUsage: 61.02 MB (63982552)
- RowsProduced: 4 (4)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 15m3s
- TotalThreadsInvoluntaryContextSwitches: 68 (68)
- TotalThreadsTotalWallClockTime: 18m
- TotalThreadsSysTime: 14s479ms
- TotalThreadsUserTime: 22s701ms
- TotalThreadsVoluntaryContextSwitches: 10.65K (10647)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m
- ExecTreeExecTime: 1m56s
- OpenTime: 28.399ms
- ExecTreeOpenTime: 200.000us
- PrepareTime: 20.199ms
- ExecTreePrepareTime: 999.999us
DataStreamSender (dst_id=1):
- BytesSent: 94.00 B (94)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 4 (4)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 265.00 B (265)
CodeGen:(Total: 46.399ms, non-child: 46.399ms, % non-child: 100.00%)
- CodegenTime: 800.000us
- CompileTime: 6.599ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 20.399ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 18.999ms
HDFS_SCAN_NODE (id=0):(Total: 1m56s, non-child: 1m56s, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.97
- AverageScannerThreadConcurrency: 7.95
- BytesRead: 1.81 GB (1947540891)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.80 GB (1935027176)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.80 GB (1935027176)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.16K (1159)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 58.13 MB (60950173)
- PerReadThreadRawHdfsThroughput: 15.82 MB/sec
- RemoteScanRanges: 119 (119)
- RowBatchQueueGetWaitTime: 1m56s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 592.26M (592261448)
- RowsReturned: 4 (4)
- RowsReturnedRate: 0
- ScanRangesComplete: 222 (222)
- ScannerThreadsInvoluntaryContextSwitches: 62 (62)
- ScannerThreadsTotalWallClockTime: 15m59s
- MaterializeTupleTime(*): 29s997ms
- ScannerThreadsSysTime: 11s399ms
- ScannerThreadsUserTime: 21s866ms
- ScannerThreadsVoluntaryContextSwitches: 6.90K (6904)
- TotalRawHdfsReadTime(*): 1m57s
- TotalReadThroughput: 15.34 MB/sec
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
Fragment F00:
Instance d9451b2a703b10ea:2 (host=host4.local.local:22000):(Total: 2m8s, non-child: 3s584ms, % non-child: 2.79%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/107.89 GB
MemoryUsage(4s000ms): 28.26 MB, 25.90 MB, 24.97 MB, 30.79 MB, 26.85 MB, 21.63 MB, 28.63 MB, 29.20 MB, 27.14 MB, 23.07 MB, 28.53 MB, 22.50 MB, 27.40 MB, 26.63 MB, 30.06 MB, 30.50 MB, 29.07 MB, 27.06 MB, 28.04 MB, 34.46 MB, 26.72 MB, 33.66 MB, 28.47 MB, 28.68 MB, 31.87 MB, 27.85 MB, 38.58 MB, 33.31 MB, 29.26 MB, 30.06 MB, 31.92 MB, 29.39 MB
ThreadUsage(4s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8
- AverageThreadTokens: 8.96
- BloomFilterBytes: 0
- PeakMemoryUsage: 56.81 MB (59570649)
- PerHostPeakMemUsage: 56.81 MB (59570649)
- RowsProduced: 8 (8)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 16m5s
- TotalThreadsInvoluntaryContextSwitches: 55 (55)
- TotalThreadsTotalWallClockTime: 19m10s
- TotalThreadsSysTime: 13s240ms
- TotalThreadsUserTime: 21s668ms
- TotalThreadsVoluntaryContextSwitches: 10.96K (10957)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m8s
- ExecTreeExecTime: 2m4s
- OpenTime: 27.000ms
- ExecTreeOpenTime: 1.000ms
- PrepareTime: 21.000ms
- ExecTreePrepareTime: 1.000ms
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 166.00 B (166)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 8 (8)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 444.00 B (444)
CodeGen:(Total: 45.000ms, non-child: 45.000ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 6.000ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 19.000ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.000ms
HDFS_SCAN_NODE (id=0):(Total: 2m4s, non-child: 2m4s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/107.89 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 233 out of 233
Hdfs Read Thread Concurrency Bucket: 0:8.171% 1:88.33% 2:3.113% 3:0% 4:0.3891%
File Formats: PARQUET/NONE:466
BytesRead(4s000ms): 27.80 MB, 85.14 MB, 142.67 MB, 207.51 MB, 265.22 MB, 318.69 MB, 375.42 MB, 427.20 MB, 488.50 MB, 538.53 MB, 589.01 MB, 643.03 MB, 694.78 MB, 745.85 MB, 799.44 MB, 869.70 MB, 939.54 MB, 1004.95 MB, 1.04 GB, 1.09 GB, 1.15 GB, 1.20 GB, 1.26 GB, 1.30 GB, 1.36 GB, 1.42 GB, 1.48 GB, 1.54 GB, 1.59 GB, 1.66 GB, 1.71 GB, 1.78 GB
- FooterProcessingTime: (Avg: 335.622ms ; Min: 2.000ms ; Max: 1s834ms ; Number of samples: 233)
- AverageHdfsReadThreadConcurrency: 0.96
- AverageScannerThreadConcurrency: 7.96
- BytesRead: 1.81 GB (1943785374)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.79 GB (1920996399)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.79 GB (1920996399)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.21K (1207)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 54.80 MB (57460818)
- PerReadThreadRawHdfsThroughput: 14.82 MB/sec
- RemoteScanRanges: 217 (217)
- RowBatchQueueGetWaitTime: 2m4s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 610.88M (610884328)
- RowsReturned: 8 (8)
- RowsReturnedRate: 0
- ScanRangesComplete: 233 (233)
- ScannerThreadsInvoluntaryContextSwitches: 48 (48)
- ScannerThreadsTotalWallClockTime: 17m1s
- MaterializeTupleTime(*): 28s089ms
- ScannerThreadsSysTime: 10s250ms
- ScannerThreadsUserTime: 20s926ms
- ScannerThreadsVoluntaryContextSwitches: 7.15K (7147)
- TotalRawHdfsReadTime(*): 2m5s
- TotalReadThroughput: 14.42 MB/sec
Instance d9451b2a703b10ea:3 (host=host.local:22000):(Total: 2m4s, non-child: 3s997ms, % non-child: 3.21%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/108.95 GB
MemoryUsage(2s000ms): 25.59 MB, 43.03 MB, 24.67 MB, 31.96 MB, 22.78 MB, 26.98 MB, 27.02 MB, 23.24 MB, 34.10 MB, 28.42 MB, 32.10 MB, 21.83 MB, 30.47 MB, 28.60 MB, 22.12 MB, 29.12 MB, 42.88 MB, 21.27 MB, 29.99 MB, 25.76 MB, 25.38 MB, 24.17 MB, 29.60 MB, 28.41 MB, 25.95 MB, 25.93 MB, 24.21 MB, 24.41 MB, 25.95 MB, 29.78 MB, 23.64 MB, 27.39 MB, 31.07 MB, 26.80 MB, 33.48 MB, 36.65 MB, 26.36 MB, 40.59 MB, 26.20 MB, 39.60 MB, 26.63 MB, 34.87 MB, 32.94 MB, 31.29 MB, 33.99 MB, 27.84 MB, 25.46 MB, 22.86 MB, 42.47 MB, 34.66 MB, 28.53 MB, 28.16 MB, 28.88 MB, 34.86 MB, 39.26 MB, 33.16 MB, 24.88 MB, 32.04 MB, 32.25 MB, 36.52 MB, 25.70 MB, 33.52 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7
- AverageThreadTokens: 8.95
- BloomFilterBytes: 0
- PeakMemoryUsage: 64.12 MB (67237897)
- PerHostPeakMemUsage: 64.13 MB (67250313)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 15m31s
- TotalThreadsInvoluntaryContextSwitches: 73 (73)
- TotalThreadsTotalWallClockTime: 18m33s
- TotalThreadsSysTime: 15s241ms
- TotalThreadsUserTime: 24s078ms
- TotalThreadsVoluntaryContextSwitches: 11.52K (11520)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m4s
- ExecTreeExecTime: 2m
- OpenTime: 29.000ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 20.000ms
- ExecTreePrepareTime: 1.000ms
DataStreamSender (dst_id=1):
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 48.000ms, non-child: 48.000ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 6.000ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 22.000ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.000ms
HDFS_SCAN_NODE (id=0):(Total: 2m, non-child: 2m, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/108.95 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 232 out of 232
Hdfs Read Thread Concurrency Bucket: 0:4.418% 1:92.77% 2:2.008% 3:0.4016% 4:0.4016%
File Formats: PARQUET/NONE:464
BytesRead(2s000ms): 11.93 MB, 46.91 MB, 75.47 MB, 104.71 MB, 132.90 MB, 159.49 MB, 195.09 MB, 223.49 MB, 260.53 MB, 293.25 MB, 331.14 MB, 356.24 MB, 380.49 MB, 407.13 MB, 429.02 MB, 450.06 MB, 489.78 MB, 532.49 MB, 562.91 MB, 590.18 MB, 612.37 MB, 642.80 MB, 678.62 MB, 708.21 MB, 734.52 MB, 764.60 MB, 795.61 MB, 833.28 MB, 863.53 MB, 895.09 MB, 928.86 MB, 956.50 MB, 981.44 MB, 1012.94 MB, 1.03 GB, 1.06 GB, 1.10 GB, 1.13 GB, 1.16 GB, 1.20 GB, 1.23 GB, 1.27 GB, 1.30 GB, 1.33 GB, 1.37 GB, 1.40 GB, 1.43 GB, 1.46 GB, 1.49 GB, 1.53 GB, 1.56 GB, 1.58 GB, 1.62 GB, 1.65 GB, 1.69 GB, 1.71 GB, 1.74 GB, 1.77 GB, 1.80 GB, 1.82 GB, 1.86 GB, 1.90 GB
- FooterProcessingTime: (Avg: 294.689ms ; Min: 2.000ms ; Max: 1s752ms ; Number of samples: 232)
- AverageHdfsReadThreadConcurrency: 1.00
- AverageScannerThreadConcurrency: 7.95
- BytesRead: 1.92 GB (2063038437)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.92 GB (2058780151)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.92 GB (2058780151)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.22K (1222)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 61.42 MB (64400766)
- PerReadThreadRawHdfsThroughput: 16.19 MB/sec
- RemoteScanRanges: 41 (41)
- RowBatchQueueGetWaitTime: 2m
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 622.54M (622540363)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 232 (232)
- ScannerThreadsInvoluntaryContextSwitches: 67 (67)
- ScannerThreadsTotalWallClockTime: 16m29s
- MaterializeTupleTime(*): 31s752ms
- ScannerThreadsSysTime: 12s047ms
- ScannerThreadsUserTime: 23s131ms
- ScannerThreadsVoluntaryContextSwitches: 7.40K (7404)
- TotalRawHdfsReadTime(*): 2m1s
- TotalReadThroughput: 15.80 MB/sec
Instance d9451b2a703b10ea:1 (host=host3.local:22000):(Total: 2m2s, non-child: 4s327ms, % non-child: 3.52%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/109.11 GB
MemoryUsage(2s000ms): 20.27 MB, 36.05 MB, 16.82 MB, 37.59 MB, 30.31 MB, 26.77 MB, 27.86 MB, 33.61 MB, 26.30 MB, 28.51 MB, 33.65 MB, 26.65 MB, 28.53 MB, 22.00 MB, 28.69 MB, 25.45 MB, 23.24 MB, 35.02 MB, 28.50 MB, 25.77 MB, 31.60 MB, 25.22 MB, 23.69 MB, 27.16 MB, 21.06 MB, 26.95 MB, 35.30 MB, 19.66 MB, 31.87 MB, 26.15 MB, 26.14 MB, 27.41 MB, 34.45 MB, 33.38 MB, 32.25 MB, 29.14 MB, 31.75 MB, 21.83 MB, 31.45 MB, 33.07 MB, 31.77 MB, 26.28 MB, 26.96 MB, 32.46 MB, 34.04 MB, 34.71 MB, 33.02 MB, 32.82 MB, 23.38 MB, 36.66 MB, 38.82 MB, 36.48 MB, 28.91 MB, 30.31 MB, 32.37 MB, 28.13 MB, 36.17 MB, 24.15 MB, 27.20 MB, 30.05 MB, 36.03 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8
- AverageThreadTokens: 8.96
- BloomFilterBytes: 0
- PeakMemoryUsage: 60.35 MB (63283432)
- PerHostPeakMemUsage: 60.35 MB (63283432)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 15m16s
- TotalThreadsInvoluntaryContextSwitches: 89 (89)
- TotalThreadsTotalWallClockTime: 18m20s
- TotalThreadsSysTime: 16s903ms
- TotalThreadsUserTime: 25s652ms
- TotalThreadsVoluntaryContextSwitches: 10.99K (10989)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m2s
- ExecTreeExecTime: 1m58s
- OpenTime: 32.999ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 19.999ms
- ExecTreePrepareTime: 999.999us
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 49.999ms, non-child: 49.999ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 8.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 22.999ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 17.999ms
HDFS_SCAN_NODE (id=0):(Total: 1m58s, non-child: 1m58s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/109.11 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 232 out of 232
Hdfs Read Thread Concurrency Bucket: 0:7.317% 1:89.02% 2:3.252% 3:0.4065% 4:0%
File Formats: PARQUET/NONE:464
BytesRead(2s000ms): 10.81 MB, 47.09 MB, 69.94 MB, 101.26 MB, 136.16 MB, 163.45 MB, 193.85 MB, 229.89 MB, 270.82 MB, 300.56 MB, 337.38 MB, 371.32 MB, 408.27 MB, 442.79 MB, 476.09 MB, 511.15 MB, 541.57 MB, 578.83 MB, 607.60 MB, 641.31 MB, 683.05 MB, 711.09 MB, 736.45 MB, 771.75 MB, 805.29 MB, 833.93 MB, 861.38 MB, 888.27 MB, 915.50 MB, 946.55 MB, 977.73 MB, 1006.97 MB, 1.02 GB, 1.06 GB, 1.09 GB, 1.12 GB, 1.15 GB, 1.18 GB, 1.21 GB, 1.24 GB, 1.27 GB, 1.30 GB, 1.33 GB, 1.36 GB, 1.39 GB, 1.43 GB, 1.46 GB, 1.50 GB, 1.52 GB, 1.55 GB, 1.59 GB, 1.63 GB, 1.66 GB, 1.69 GB, 1.72 GB, 1.75 GB, 1.79 GB, 1.81 GB, 1.84 GB, 1.87 GB, 1.90 GB
- FooterProcessingTime: (Avg: 298.749ms ; Min: 1.999ms ; Max: 1s567ms ; Number of samples: 232)
- AverageHdfsReadThreadConcurrency: 0.97
- AverageScannerThreadConcurrency: 7.96
- BytesRead: 1.93 GB (2073824728)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.93 GB (2067526565)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.93 GB (2067526565)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.22K (1216)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 58.54 MB (61378909)
- PerReadThreadRawHdfsThroughput: 16.61 MB/sec
- RemoteScanRanges: 60 (60)
- RowBatchQueueGetWaitTime: 1m58s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 622.14M (622140244)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 232 (232)
- ScannerThreadsInvoluntaryContextSwitches: 78 (78)
- ScannerThreadsTotalWallClockTime: 16m17s
- MaterializeTupleTime(*): 34s311ms
- ScannerThreadsSysTime: 13s374ms
- ScannerThreadsUserTime: 24s648ms
- ScannerThreadsVoluntaryContextSwitches: 7.13K (7127)
- TotalRawHdfsReadTime(*): 1m59s
- TotalReadThroughput: 16.05 MB/sec
Instance d9451b2a703b10ea:5 (host=host2.local:22000):(Total: 1m58s, non-child: 3s923ms, % non-child: 3.30%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/109.09 GB
MemoryUsage(2s000ms): 26.86 MB, 38.67 MB, 20.91 MB, 23.20 MB, 30.55 MB, 33.23 MB, 32.38 MB, 27.99 MB, 29.41 MB, 27.31 MB, 28.17 MB, 25.18 MB, 26.16 MB, 25.07 MB, 22.63 MB, 32.02 MB, 24.45 MB, 32.23 MB, 26.54 MB, 22.92 MB, 27.61 MB, 29.19 MB, 30.77 MB, 29.01 MB, 31.00 MB, 28.45 MB, 27.06 MB, 25.67 MB, 28.82 MB, 29.03 MB, 32.36 MB, 27.41 MB, 25.43 MB, 37.01 MB, 28.27 MB, 31.88 MB, 29.63 MB, 30.96 MB, 29.69 MB, 37.05 MB, 29.90 MB, 25.39 MB, 31.32 MB, 29.52 MB, 31.78 MB, 31.69 MB, 32.83 MB, 30.29 MB, 28.75 MB, 27.28 MB, 31.85 MB, 34.23 MB, 28.71 MB, 27.12 MB, 30.69 MB, 38.23 MB, 28.43 MB, 30.10 MB, 28.36 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7
- AverageThreadTokens: 8.95
- BloomFilterBytes: 0
- PeakMemoryUsage: 61.43 MB (64412714)
- PerHostPeakMemUsage: 61.43 MB (64412714)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 14m45s
- TotalThreadsInvoluntaryContextSwitches: 64 (64)
- TotalThreadsTotalWallClockTime: 17m41s
- TotalThreadsSysTime: 15s453ms
- TotalThreadsUserTime: 23s275ms
- TotalThreadsVoluntaryContextSwitches: 10.98K (10979)
Fragment Instance Lifecycle Timings:
- ExecTime: 1m58s
- ExecTreeExecTime: 1m54s
- OpenTime: 25.999ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 19.999ms
- ExecTreePrepareTime: 999.998us
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 44.999ms, non-child: 44.999ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 5.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 18.999ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.999ms
HDFS_SCAN_NODE (id=0):(Total: 1m54s, non-child: 1m54s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/109.09 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 233 out of 233
Hdfs Read Thread Concurrency Bucket: 0:8.017% 1:88.19% 2:3.376% 3:0% 4:0.4219%
File Formats: PARQUET/NONE:466
BytesRead(2s000ms): 15.45 MB, 65.80 MB, 102.98 MB, 133.78 MB, 165.21 MB, 203.47 MB, 240.87 MB, 283.02 MB, 316.20 MB, 349.66 MB, 382.79 MB, 416.64 MB, 448.05 MB, 476.41 MB, 498.51 MB, 533.48 MB, 563.70 MB, 593.07 MB, 615.87 MB, 643.39 MB, 672.48 MB, 700.83 MB, 731.61 MB, 766.05 MB, 802.69 MB, 837.45 MB, 867.75 MB, 894.95 MB, 924.89 MB, 959.66 MB, 993.46 MB, 1.00 GB, 1.04 GB, 1.07 GB, 1.11 GB, 1.14 GB, 1.18 GB, 1.20 GB, 1.23 GB, 1.27 GB, 1.31 GB, 1.33 GB, 1.36 GB, 1.39 GB, 1.43 GB, 1.46 GB, 1.49 GB, 1.52 GB, 1.56 GB, 1.59 GB, 1.62 GB, 1.66 GB, 1.69 GB, 1.72 GB, 1.75 GB, 1.79 GB, 1.83 GB, 1.85 GB, 1.88 GB
- FooterProcessingTime: (Avg: 311.635ms ; Min: 1.999ms ; Max: 1s963ms ; Number of samples: 233)
- AverageHdfsReadThreadConcurrency: 0.97
- AverageScannerThreadConcurrency: 7.95
- BytesRead: 1.91 GB (2050873759)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.90 GB (2043531704)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.90 GB (2043531704)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.20K (1202)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 57.17 MB (59946439)
- PerReadThreadRawHdfsThroughput: 16.96 MB/sec
- RemoteScanRanges: 70 (70)
- RowBatchQueueGetWaitTime: 1m54s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 621.15M (621148641)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 233 (233)
- ScannerThreadsInvoluntaryContextSwitches: 62 (62)
- ScannerThreadsTotalWallClockTime: 15m42s
- MaterializeTupleTime(*): 31s204ms
- ScannerThreadsSysTime: 12s266ms
- ScannerThreadsUserTime: 22s397ms
- ScannerThreadsVoluntaryContextSwitches: 7.12K (7115)
- TotalRawHdfsReadTime(*): 1m55s
- TotalReadThroughput: 16.46 MB/sec
Instance d9451b2a703b10ea:4 (host=host5.local.local:22000):(Total: 1m49s, non-child: 2s994ms, % non-child: 2.74%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:181/85.20 GB
MemoryUsage(2s000ms): 18.21 MB, 38.53 MB, 27.08 MB, 26.53 MB, 32.62 MB, 30.29 MB, 34.66 MB, 29.49 MB, 26.05 MB, 31.58 MB, 25.94 MB, 25.67 MB, 27.96 MB, 24.87 MB, 23.64 MB, 25.64 MB, 32.55 MB, 24.18 MB, 23.78 MB, 24.56 MB, 17.69 MB, 31.04 MB, 36.53 MB, 27.76 MB, 30.91 MB, 29.22 MB, 27.76 MB, 33.72 MB, 31.42 MB, 31.44 MB, 43.47 MB, 25.10 MB, 33.64 MB, 33.70 MB, 31.30 MB, 33.17 MB, 27.18 MB, 28.23 MB, 31.66 MB, 37.66 MB, 33.80 MB, 25.78 MB, 34.78 MB, 31.03 MB, 30.89 MB, 35.60 MB, 25.06 MB, 32.50 MB, 32.21 MB, 29.77 MB, 33.05 MB, 37.72 MB, 26.45 MB, 31.92 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8
- AverageThreadTokens: 8.93
- BloomFilterBytes: 0
- PeakMemoryUsage: 62.37 MB (65395653)
- PerHostPeakMemUsage: 62.37 MB (65395653)
- RowsProduced: 16 (16)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 13m38s
- TotalThreadsInvoluntaryContextSwitches: 59 (59)
- TotalThreadsTotalWallClockTime: 16m15s
- TotalThreadsSysTime: 11s556ms
- TotalThreadsUserTime: 18s833ms
- TotalThreadsVoluntaryContextSwitches: 8.79K (8792)
Fragment Instance Lifecycle Timings:
- ExecTime: 1m49s
- ExecTreeExecTime: 1m46s
- OpenTime: 27.000ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 20.000ms
- ExecTreePrepareTime: 1.000ms
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 304.00 B (304)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 16 (16)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 884.00 B (884)
CodeGen:(Total: 44.000ms, non-child: 44.000ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 6.000ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 19.000ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.000ms
HDFS_SCAN_NODE (id=0):(Total: 1m46s, non-child: 1m46s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:181/85.20 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 181 out of 181
Hdfs Read Thread Concurrency Bucket: 0:5.936% 1:90.87% 2:2.74% 3:0.4566% 4:0%
File Formats: PARQUET/NONE:362
BytesRead(2s000ms): 8.09 MB, 40.45 MB, 69.02 MB, 90.99 MB, 123.35 MB, 149.26 MB, 175.30 MB, 204.01 MB, 226.35 MB, 252.84 MB, 289.11 MB, 315.11 MB, 341.50 MB, 366.53 MB, 388.34 MB, 408.49 MB, 431.86 MB, 458.16 MB, 480.48 MB, 501.06 MB, 521.84 MB, 547.29 MB, 580.90 MB, 605.54 MB, 635.80 MB, 667.92 MB, 703.79 MB, 736.70 MB, 764.60 MB, 798.15 MB, 841.56 MB, 867.56 MB, 889.01 MB, 917.82 MB, 948.27 MB, 982.54 MB, 1009.91 MB, 1.01 GB, 1.04 GB, 1.08 GB, 1.11 GB, 1.14 GB, 1.17 GB, 1.20 GB, 1.22 GB, 1.25 GB, 1.27 GB, 1.29 GB, 1.32 GB, 1.35 GB, 1.37 GB, 1.41 GB, 1.44 GB, 1.47 GB
- FooterProcessingTime: (Avg: 352.397ms ; Min: 5.000ms ; Max: 1s916ms ; Number of samples: 181)
- AverageHdfsReadThreadConcurrency: 0.98
- AverageScannerThreadConcurrency: 7.93
- BytesRead: 1.50 GB (1606182161)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.48 GB (1584301061)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.48 GB (1584301061)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 950 (950)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 58.71 MB (61563937)
- PerReadThreadRawHdfsThroughput: 14.50 MB/sec
- RemoteScanRanges: 208 (208)
- RowBatchQueueGetWaitTime: 1m46s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 484.59M (484593665)
- RowsReturned: 16 (16)
- RowsReturnedRate: 0
- ScanRangesComplete: 181 (181)
- ScannerThreadsInvoluntaryContextSwitches: 57 (57)
- ScannerThreadsTotalWallClockTime: 14m26s
- MaterializeTupleTime(*): 24s629ms
- ScannerThreadsSysTime: 9s060ms
- ScannerThreadsUserTime: 18s227ms
- ScannerThreadsVoluntaryContextSwitches: 5.73K (5731)
- TotalRawHdfsReadTime(*): 1m45s
- TotalReadThroughput: 13.98 MB/sec

Contributor

Hi Lars,

 

Have you had time to look at the profile?

 

Thanks

Shannon

 

 

Expert Contributor

I just had a look, but I couldn't spot an obvious problem. The HDFS scanner fragments read around 15 MB/s, which seems reasonable to me, given how computationally intensive Parquet decoding is. There also doesn't seem to be any considerable skew. Each of your 5 nodes reads ~ 100GB of data in 134s, so the overall throughput is around 764 MB/s.

 

I suggest to have a look at the perf improvements around Parquet files in CDH 5.12 that I mentioned in an earlier reply.

Contributor

Thanks Lars.

 

Shannon