Reply
Highlighted
Cloudera Employee
Posts: 69
Registered: ‎12-07-2015

Re: Partition, file size and number of files question

The files shouldn't be too many. Impala processes files in parallel locally, too, so you should see a higher utilization on each node. Can you post a profile of one of the slow queries?

Explorer
Posts: 104
Registered: ‎09-14-2016

Re: Partition, file size and number of files question

[ Edited ]

Posted here, thank you for helping out.

 

-----------------------------------------------------------

 

Query (id=d9451b2a703b10ea:7b8ec0b200000000):
Summary:
Session ID: 3c45d712aca91b57:cdd15e04d5ca5997
Session Type: HIVESERVER2
HiveServer2 Protocol Version: V6
Start Time: 2017-08-11 13:03:23.398808000
End Time: 2017-08-11 13:05:37.772410000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.8.0-cdh5.11.1 RELEASE (build 3382c1c488dff12d5ca8d049d2b59babee605b4e)
User: root
Connected User: root
Delegated User:
Network Address: 10.10.16.7:47208
Default Db: default
Sql Statement: select id from image_review_image_data_v3 v
where transactionid ='304641412'
Coordinator: host.local:22000
Query Options (non default): QUERY_TIMEOUT_S=600
Plan:
----------------
Estimated Per-Host Requirements: Memory=176.00MB VCores=1

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
| hosts=5 per-host-mem=unavailable
| tuple-ids=0 row-size=50B cardinality=28
|
00:SCAN HDFS [default.image_review_image_data_v3 v, RANDOM]
partitions=18/18 files=138 size=520.24GB
predicates: transactionid = '304641412'
table stats: 2961307241 rows total
column stats: all
hosts=5 per-host-mem=176.00MB
tuple-ids=0 row-size=50B cardinality=28
----------------
Estimated Per-Host Mem: 184549376
Estimated Per-Host VCores: 1
Request Pool: root.root
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE 1 0.000ns 0.000ns 24 28 0 -1.00 B UNPARTITIONED
00:SCAN HDFS 5 1m56s 2m4s 24 28 61.42 MB 176.00 MB default.image_review_image_...
Errors:
Planner Timeline: 3.336ms
- Analysis finished: 655.820us (655.820us)
- Equivalence classes computed: 735.594us (79.774us)
- Single node plan created: 1.969ms (1.234ms)
- Runtime filters computed: 2.068ms (98.686us)
- Distributed plan created: 2.155ms (87.078us)
- Lineage info computed: 2.201ms (46.370us)
- Planning finished: 3.336ms (1.134ms)
Query Timeline: 2m14s
- Query submitted: 0.000ns (0.000ns)
- Planning finished: 5s431ms (5s431ms)
- Submit for admission: 5s435ms (4.000ms)
- Completed admission: 5s435ms (0.000ns)
- Ready to start 6 fragment instances: 5s437ms (2.000ms)
- All 6 fragment instances started: 5s447ms (10.000ms)
- Rows available: 11s084ms (5s637ms)
- First row fetched: 12s195ms (1s111ms)
- Unregister query: 2m14s (2m2s)
- ComputeScanRangeAssignmentTimer: 2.000ms
ImpalaServer:
- ClientFetchWaitTimer: 1s658ms
- RowMaterializationTimer: 2m1s
Execution Profile d9451b2a703b10ea:7b8ec0b200000000:(Total: 2m7s, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------

Fragment instance start latencies: Count: 6, 25th %-ile: 2ms, 50th %-ile: 2ms, 75th %-ile: 4ms, 90th %-ile: 4ms, 95th %-ile: 8ms, 99.9th %-ile: 8ms
Per Node Peak Memory Usage: host2.local:22000(61.43 MB) host5.local.local:22000(62.37 MB) host4.local.local:22000(56.81 MB) host3.local:22000(60.35 MB) host.local:22000(64.13 MB)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F01:
Instance d9451b2a703b10ea:7b8ec0b200000000 (host=host.local:22000):(Total: 2m8s, non-child: 1.000ms, % non-child: 0.00%)
MemoryUsage(4s000ms): 8.00 KB, 10.58 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 12.12 KB, 13.60 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB, 16.06 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 16.19 KB (16576)
- PerHostPeakMemUsage: 64.13 MB (67250313)
- RowsProduced: 24 (24)
- TotalNetworkReceiveTime: 2m8s
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
- TotalThreadsInvoluntaryContextSwitches: 0 (0)
- TotalThreadsTotalWallClockTime: 2m8s
- TotalThreadsSysTime: 0.000ns
- TotalThreadsUserTime: 1.091ms
- TotalThreadsVoluntaryContextSwitches: 6 (6)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m2s
- ExecTreeExecTime: 2m2s
- OpenTime: 5s626ms
- ExecTreeOpenTime: 5s625ms
- PrepareTime: 19.000ms
- ExecTreePrepareTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
PLAN_ROOT_SINK:
- PeakMemoryUsage: 0
CodeGen:(Total: 19.000ms, non-child: 19.000ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 0.000ns
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 0 (0)
- NumInstructions: 0 (0)
- OptimizationTime: 0.000ns
- PeakMemoryUsage: 0
- PrepareTime: 19.000ms
EXCHANGE_NODE (id=1):(Total: 2m8s, non-child: 0.000ns, % non-child: 0.00%)
BytesReceived(4s000ms): 0, 190.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 304.00 B, 366.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B, 470.00 B
- BytesReceived: 470.00 B (470)
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 5s625ms
- PeakMemoryUsage: 0
- RowsReturned: 24 (24)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:(Total: 2m, non-child: 3s765ms, % non-child: 3.12%)
split sizes: min: 85.20 GB, max: 109.11 GB, avg: 104.05 GB, stddev: 9.43 GB
completion times: min:1m49s max:2m8s mean: 2m stddev:6s502ms
execution rates: min:798.37 MB/sec max:940.70 MB/sec mean:881.14 MB/sec stddev:48.68 MB/sec
num instances: 5
- AverageThreadTokens: 8.95
- BloomFilterBytes: 0
- PeakMemoryUsage: 61.02 MB (63980069)
- PerHostPeakMemUsage: 61.02 MB (63982552)
- RowsProduced: 4 (4)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 15m3s
- TotalThreadsInvoluntaryContextSwitches: 68 (68)
- TotalThreadsTotalWallClockTime: 18m
- TotalThreadsSysTime: 14s479ms
- TotalThreadsUserTime: 22s701ms
- TotalThreadsVoluntaryContextSwitches: 10.65K (10647)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m
- ExecTreeExecTime: 1m56s
- OpenTime: 28.399ms
- ExecTreeOpenTime: 200.000us
- PrepareTime: 20.199ms
- ExecTreePrepareTime: 999.999us
DataStreamSender (dst_id=1):
- BytesSent: 94.00 B (94)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 4 (4)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 265.00 B (265)
CodeGen:(Total: 46.399ms, non-child: 46.399ms, % non-child: 100.00%)
- CodegenTime: 800.000us
- CompileTime: 6.599ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 20.399ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 18.999ms
HDFS_SCAN_NODE (id=0):(Total: 1m56s, non-child: 1m56s, % non-child: 100.00%)
- AverageHdfsReadThreadConcurrency: 0.97
- AverageScannerThreadConcurrency: 7.95
- BytesRead: 1.81 GB (1947540891)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.80 GB (1935027176)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.80 GB (1935027176)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.16K (1159)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 58.13 MB (60950173)
- PerReadThreadRawHdfsThroughput: 15.82 MB/sec
- RemoteScanRanges: 119 (119)
- RowBatchQueueGetWaitTime: 1m56s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 592.26M (592261448)
- RowsReturned: 4 (4)
- RowsReturnedRate: 0
- ScanRangesComplete: 222 (222)
- ScannerThreadsInvoluntaryContextSwitches: 62 (62)
- ScannerThreadsTotalWallClockTime: 15m59s
- MaterializeTupleTime(*): 29s997ms
- ScannerThreadsSysTime: 11s399ms
- ScannerThreadsUserTime: 21s866ms
- ScannerThreadsVoluntaryContextSwitches: 6.90K (6904)
- TotalRawHdfsReadTime(*): 1m57s
- TotalReadThroughput: 15.34 MB/sec
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
Fragment F00:
Instance d9451b2a703b10ea:2 (host=host4.local.local:22000):(Total: 2m8s, non-child: 3s584ms, % non-child: 2.79%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/107.89 GB
MemoryUsage(4s000ms): 28.26 MB, 25.90 MB, 24.97 MB, 30.79 MB, 26.85 MB, 21.63 MB, 28.63 MB, 29.20 MB, 27.14 MB, 23.07 MB, 28.53 MB, 22.50 MB, 27.40 MB, 26.63 MB, 30.06 MB, 30.50 MB, 29.07 MB, 27.06 MB, 28.04 MB, 34.46 MB, 26.72 MB, 33.66 MB, 28.47 MB, 28.68 MB, 31.87 MB, 27.85 MB, 38.58 MB, 33.31 MB, 29.26 MB, 30.06 MB, 31.92 MB, 29.39 MB
ThreadUsage(4s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8
- AverageThreadTokens: 8.96
- BloomFilterBytes: 0
- PeakMemoryUsage: 56.81 MB (59570649)
- PerHostPeakMemUsage: 56.81 MB (59570649)
- RowsProduced: 8 (8)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 16m5s
- TotalThreadsInvoluntaryContextSwitches: 55 (55)
- TotalThreadsTotalWallClockTime: 19m10s
- TotalThreadsSysTime: 13s240ms
- TotalThreadsUserTime: 21s668ms
- TotalThreadsVoluntaryContextSwitches: 10.96K (10957)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m8s
- ExecTreeExecTime: 2m4s
- OpenTime: 27.000ms
- ExecTreeOpenTime: 1.000ms
- PrepareTime: 21.000ms
- ExecTreePrepareTime: 1.000ms
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 166.00 B (166)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 8 (8)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 444.00 B (444)
CodeGen:(Total: 45.000ms, non-child: 45.000ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 6.000ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 19.000ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.000ms
HDFS_SCAN_NODE (id=0):(Total: 2m4s, non-child: 2m4s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/107.89 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 233 out of 233
Hdfs Read Thread Concurrency Bucket: 0:8.171% 1:88.33% 2:3.113% 3:0% 4:0.3891%
File Formats: PARQUET/NONE:466
BytesRead(4s000ms): 27.80 MB, 85.14 MB, 142.67 MB, 207.51 MB, 265.22 MB, 318.69 MB, 375.42 MB, 427.20 MB, 488.50 MB, 538.53 MB, 589.01 MB, 643.03 MB, 694.78 MB, 745.85 MB, 799.44 MB, 869.70 MB, 939.54 MB, 1004.95 MB, 1.04 GB, 1.09 GB, 1.15 GB, 1.20 GB, 1.26 GB, 1.30 GB, 1.36 GB, 1.42 GB, 1.48 GB, 1.54 GB, 1.59 GB, 1.66 GB, 1.71 GB, 1.78 GB
- FooterProcessingTime: (Avg: 335.622ms ; Min: 2.000ms ; Max: 1s834ms ; Number of samples: 233)
- AverageHdfsReadThreadConcurrency: 0.96
- AverageScannerThreadConcurrency: 7.96
- BytesRead: 1.81 GB (1943785374)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.79 GB (1920996399)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.79 GB (1920996399)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.21K (1207)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 54.80 MB (57460818)
- PerReadThreadRawHdfsThroughput: 14.82 MB/sec
- RemoteScanRanges: 217 (217)
- RowBatchQueueGetWaitTime: 2m4s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 610.88M (610884328)
- RowsReturned: 8 (8)
- RowsReturnedRate: 0
- ScanRangesComplete: 233 (233)
- ScannerThreadsInvoluntaryContextSwitches: 48 (48)
- ScannerThreadsTotalWallClockTime: 17m1s
- MaterializeTupleTime(*): 28s089ms
- ScannerThreadsSysTime: 10s250ms
- ScannerThreadsUserTime: 20s926ms
- ScannerThreadsVoluntaryContextSwitches: 7.15K (7147)
- TotalRawHdfsReadTime(*): 2m5s
- TotalReadThroughput: 14.42 MB/sec
Instance d9451b2a703b10ea:3 (host=host.local:22000):(Total: 2m4s, non-child: 3s997ms, % non-child: 3.21%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/108.95 GB
MemoryUsage(2s000ms): 25.59 MB, 43.03 MB, 24.67 MB, 31.96 MB, 22.78 MB, 26.98 MB, 27.02 MB, 23.24 MB, 34.10 MB, 28.42 MB, 32.10 MB, 21.83 MB, 30.47 MB, 28.60 MB, 22.12 MB, 29.12 MB, 42.88 MB, 21.27 MB, 29.99 MB, 25.76 MB, 25.38 MB, 24.17 MB, 29.60 MB, 28.41 MB, 25.95 MB, 25.93 MB, 24.21 MB, 24.41 MB, 25.95 MB, 29.78 MB, 23.64 MB, 27.39 MB, 31.07 MB, 26.80 MB, 33.48 MB, 36.65 MB, 26.36 MB, 40.59 MB, 26.20 MB, 39.60 MB, 26.63 MB, 34.87 MB, 32.94 MB, 31.29 MB, 33.99 MB, 27.84 MB, 25.46 MB, 22.86 MB, 42.47 MB, 34.66 MB, 28.53 MB, 28.16 MB, 28.88 MB, 34.86 MB, 39.26 MB, 33.16 MB, 24.88 MB, 32.04 MB, 32.25 MB, 36.52 MB, 25.70 MB, 33.52 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7
- AverageThreadTokens: 8.95
- BloomFilterBytes: 0
- PeakMemoryUsage: 64.12 MB (67237897)
- PerHostPeakMemUsage: 64.13 MB (67250313)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 15m31s
- TotalThreadsInvoluntaryContextSwitches: 73 (73)
- TotalThreadsTotalWallClockTime: 18m33s
- TotalThreadsSysTime: 15s241ms
- TotalThreadsUserTime: 24s078ms
- TotalThreadsVoluntaryContextSwitches: 11.52K (11520)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m4s
- ExecTreeExecTime: 2m
- OpenTime: 29.000ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 20.000ms
- ExecTreePrepareTime: 1.000ms
DataStreamSender (dst_id=1):
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 48.000ms, non-child: 48.000ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 6.000ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 22.000ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.000ms
HDFS_SCAN_NODE (id=0):(Total: 2m, non-child: 2m, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/108.95 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 232 out of 232
Hdfs Read Thread Concurrency Bucket: 0:4.418% 1:92.77% 2:2.008% 3:0.4016% 4:0.4016%
File Formats: PARQUET/NONE:464
BytesRead(2s000ms): 11.93 MB, 46.91 MB, 75.47 MB, 104.71 MB, 132.90 MB, 159.49 MB, 195.09 MB, 223.49 MB, 260.53 MB, 293.25 MB, 331.14 MB, 356.24 MB, 380.49 MB, 407.13 MB, 429.02 MB, 450.06 MB, 489.78 MB, 532.49 MB, 562.91 MB, 590.18 MB, 612.37 MB, 642.80 MB, 678.62 MB, 708.21 MB, 734.52 MB, 764.60 MB, 795.61 MB, 833.28 MB, 863.53 MB, 895.09 MB, 928.86 MB, 956.50 MB, 981.44 MB, 1012.94 MB, 1.03 GB, 1.06 GB, 1.10 GB, 1.13 GB, 1.16 GB, 1.20 GB, 1.23 GB, 1.27 GB, 1.30 GB, 1.33 GB, 1.37 GB, 1.40 GB, 1.43 GB, 1.46 GB, 1.49 GB, 1.53 GB, 1.56 GB, 1.58 GB, 1.62 GB, 1.65 GB, 1.69 GB, 1.71 GB, 1.74 GB, 1.77 GB, 1.80 GB, 1.82 GB, 1.86 GB, 1.90 GB
- FooterProcessingTime: (Avg: 294.689ms ; Min: 2.000ms ; Max: 1s752ms ; Number of samples: 232)
- AverageHdfsReadThreadConcurrency: 1.00
- AverageScannerThreadConcurrency: 7.95
- BytesRead: 1.92 GB (2063038437)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.92 GB (2058780151)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.92 GB (2058780151)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.22K (1222)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 61.42 MB (64400766)
- PerReadThreadRawHdfsThroughput: 16.19 MB/sec
- RemoteScanRanges: 41 (41)
- RowBatchQueueGetWaitTime: 2m
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 622.54M (622540363)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 232 (232)
- ScannerThreadsInvoluntaryContextSwitches: 67 (67)
- ScannerThreadsTotalWallClockTime: 16m29s
- MaterializeTupleTime(*): 31s752ms
- ScannerThreadsSysTime: 12s047ms
- ScannerThreadsUserTime: 23s131ms
- ScannerThreadsVoluntaryContextSwitches: 7.40K (7404)
- TotalRawHdfsReadTime(*): 2m1s
- TotalReadThroughput: 15.80 MB/sec
Instance d9451b2a703b10ea:1 (host=host3.local:22000):(Total: 2m2s, non-child: 4s327ms, % non-child: 3.52%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/109.11 GB
MemoryUsage(2s000ms): 20.27 MB, 36.05 MB, 16.82 MB, 37.59 MB, 30.31 MB, 26.77 MB, 27.86 MB, 33.61 MB, 26.30 MB, 28.51 MB, 33.65 MB, 26.65 MB, 28.53 MB, 22.00 MB, 28.69 MB, 25.45 MB, 23.24 MB, 35.02 MB, 28.50 MB, 25.77 MB, 31.60 MB, 25.22 MB, 23.69 MB, 27.16 MB, 21.06 MB, 26.95 MB, 35.30 MB, 19.66 MB, 31.87 MB, 26.15 MB, 26.14 MB, 27.41 MB, 34.45 MB, 33.38 MB, 32.25 MB, 29.14 MB, 31.75 MB, 21.83 MB, 31.45 MB, 33.07 MB, 31.77 MB, 26.28 MB, 26.96 MB, 32.46 MB, 34.04 MB, 34.71 MB, 33.02 MB, 32.82 MB, 23.38 MB, 36.66 MB, 38.82 MB, 36.48 MB, 28.91 MB, 30.31 MB, 32.37 MB, 28.13 MB, 36.17 MB, 24.15 MB, 27.20 MB, 30.05 MB, 36.03 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8
- AverageThreadTokens: 8.96
- BloomFilterBytes: 0
- PeakMemoryUsage: 60.35 MB (63283432)
- PerHostPeakMemUsage: 60.35 MB (63283432)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 15m16s
- TotalThreadsInvoluntaryContextSwitches: 89 (89)
- TotalThreadsTotalWallClockTime: 18m20s
- TotalThreadsSysTime: 16s903ms
- TotalThreadsUserTime: 25s652ms
- TotalThreadsVoluntaryContextSwitches: 10.99K (10989)
Fragment Instance Lifecycle Timings:
- ExecTime: 2m2s
- ExecTreeExecTime: 1m58s
- OpenTime: 32.999ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 19.999ms
- ExecTreePrepareTime: 999.999us
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 49.999ms, non-child: 49.999ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 8.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 22.999ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 17.999ms
HDFS_SCAN_NODE (id=0):(Total: 1m58s, non-child: 1m58s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:232/109.11 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 232 out of 232
Hdfs Read Thread Concurrency Bucket: 0:7.317% 1:89.02% 2:3.252% 3:0.4065% 4:0%
File Formats: PARQUET/NONE:464
BytesRead(2s000ms): 10.81 MB, 47.09 MB, 69.94 MB, 101.26 MB, 136.16 MB, 163.45 MB, 193.85 MB, 229.89 MB, 270.82 MB, 300.56 MB, 337.38 MB, 371.32 MB, 408.27 MB, 442.79 MB, 476.09 MB, 511.15 MB, 541.57 MB, 578.83 MB, 607.60 MB, 641.31 MB, 683.05 MB, 711.09 MB, 736.45 MB, 771.75 MB, 805.29 MB, 833.93 MB, 861.38 MB, 888.27 MB, 915.50 MB, 946.55 MB, 977.73 MB, 1006.97 MB, 1.02 GB, 1.06 GB, 1.09 GB, 1.12 GB, 1.15 GB, 1.18 GB, 1.21 GB, 1.24 GB, 1.27 GB, 1.30 GB, 1.33 GB, 1.36 GB, 1.39 GB, 1.43 GB, 1.46 GB, 1.50 GB, 1.52 GB, 1.55 GB, 1.59 GB, 1.63 GB, 1.66 GB, 1.69 GB, 1.72 GB, 1.75 GB, 1.79 GB, 1.81 GB, 1.84 GB, 1.87 GB, 1.90 GB
- FooterProcessingTime: (Avg: 298.749ms ; Min: 1.999ms ; Max: 1s567ms ; Number of samples: 232)
- AverageHdfsReadThreadConcurrency: 0.97
- AverageScannerThreadConcurrency: 7.96
- BytesRead: 1.93 GB (2073824728)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.93 GB (2067526565)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.93 GB (2067526565)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.22K (1216)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 58.54 MB (61378909)
- PerReadThreadRawHdfsThroughput: 16.61 MB/sec
- RemoteScanRanges: 60 (60)
- RowBatchQueueGetWaitTime: 1m58s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 622.14M (622140244)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 232 (232)
- ScannerThreadsInvoluntaryContextSwitches: 78 (78)
- ScannerThreadsTotalWallClockTime: 16m17s
- MaterializeTupleTime(*): 34s311ms
- ScannerThreadsSysTime: 13s374ms
- ScannerThreadsUserTime: 24s648ms
- ScannerThreadsVoluntaryContextSwitches: 7.13K (7127)
- TotalRawHdfsReadTime(*): 1m59s
- TotalReadThroughput: 16.05 MB/sec
Instance d9451b2a703b10ea:5 (host=host2.local:22000):(Total: 1m58s, non-child: 3s923ms, % non-child: 3.30%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/109.09 GB
MemoryUsage(2s000ms): 26.86 MB, 38.67 MB, 20.91 MB, 23.20 MB, 30.55 MB, 33.23 MB, 32.38 MB, 27.99 MB, 29.41 MB, 27.31 MB, 28.17 MB, 25.18 MB, 26.16 MB, 25.07 MB, 22.63 MB, 32.02 MB, 24.45 MB, 32.23 MB, 26.54 MB, 22.92 MB, 27.61 MB, 29.19 MB, 30.77 MB, 29.01 MB, 31.00 MB, 28.45 MB, 27.06 MB, 25.67 MB, 28.82 MB, 29.03 MB, 32.36 MB, 27.41 MB, 25.43 MB, 37.01 MB, 28.27 MB, 31.88 MB, 29.63 MB, 30.96 MB, 29.69 MB, 37.05 MB, 29.90 MB, 25.39 MB, 31.32 MB, 29.52 MB, 31.78 MB, 31.69 MB, 32.83 MB, 30.29 MB, 28.75 MB, 27.28 MB, 31.85 MB, 34.23 MB, 28.71 MB, 27.12 MB, 30.69 MB, 38.23 MB, 28.43 MB, 30.10 MB, 28.36 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7
- AverageThreadTokens: 8.95
- BloomFilterBytes: 0
- PeakMemoryUsage: 61.43 MB (64412714)
- PerHostPeakMemUsage: 61.43 MB (64412714)
- RowsProduced: 0 (0)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 14m45s
- TotalThreadsInvoluntaryContextSwitches: 64 (64)
- TotalThreadsTotalWallClockTime: 17m41s
- TotalThreadsSysTime: 15s453ms
- TotalThreadsUserTime: 23s275ms
- TotalThreadsVoluntaryContextSwitches: 10.98K (10979)
Fragment Instance Lifecycle Timings:
- ExecTime: 1m58s
- ExecTreeExecTime: 1m54s
- OpenTime: 25.999ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 19.999ms
- ExecTreePrepareTime: 999.998us
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 0
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 0 (0)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 0
CodeGen:(Total: 44.999ms, non-child: 44.999ms, % non-child: 100.00%)
- CodegenTime: 1.000ms
- CompileTime: 5.999ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 18.999ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.999ms
HDFS_SCAN_NODE (id=0):(Total: 1m54s, non-child: 1m54s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:233/109.09 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 233 out of 233
Hdfs Read Thread Concurrency Bucket: 0:8.017% 1:88.19% 2:3.376% 3:0% 4:0.4219%
File Formats: PARQUET/NONE:466
BytesRead(2s000ms): 15.45 MB, 65.80 MB, 102.98 MB, 133.78 MB, 165.21 MB, 203.47 MB, 240.87 MB, 283.02 MB, 316.20 MB, 349.66 MB, 382.79 MB, 416.64 MB, 448.05 MB, 476.41 MB, 498.51 MB, 533.48 MB, 563.70 MB, 593.07 MB, 615.87 MB, 643.39 MB, 672.48 MB, 700.83 MB, 731.61 MB, 766.05 MB, 802.69 MB, 837.45 MB, 867.75 MB, 894.95 MB, 924.89 MB, 959.66 MB, 993.46 MB, 1.00 GB, 1.04 GB, 1.07 GB, 1.11 GB, 1.14 GB, 1.18 GB, 1.20 GB, 1.23 GB, 1.27 GB, 1.31 GB, 1.33 GB, 1.36 GB, 1.39 GB, 1.43 GB, 1.46 GB, 1.49 GB, 1.52 GB, 1.56 GB, 1.59 GB, 1.62 GB, 1.66 GB, 1.69 GB, 1.72 GB, 1.75 GB, 1.79 GB, 1.83 GB, 1.85 GB, 1.88 GB
- FooterProcessingTime: (Avg: 311.635ms ; Min: 1.999ms ; Max: 1s963ms ; Number of samples: 233)
- AverageHdfsReadThreadConcurrency: 0.97
- AverageScannerThreadConcurrency: 7.95
- BytesRead: 1.91 GB (2050873759)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.90 GB (2043531704)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.90 GB (2043531704)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 1.20K (1202)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 57.17 MB (59946439)
- PerReadThreadRawHdfsThroughput: 16.96 MB/sec
- RemoteScanRanges: 70 (70)
- RowBatchQueueGetWaitTime: 1m54s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 621.15M (621148641)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- ScanRangesComplete: 233 (233)
- ScannerThreadsInvoluntaryContextSwitches: 62 (62)
- ScannerThreadsTotalWallClockTime: 15m42s
- MaterializeTupleTime(*): 31s204ms
- ScannerThreadsSysTime: 12s266ms
- ScannerThreadsUserTime: 22s397ms
- ScannerThreadsVoluntaryContextSwitches: 7.12K (7115)
- TotalRawHdfsReadTime(*): 1m55s
- TotalReadThroughput: 16.46 MB/sec
Instance d9451b2a703b10ea:4 (host=host5.local.local:22000):(Total: 1m49s, non-child: 2s994ms, % non-child: 2.74%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:181/85.20 GB
MemoryUsage(2s000ms): 18.21 MB, 38.53 MB, 27.08 MB, 26.53 MB, 32.62 MB, 30.29 MB, 34.66 MB, 29.49 MB, 26.05 MB, 31.58 MB, 25.94 MB, 25.67 MB, 27.96 MB, 24.87 MB, 23.64 MB, 25.64 MB, 32.55 MB, 24.18 MB, 23.78 MB, 24.56 MB, 17.69 MB, 31.04 MB, 36.53 MB, 27.76 MB, 30.91 MB, 29.22 MB, 27.76 MB, 33.72 MB, 31.42 MB, 31.44 MB, 43.47 MB, 25.10 MB, 33.64 MB, 33.70 MB, 31.30 MB, 33.17 MB, 27.18 MB, 28.23 MB, 31.66 MB, 37.66 MB, 33.80 MB, 25.78 MB, 34.78 MB, 31.03 MB, 30.89 MB, 35.60 MB, 25.06 MB, 32.50 MB, 32.21 MB, 29.77 MB, 33.05 MB, 37.72 MB, 26.45 MB, 31.92 MB
ThreadUsage(2s000ms): 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8
- AverageThreadTokens: 8.93
- BloomFilterBytes: 0
- PeakMemoryUsage: 62.37 MB (65395653)
- PerHostPeakMemUsage: 62.37 MB (65395653)
- RowsProduced: 16 (16)
- TotalNetworkReceiveTime: 0.000ns
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 13m38s
- TotalThreadsInvoluntaryContextSwitches: 59 (59)
- TotalThreadsTotalWallClockTime: 16m15s
- TotalThreadsSysTime: 11s556ms
- TotalThreadsUserTime: 18s833ms
- TotalThreadsVoluntaryContextSwitches: 8.79K (8792)
Fragment Instance Lifecycle Timings:
- ExecTime: 1m49s
- ExecTreeExecTime: 1m46s
- OpenTime: 27.000ms
- ExecTreeOpenTime: 0.000ns
- PrepareTime: 20.000ms
- ExecTreePrepareTime: 1.000ms
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 17.60 GB (18897856512)
- PeakMemoryUsage: 0
- ScratchFileUsedBytes: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalReadBlockTime: 0.000ns
DataStreamSender (dst_id=1):
- BytesSent: 304.00 B (304)
- NetworkThroughput(*): 0.00 /sec
- OverallThroughput: 0.00 /sec
- PeakMemoryUsage: 3.88 KB (3968)
- RowsReturned: 16 (16)
- SerializeBatchTime: 0.000ns
- TransmitDataRPCTime: 0.000ns
- UncompressedRowBatchSize: 884.00 B (884)
CodeGen:(Total: 44.000ms, non-child: 44.000ms, % non-child: 100.00%)
- CodegenTime: 0.000ns
- CompileTime: 6.000ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.97 MB (2070236)
- NumFunctions: 18 (18)
- NumInstructions: 289 (289)
- OptimizationTime: 19.000ms
- PeakMemoryUsage: 144.50 KB (147968)
- PrepareTime: 19.000ms
HDFS_SCAN_NODE (id=0):(Total: 1m46s, non-child: 1m46s, % non-child: 100.00%)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:181/85.20 GB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 181 out of 181
Hdfs Read Thread Concurrency Bucket: 0:5.936% 1:90.87% 2:2.74% 3:0.4566% 4:0%
File Formats: PARQUET/NONE:362
BytesRead(2s000ms): 8.09 MB, 40.45 MB, 69.02 MB, 90.99 MB, 123.35 MB, 149.26 MB, 175.30 MB, 204.01 MB, 226.35 MB, 252.84 MB, 289.11 MB, 315.11 MB, 341.50 MB, 366.53 MB, 388.34 MB, 408.49 MB, 431.86 MB, 458.16 MB, 480.48 MB, 501.06 MB, 521.84 MB, 547.29 MB, 580.90 MB, 605.54 MB, 635.80 MB, 667.92 MB, 703.79 MB, 736.70 MB, 764.60 MB, 798.15 MB, 841.56 MB, 867.56 MB, 889.01 MB, 917.82 MB, 948.27 MB, 982.54 MB, 1009.91 MB, 1.01 GB, 1.04 GB, 1.08 GB, 1.11 GB, 1.14 GB, 1.17 GB, 1.20 GB, 1.22 GB, 1.25 GB, 1.27 GB, 1.29 GB, 1.32 GB, 1.35 GB, 1.37 GB, 1.41 GB, 1.44 GB, 1.47 GB
- FooterProcessingTime: (Avg: 352.397ms ; Min: 5.000ms ; Max: 1s916ms ; Number of samples: 181)
- AverageHdfsReadThreadConcurrency: 0.98
- AverageScannerThreadConcurrency: 7.93
- BytesRead: 1.50 GB (1606182161)
- BytesReadDataNodeCache: 0
- BytesReadLocal: 1.48 GB (1584301061)
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 1.48 GB (1584301061)
- DecompressionTime: 0.000ns
- MaxCompressedTextFileLength: 0
- NumColumns: 2 (2)
- NumDisksAccessed: 2 (2)
- NumRowGroups: 950 (950)
- NumScannerThreadsStarted: 8 (8)
- PeakMemoryUsage: 58.71 MB (61563937)
- PerReadThreadRawHdfsThroughput: 14.50 MB/sec
- RemoteScanRanges: 208 (208)
- RowBatchQueueGetWaitTime: 1m46s
- RowBatchQueuePutWaitTime: 0.000ns
- RowsRead: 484.59M (484593665)
- RowsReturned: 16 (16)
- RowsReturnedRate: 0
- ScanRangesComplete: 181 (181)
- ScannerThreadsInvoluntaryContextSwitches: 57 (57)
- ScannerThreadsTotalWallClockTime: 14m26s
- MaterializeTupleTime(*): 24s629ms
- ScannerThreadsSysTime: 9s060ms
- ScannerThreadsUserTime: 18s227ms
- ScannerThreadsVoluntaryContextSwitches: 5.73K (5731)
- TotalRawHdfsReadTime(*): 1m45s
- TotalReadThroughput: 13.98 MB/sec

Explorer
Posts: 104
Registered: ‎09-14-2016

Re: Partition, file size and number of files question

Hi Lars,

 

Have you had time to look at the profile?

 

Thanks

Shannon

 

 

Cloudera Employee
Posts: 69
Registered: ‎12-07-2015

Re: Partition, file size and number of files question

I just had a look, but I couldn't spot an obvious problem. The HDFS scanner fragments read around 15 MB/s, which seems reasonable to me, given how computationally intensive Parquet decoding is. There also doesn't seem to be any considerable skew. Each of your 5 nodes reads ~ 100GB of data in 134s, so the overall throughput is around 764 MB/s.

 

I suggest to have a look at the perf improvements around Parquet files in CDH 5.12 that I mentioned in an earlier reply.

Explorer
Posts: 104
Registered: ‎09-14-2016

Re: Partition, file size and number of files question

Thanks Lars.

 

Shannon

Announcements