Created on 03-03-2017 08:58 AM - edited 09-16-2022 04:11 AM
I ran the same query from impala-shell and DBvisualzer using the latest JDBC 4.1 (2.5.36) and it was 10x slower than the shell.
Is there anything I can do to improve the performance of the JDBC connector?
Is it anything to do with impala-shell using a Beeswax session type and JDBC using a HiveServer2 session type?
Created 03-07-2017 02:25 AM
I can see that from JDBC:
Start Time: 2017-03-03 11:38:07.269796000
End Time: 2017-03-03 11:38:08.508511000
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 17.999ms 17.999ms 1 1 1.56 MB 768.00 MB default.impala_table
and from impala-shell
Start Time: 2017-03-03 11:40:15.660573000
End Time: 2017-03-03 11:40:15.806061000Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 23.999ms 23.999ms 1 1 1.56 MB 768.00 MB default.impala_table
I don't see major differences here, though from JDBC in total it took just above 1 second and through impala-shell in total well below 1 second.
What was the main difference in time from JDBC connection start to finish? Queries will definitely have some overhead when running through JDBC since it is from outside of cluster. You can find more details from the TRACE log you collected.
Created 03-03-2017 12:15 PM
Created on 03-06-2017 02:05 AM - edited 03-06-2017 02:08 AM
I got these from Cloudera Manager.
JDBC:
Query (id=3e40a60f2dfc2554:bac26d70201f8d9c)
Summary
Session ID: f042aa249717205e:ae98b27fcfcd0783
Session Type: HIVESERVER2
HiveServer2 Protocol Version: V6
Start Time: 2017-03-03 11:38:07.269796000
End Time: 2017-03-03 11:38:08.508511000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.6.0-cdh5.8.2 RELEASE (build f25aa5b2bcdabf1eb4233747a7b04a067059ee3b)
User:
Connected User:
Delegated User:
Network Address: ::ffff:1.2.3.4:50034
Default Db: default
Sql Statement: <QUERY>
Coordinator: server1.example.com:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=768.00MB VCores=1
01:EXCHANGE [UNPARTITIONED]
| hosts=1 per-host-mem=unavailable
| tuple-ids=0 row-size=614B cardinality=1
|
00:SCAN HDFS [default.impala_table, RANDOM]
partitions=1/13999 files=1 size=132.88KB
predicates: (default.impala_table.acct_id = 660)
table stats: 760514828 rows total
column stats: all
hosts=1 per-host-mem=768.00MB
tuple-ids=0 row-size=614B cardinality=1
----------------
Estimated Per-Host Mem: 805306368
Estimated Per-Host VCores: 1
Request Pool: root.default
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED
00:SCAN HDFS 1 17.999ms 17.999ms 1 1 1.56 MB 768.00 MB default.impala_table
Planner Timeline
Analysis finished: 9883819
Equivalence classes computed: 13302816
Single node plan created: 16281907
Runtime filters computed: 16321655
Distributed plan created: 18076216
Lineage info computed: 19909717
Planning finished: 26827613
Query Timeline
Start execution: 0
Planning finished: 30000364
Submit for admission: 31000376
Completed admission: 32000388
Ready to start 1 remote fragments: 33000400
All 1 remote fragments started: 39000472
Rows available: 54000656
First row fetched: 759009252
Unregister query: 1239015104
ImpalaServer
- ClientFetchWaitTimer: 1185014448
- InactiveTotalTime: 0
- RowMaterializationTimer: 0
- TotalTime: 0
Execution Profile 3e40a60f2dfc2554:bac26d70201f8d9c
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed
-------------------------------------------------------------------------------------------------------------------
Fragment start latencies: Count: 1, 25th %-ile: 5ms, 50th %-ile: 5ms, 75th %-ile: 5ms, 90th %-ile: 5ms, 95th %-ile: 5ms, 99.9th %-ile: 5ms
Per Node Peak Memory Usage: server8.example.com:22000(1.57 MB) server1.example.com:22000(0)
- FiltersReceived: 0
- FinalizationTimer: 0
- InactiveTotalTime: 0
- TotalTime: 22000268
Coordinator Fragment F01
- AverageThreadTokens: 0.0
- BloomFilterBytes: 0
- InactiveTotalTime: 0
- PeakMemoryUsage: 12296
- PerHostPeakMemUsage: 0
- PrepareTime: 0
- RowsProduced: 1
- TotalCpuTime: 712008680
- TotalNetworkReceiveTime: 15000184
- TotalNetworkSendTime: 0
- TotalStorageWaitTime: 0
- TotalTime: 16000196
BlockMgr
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- InactiveTotalTime: 0
- MaxBlockSize: 8388608
- MemoryLimit: 4294967296
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0
- TotalEncryptionTime: 0
- TotalIntegrityCheckTime: 0
- TotalReadBlockTime: 0
- TotalTime: 0
EXCHANGE_NODE (id=1)
- BytesReceived: 441
- ConvertRowBatchTime: 0
- DeserializeRowBatchTimer: 0
- FirstBatchArrivalWaitTime: 15000184
- InactiveTotalTime: 15000184
- PeakMemoryUsage: 0
- RowsReturned: 1
- RowsReturnedRate: 66
- SendersBlockedTimer: 0
- SendersBlockedTotalTimer(*): 0
- TotalTime: 15000184
Averaged Fragment F00
split sizes: min: 132.88 KB, max: 132.88 KB, avg: 132.88 KB, stddev: 0
completion times: min:17.000ms max:17.000ms mean: 17.000ms stddev:0.000ns
execution rates: min:7.63 MB/sec max:7.63 MB/sec mean:7.63 MB/sec stddev:0.00 /sec
num instances: 1
- AverageThreadTokens: 0.0
- BloomFilterBytes: 0
- InactiveTotalTime: 0
- PeakMemoryUsage: 1644840
- PerHostPeakMemUsage: 1644840
- PrepareTime: 0
- RowsProduced: 1
- TotalCpuTime: 22999884
- TotalNetworkReceiveTime: 0
- TotalNetworkSendTime: 0
- TotalStorageWaitTime: 16999908
- TotalTime: 20999892
BlockMgr
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- InactiveTotalTime: 0
- MaxBlockSize: 8388608
- MemoryLimit: 4294967296
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0
- TotalEncryptionTime: 0
- TotalIntegrityCheckTime: 0
- TotalReadBlockTime: 0
- TotalTime: 0
DataStreamSender (dst_id=1)
- BytesSent: 441
- InactiveTotalTime: 0
- NetworkThroughput(*): 0
- OverallThroughput: 441001
- PeakMemoryUsage: 296
- RowsReturned: 1
- SerializeBatchTime: 999996
- TotalTime: 999996
- TransmitDataRPCTime: 0
- UncompressedRowBatchSize: 612
HDFS_SCAN_NODE (id=0)
- AverageHdfsReadThreadConcurrency: 0.0
- AverageScannerThreadConcurrency: 0.0
- BytesRead: 199843
- BytesReadDataNodeCache: 0
- BytesReadLocal: 199843
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 199843
- DecompressionTime: 0
- InactiveTotalTime: 0
- MaxCompressedTextFileLength: 0
- NumColumns: 48
- NumDisksAccessed: 1
- NumRowGroups: 1
- NumScannerThreadsStarted: 1
- PeakMemoryUsage: 1636352
- PerReadThreadRawHdfsThroughput: 0
- RemoteScanRanges: 0
- RowsRead: 760
- RowsReturned: 1
- RowsReturnedRate: 55
- ScanRangesComplete: 1
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsTotalWallClockTime: 18999900
- MaterializeTupleTime(*): 0
- ScannerThreadsSysTime: 0
- ScannerThreadsUserTime: 2624000
- ScannerThreadsVoluntaryContextSwitches: 30
- TotalRawHdfsReadTime(*): 0
- TotalReadThroughput: 0
- TotalTime: 17999904
Fragment F00
Instance 3e40a60f2dfc2554:bac26d70201f8d9e (host=server8.example.com:22000)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB
- AverageThreadTokens: 0.0
- BloomFilterBytes: 0
- InactiveTotalTime: 0
- PeakMemoryUsage: 1644840
- PerHostPeakMemUsage: 1644840
- PrepareTime: 0
- RowsProduced: 1
- TotalCpuTime: 22999884
- TotalNetworkReceiveTime: 0
- TotalNetworkSendTime: 0
- TotalStorageWaitTime: 16999908
- TotalTime: 20999892
BlockMgr
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- InactiveTotalTime: 0
- MaxBlockSize: 8388608
- MemoryLimit: 4294967296
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0
- TotalEncryptionTime: 0
- TotalIntegrityCheckTime: 0
- TotalReadBlockTime: 0
- TotalTime: 0
DataStreamSender (dst_id=1)
- BytesSent: 441
- InactiveTotalTime: 0
- NetworkThroughput(*): 0
- OverallThroughput: 441001
- PeakMemoryUsage: 296
- RowsReturned: 1
- SerializeBatchTime: 999996
- TotalTime: 999996
- TransmitDataRPCTime: 0
- UncompressedRowBatchSize: 612
HDFS_SCAN_NODE (id=0)
ExecOption: Expr Evaluation Codegen Disabled, Codegen enabled: 0 out of 1
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB
Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0%
File Formats: PARQUET/SNAPPY:48
- AverageHdfsReadThreadConcurrency: 0.0
- AverageScannerThreadConcurrency: 0.0
- BytesRead: 199843
- BytesReadDataNodeCache: 0
- BytesReadLocal: 199843
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 199843
- DecompressionTime: 0
- InactiveTotalTime: 0
- MaxCompressedTextFileLength: 0
- NumColumns: 48
- NumDisksAccessed: 1
- NumRowGroups: 1
- NumScannerThreadsStarted: 1
- PeakMemoryUsage: 1636352
- PerReadThreadRawHdfsThroughput: 0
- RemoteScanRanges: 0
- RowsRead: 760
- RowsReturned: 1
- RowsReturnedRate: 55
- ScanRangesComplete: 1
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsTotalWallClockTime: 18999900
- MaterializeTupleTime(*): 0
- ScannerThreadsSysTime: 0
- ScannerThreadsUserTime: 2624000
- ScannerThreadsVoluntaryContextSwitches: 30
- TotalRawHdfsReadTime(*): 0
- TotalReadThroughput: 0
- TotalTime: 17999904Impala-shell:
Query (id=494fccf3229a725c:40217c9c3c50929a)
Summary
Session ID: 454fbcd23796af47:82ab459a4077e5ab
Session Type: BEESWAX
Start Time: 2017-03-03 11:40:15.660573000
End Time: 2017-03-03 11:40:15.806061000
Query Type: QUERY
Query State: FINISHED
Query Status: OK
Impala Version: impalad version 2.6.0-cdh5.8.2 RELEASE (build f25aa5b2bcdabf1eb4233747a7b04a067059ee3b)
User: impala_user
Connected User: impala_user
Delegated User:
Network Address: ::ffff:4.3.2.1:55960
Default Db: default
Sql Statement: <QUERY>
Coordinator: server1.example.com:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=768.00MB VCores=1
01:EXCHANGE [UNPARTITIONED]
| hosts=1 per-host-mem=unavailable
| tuple-ids=0 row-size=614B cardinality=1
|
00:SCAN HDFS [default.impala_table, RANDOM]
partitions=1/13999 files=1 size=132.88KB
predicates: (default.impala_table.acct_id = 660)
table stats: 760514828 rows total
column stats: all
hosts=1 per-host-mem=768.00MB
tuple-ids=0 row-size=614B cardinality=1
----------------
Estimated Per-Host Mem: 805306368
Estimated Per-Host VCores: 1
Request Pool: root.impala_user
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED
00:SCAN HDFS 1 23.999ms 23.999ms 1 1 1.56 MB 768.00 MB default.impala_table
Planner Timeline
Analysis finished: 9689649
Equivalence classes computed: 15036979
Single node plan created: 19349122
Runtime filters computed: 19426476
Distributed plan created: 21875782
Lineage info computed: 24636383
Planning finished: 33188255
Query Timeline
Start execution: 0
Planning finished: 36000448
Submit for admission: 37000464
Completed admission: 37000464
Ready to start 1 remote fragments: 38000476
All 1 remote fragments started: 40000500
Rows available: 64000800
First row fetched: 142001772
Unregister query: 145001808
ImpalaServer
- ClientFetchWaitTimer: 81001008
- InactiveTotalTime: 0
- RowMaterializationTimer: 0
- TotalTime: 0
Execution Profile 494fccf3229a725c:40217c9c3c50929a
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed
-------------------------------------------------------------------------------------------------------------------
Fragment start latencies: Count: 1, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms
Per Node Peak Memory Usage: server8.example.com:22000(1.57 MB) server1.example.com:22000(0)
- FiltersReceived: 0
- FinalizationTimer: 0
- InactiveTotalTime: 0
- TotalTime: 27000336
Coordinator Fragment F01
- AverageThreadTokens: 0.0
- BloomFilterBytes: 0
- InactiveTotalTime: 0
- PeakMemoryUsage: 12296
- PerHostPeakMemUsage: 0
- PrepareTime: 0
- RowsProduced: 1
- TotalCpuTime: 80000996
- TotalNetworkReceiveTime: 24000300
- TotalNetworkSendTime: 0
- TotalStorageWaitTime: 0
- TotalTime: 24000300
BlockMgr
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- InactiveTotalTime: 0
- MaxBlockSize: 8388608
- MemoryLimit: 4294967296
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0
- TotalEncryptionTime: 0
- TotalIntegrityCheckTime: 0
- TotalReadBlockTime: 0
- TotalTime: 0
EXCHANGE_NODE (id=1)
- BytesReceived: 441
- ConvertRowBatchTime: 0
- DeserializeRowBatchTimer: 0
- FirstBatchArrivalWaitTime: 24000300
- InactiveTotalTime: 24000300
- PeakMemoryUsage: 0
- RowsReturned: 1
- RowsReturnedRate: 41
- SendersBlockedTimer: 0
- SendersBlockedTotalTimer(*): 0
- TotalTime: 24000300
Averaged Fragment F00
split sizes: min: 132.88 KB, max: 132.88 KB, avg: 132.88 KB, stddev: 0
completion times: min:26.000ms max:26.000ms mean: 26.000ms stddev:0.000ns
execution rates: min:4.99 MB/sec max:4.99 MB/sec mean:4.99 MB/sec stddev:0.00 /sec
num instances: 1
- AverageThreadTokens: 0.0
- BloomFilterBytes: 0
- InactiveTotalTime: 0
- PeakMemoryUsage: 1644840
- PerHostPeakMemUsage: 1644840
- PrepareTime: 0
- RowsProduced: 1
- TotalCpuTime: 25999864
- TotalNetworkReceiveTime: 0
- TotalNetworkSendTime: 0
- TotalStorageWaitTime: 22999880
- TotalTime: 25999864
BlockMgr
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- InactiveTotalTime: 0
- MaxBlockSize: 8388608
- MemoryLimit: 4294967296
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0
- TotalEncryptionTime: 0
- TotalIntegrityCheckTime: 0
- TotalReadBlockTime: 0
- TotalTime: 0
DataStreamSender (dst_id=1)
- BytesSent: 441
- InactiveTotalTime: 0
- NetworkThroughput(*): 0
- OverallThroughput: 0
- PeakMemoryUsage: 296
- RowsReturned: 1
- SerializeBatchTime: 0
- TotalTime: 0
- TransmitDataRPCTime: 0
- UncompressedRowBatchSize: 612
HDFS_SCAN_NODE (id=0)
- AverageHdfsReadThreadConcurrency: 0.0
- AverageScannerThreadConcurrency: 0.0
- BytesRead: 199843
- BytesReadDataNodeCache: 0
- BytesReadLocal: 199843
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 199843
- DecompressionTime: 0
- InactiveTotalTime: 0
- MaxCompressedTextFileLength: 0
- NumColumns: 48
- NumDisksAccessed: 1
- NumRowGroups: 1
- NumScannerThreadsStarted: 1
- PeakMemoryUsage: 1636352
- PerReadThreadRawHdfsThroughput: 0
- RemoteScanRanges: 0
- RowsRead: 760
- RowsReturned: 1
- RowsReturnedRate: 41
- ScanRangesComplete: 1
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsTotalWallClockTime: 22999880
- MaterializeTupleTime(*): 0
- ScannerThreadsSysTime: 2775000
- ScannerThreadsUserTime: 0
- ScannerThreadsVoluntaryContextSwitches: 35
- TotalRawHdfsReadTime(*): 0
- TotalReadThroughput: 0
- TotalTime: 23999876
Fragment F00
Instance 494fccf3229a725c:40217c9c3c50929c (host=server8.example.com:22000)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB
- AverageThreadTokens: 0.0
- BloomFilterBytes: 0
- InactiveTotalTime: 0
- PeakMemoryUsage: 1644840
- PerHostPeakMemUsage: 1644840
- PrepareTime: 0
- RowsProduced: 1
- TotalCpuTime: 25999864
- TotalNetworkReceiveTime: 0
- TotalNetworkSendTime: 0
- TotalStorageWaitTime: 22999880
- TotalTime: 25999864
BlockMgr
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- InactiveTotalTime: 0
- MaxBlockSize: 8388608
- MemoryLimit: 4294967296
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0
- TotalEncryptionTime: 0
- TotalIntegrityCheckTime: 0
- TotalReadBlockTime: 0
- TotalTime: 0
DataStreamSender (dst_id=1)
- BytesSent: 441
- InactiveTotalTime: 0
- NetworkThroughput(*): 0
- OverallThroughput: 0
- PeakMemoryUsage: 296
- RowsReturned: 1
- SerializeBatchTime: 0
- TotalTime: 0
- TransmitDataRPCTime: 0
- UncompressedRowBatchSize: 612
HDFS_SCAN_NODE (id=0)
ExecOption: Expr Evaluation Codegen Disabled, Codegen enabled: 0 out of 1
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB
Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0%
File Formats: PARQUET/SNAPPY:48
- AverageHdfsReadThreadConcurrency: 0.0
- AverageScannerThreadConcurrency: 0.0
- BytesRead: 199843
- BytesReadDataNodeCache: 0
- BytesReadLocal: 199843
- BytesReadRemoteUnexpected: 0
- BytesReadShortCircuit: 199843
- DecompressionTime: 0
- InactiveTotalTime: 0
- MaxCompressedTextFileLength: 0
- NumColumns: 48
- NumDisksAccessed: 1
- NumRowGroups: 1
- NumScannerThreadsStarted: 1
- PeakMemoryUsage: 1636352
- PerReadThreadRawHdfsThroughput: 0
- RemoteScanRanges: 0
- RowsRead: 760
- RowsReturned: 1
- RowsReturnedRate: 41
- ScanRangesComplete: 1
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsTotalWallClockTime: 22999880
- MaterializeTupleTime(*): 0
- ScannerThreadsSysTime: 2775000
- ScannerThreadsUserTime: 0
- ScannerThreadsVoluntaryContextSwitches: 35
- TotalRawHdfsReadTime(*): 0
- TotalReadThroughput: 0
- TotalTime: 23999876
Created 03-07-2017 02:25 AM
I can see that from JDBC:
Start Time: 2017-03-03 11:38:07.269796000
End Time: 2017-03-03 11:38:08.508511000
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 17.999ms 17.999ms 1 1 1.56 MB 768.00 MB default.impala_table
and from impala-shell
Start Time: 2017-03-03 11:40:15.660573000
End Time: 2017-03-03 11:40:15.806061000Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 23.999ms 23.999ms 1 1 1.56 MB 768.00 MB default.impala_table
I don't see major differences here, though from JDBC in total it took just above 1 second and through impala-shell in total well below 1 second.
What was the main difference in time from JDBC connection start to finish? Queries will definitely have some overhead when running through JDBC since it is from outside of cluster. You can find more details from the TRACE log you collected.
Created on 04-19-2018 03:42 AM - edited 04-19-2018 04:26 AM
Hello,
I am having a similar problem.
We have some processing tasks to run on Impala, which require some custom functionalities. The processing consists of selecting some columns, process the value and insert the result in a new table (or column).
There are 2 options that we are looking at: UDF (Java) to be used natively in Impala, and JDBC connection from Java to Impala to select, process and insert record sets from a Java application. The latter option was introduced to reduce the complexity of the Impala queries but also to evaluate its performance.
Some initial experimentation on 1M rows showed that UDFs are dramatically faster (~10x) than the corresponding JDBC version (which however is still single-threaded).
The test environment features 2 worker nodes (2 Impala daemons), but the JDBC version uses only one coordinator.
Is there any value to spend time to investigate the performance of the JDBC version? Will it ever match the UDF one?
Thank you,
Gerasimos
Created 04-19-2018 04:27 AM
Take a look to this:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-ODBC-JDBC-bad-performance-rows-f...
Good luck.
Created 03-06-2017 02:25 AM
If the PROFILES showed the same performance, then you can turn on the debug logs for JDBC to see which part it was slow, add the following to the JDBC connection string:
;LogLevel=6;LogPath=/path/to/directory
Then examine the debug log create under the directory after you run the query.
Created on 03-06-2017 03:04 AM - edited 03-06-2017 03:17 AM
Thanks EricL.
Plenty to dig into with these logs, quick wins so far:
I'm going to dig into the additional JDBC overhead.
EDIT:
A show tables statement is executed everytime prior to the SQL execution, adding a 1 sec delay.