Created on 03-03-2017 08:58 AM - edited 09-16-2022 04:11 AM
I ran the same query from impala-shell and DBvisualzer using the latest JDBC 4.1 (2.5.36) and it was 10x slower than the shell.
Is there anything I can do to improve the performance of the JDBC connector?
Is it anything to do with impala-shell using a Beeswax session type and JDBC using a HiveServer2 session type?
Created 03-07-2017 02:25 AM
I can see that from JDBC:
Start Time: 2017-03-03 11:38:07.269796000 End Time: 2017-03-03 11:38:08.508511000
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 17.999ms 17.999ms 1 1 1.56 MB 768.00 MB default.impala_table
and from impala-shell
Start Time: 2017-03-03 11:40:15.660573000 End Time: 2017-03-03 11:40:15.806061000
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 23.999ms 23.999ms 1 1 1.56 MB 768.00 MB default.impala_table
I don't see major differences here, though from JDBC in total it took just above 1 second and through impala-shell in total well below 1 second.
What was the main difference in time from JDBC connection start to finish? Queries will definitely have some overhead when running through JDBC since it is from outside of cluster. You can find more details from the TRACE log you collected.
Created 03-03-2017 12:15 PM
Created on 03-06-2017 02:05 AM - edited 03-06-2017 02:08 AM
I got these from Cloudera Manager.
JDBC:
Query (id=3e40a60f2dfc2554:bac26d70201f8d9c) Summary Session ID: f042aa249717205e:ae98b27fcfcd0783 Session Type: HIVESERVER2 HiveServer2 Protocol Version: V6 Start Time: 2017-03-03 11:38:07.269796000 End Time: 2017-03-03 11:38:08.508511000 Query Type: QUERY Query State: FINISHED Query Status: OK Impala Version: impalad version 2.6.0-cdh5.8.2 RELEASE (build f25aa5b2bcdabf1eb4233747a7b04a067059ee3b) User: Connected User: Delegated User: Network Address: ::ffff:1.2.3.4:50034 Default Db: default Sql Statement: <QUERY> Coordinator: server1.example.com:22000 Query Options (non default): Plan: ---------------- Estimated Per-Host Requirements: Memory=768.00MB VCores=1 01:EXCHANGE [UNPARTITIONED] | hosts=1 per-host-mem=unavailable | tuple-ids=0 row-size=614B cardinality=1 | 00:SCAN HDFS [default.impala_table, RANDOM] partitions=1/13999 files=1 size=132.88KB predicates: (default.impala_table.acct_id = 660) table stats: 760514828 rows total column stats: all hosts=1 per-host-mem=768.00MB tuple-ids=0 row-size=614B cardinality=1 ---------------- Estimated Per-Host Mem: 805306368 Estimated Per-Host VCores: 1 Request Pool: root.default Admission result: Admitted immediately ExecSummary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 17.999ms 17.999ms 1 1 1.56 MB 768.00 MB default.impala_table Planner Timeline Analysis finished: 9883819 Equivalence classes computed: 13302816 Single node plan created: 16281907 Runtime filters computed: 16321655 Distributed plan created: 18076216 Lineage info computed: 19909717 Planning finished: 26827613 Query Timeline Start execution: 0 Planning finished: 30000364 Submit for admission: 31000376 Completed admission: 32000388 Ready to start 1 remote fragments: 33000400 All 1 remote fragments started: 39000472 Rows available: 54000656 First row fetched: 759009252 Unregister query: 1239015104 ImpalaServer - ClientFetchWaitTimer: 1185014448 - InactiveTotalTime: 0 - RowMaterializationTimer: 0 - TotalTime: 0 Execution Profile 3e40a60f2dfc2554:bac26d70201f8d9c Number of filters: 0 Filter routing table: ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed ------------------------------------------------------------------------------------------------------------------- Fragment start latencies: Count: 1, 25th %-ile: 5ms, 50th %-ile: 5ms, 75th %-ile: 5ms, 90th %-ile: 5ms, 95th %-ile: 5ms, 99.9th %-ile: 5ms Per Node Peak Memory Usage: server8.example.com:22000(1.57 MB) server1.example.com:22000(0) - FiltersReceived: 0 - FinalizationTimer: 0 - InactiveTotalTime: 0 - TotalTime: 22000268 Coordinator Fragment F01 - AverageThreadTokens: 0.0 - BloomFilterBytes: 0 - InactiveTotalTime: 0 - PeakMemoryUsage: 12296 - PerHostPeakMemUsage: 0 - PrepareTime: 0 - RowsProduced: 1 - TotalCpuTime: 712008680 - TotalNetworkReceiveTime: 15000184 - TotalNetworkSendTime: 0 - TotalStorageWaitTime: 0 - TotalTime: 16000196 BlockMgr - BlockWritesOutstanding: 0 - BlocksCreated: 0 - BlocksRecycled: 0 - BufferedPins: 0 - BytesWritten: 0 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 4294967296 - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 0 - TotalTime: 0 EXCHANGE_NODE (id=1) - BytesReceived: 441 - ConvertRowBatchTime: 0 - DeserializeRowBatchTimer: 0 - FirstBatchArrivalWaitTime: 15000184 - InactiveTotalTime: 15000184 - PeakMemoryUsage: 0 - RowsReturned: 1 - RowsReturnedRate: 66 - SendersBlockedTimer: 0 - SendersBlockedTotalTimer(*): 0 - TotalTime: 15000184 Averaged Fragment F00 split sizes: min: 132.88 KB, max: 132.88 KB, avg: 132.88 KB, stddev: 0 completion times: min:17.000ms max:17.000ms mean: 17.000ms stddev:0.000ns execution rates: min:7.63 MB/sec max:7.63 MB/sec mean:7.63 MB/sec stddev:0.00 /sec num instances: 1 - AverageThreadTokens: 0.0 - BloomFilterBytes: 0 - InactiveTotalTime: 0 - PeakMemoryUsage: 1644840 - PerHostPeakMemUsage: 1644840 - PrepareTime: 0 - RowsProduced: 1 - TotalCpuTime: 22999884 - TotalNetworkReceiveTime: 0 - TotalNetworkSendTime: 0 - TotalStorageWaitTime: 16999908 - TotalTime: 20999892 BlockMgr - BlockWritesOutstanding: 0 - BlocksCreated: 0 - BlocksRecycled: 0 - BufferedPins: 0 - BytesWritten: 0 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 4294967296 - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 0 - TotalTime: 0 DataStreamSender (dst_id=1) - BytesSent: 441 - InactiveTotalTime: 0 - NetworkThroughput(*): 0 - OverallThroughput: 441001 - PeakMemoryUsage: 296 - RowsReturned: 1 - SerializeBatchTime: 999996 - TotalTime: 999996 - TransmitDataRPCTime: 0 - UncompressedRowBatchSize: 612 HDFS_SCAN_NODE (id=0) - AverageHdfsReadThreadConcurrency: 0.0 - AverageScannerThreadConcurrency: 0.0 - BytesRead: 199843 - BytesReadDataNodeCache: 0 - BytesReadLocal: 199843 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 199843 - DecompressionTime: 0 - InactiveTotalTime: 0 - MaxCompressedTextFileLength: 0 - NumColumns: 48 - NumDisksAccessed: 1 - NumRowGroups: 1 - NumScannerThreadsStarted: 1 - PeakMemoryUsage: 1636352 - PerReadThreadRawHdfsThroughput: 0 - RemoteScanRanges: 0 - RowsRead: 760 - RowsReturned: 1 - RowsReturnedRate: 55 - ScanRangesComplete: 1 - ScannerThreadsInvoluntaryContextSwitches: 0 - ScannerThreadsTotalWallClockTime: 18999900 - MaterializeTupleTime(*): 0 - ScannerThreadsSysTime: 0 - ScannerThreadsUserTime: 2624000 - ScannerThreadsVoluntaryContextSwitches: 30 - TotalRawHdfsReadTime(*): 0 - TotalReadThroughput: 0 - TotalTime: 17999904 Fragment F00 Instance 3e40a60f2dfc2554:bac26d70201f8d9e (host=server8.example.com:22000) Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB - AverageThreadTokens: 0.0 - BloomFilterBytes: 0 - InactiveTotalTime: 0 - PeakMemoryUsage: 1644840 - PerHostPeakMemUsage: 1644840 - PrepareTime: 0 - RowsProduced: 1 - TotalCpuTime: 22999884 - TotalNetworkReceiveTime: 0 - TotalNetworkSendTime: 0 - TotalStorageWaitTime: 16999908 - TotalTime: 20999892 BlockMgr - BlockWritesOutstanding: 0 - BlocksCreated: 0 - BlocksRecycled: 0 - BufferedPins: 0 - BytesWritten: 0 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 4294967296 - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 0 - TotalTime: 0 DataStreamSender (dst_id=1) - BytesSent: 441 - InactiveTotalTime: 0 - NetworkThroughput(*): 0 - OverallThroughput: 441001 - PeakMemoryUsage: 296 - RowsReturned: 1 - SerializeBatchTime: 999996 - TotalTime: 999996 - TransmitDataRPCTime: 0 - UncompressedRowBatchSize: 612 HDFS_SCAN_NODE (id=0) ExecOption: Expr Evaluation Codegen Disabled, Codegen enabled: 0 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% File Formats: PARQUET/SNAPPY:48 - AverageHdfsReadThreadConcurrency: 0.0 - AverageScannerThreadConcurrency: 0.0 - BytesRead: 199843 - BytesReadDataNodeCache: 0 - BytesReadLocal: 199843 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 199843 - DecompressionTime: 0 - InactiveTotalTime: 0 - MaxCompressedTextFileLength: 0 - NumColumns: 48 - NumDisksAccessed: 1 - NumRowGroups: 1 - NumScannerThreadsStarted: 1 - PeakMemoryUsage: 1636352 - PerReadThreadRawHdfsThroughput: 0 - RemoteScanRanges: 0 - RowsRead: 760 - RowsReturned: 1 - RowsReturnedRate: 55 - ScanRangesComplete: 1 - ScannerThreadsInvoluntaryContextSwitches: 0 - ScannerThreadsTotalWallClockTime: 18999900 - MaterializeTupleTime(*): 0 - ScannerThreadsSysTime: 0 - ScannerThreadsUserTime: 2624000 - ScannerThreadsVoluntaryContextSwitches: 30 - TotalRawHdfsReadTime(*): 0 - TotalReadThroughput: 0 - TotalTime: 17999904
Impala-shell:
Query (id=494fccf3229a725c:40217c9c3c50929a) Summary Session ID: 454fbcd23796af47:82ab459a4077e5ab Session Type: BEESWAX Start Time: 2017-03-03 11:40:15.660573000 End Time: 2017-03-03 11:40:15.806061000 Query Type: QUERY Query State: FINISHED Query Status: OK Impala Version: impalad version 2.6.0-cdh5.8.2 RELEASE (build f25aa5b2bcdabf1eb4233747a7b04a067059ee3b) User: impala_user Connected User: impala_user Delegated User: Network Address: ::ffff:4.3.2.1:55960 Default Db: default Sql Statement: <QUERY> Coordinator: server1.example.com:22000 Query Options (non default): Plan: ---------------- Estimated Per-Host Requirements: Memory=768.00MB VCores=1 01:EXCHANGE [UNPARTITIONED] | hosts=1 per-host-mem=unavailable | tuple-ids=0 row-size=614B cardinality=1 | 00:SCAN HDFS [default.impala_table, RANDOM] partitions=1/13999 files=1 size=132.88KB predicates: (default.impala_table.acct_id = 660) table stats: 760514828 rows total column stats: all hosts=1 per-host-mem=768.00MB tuple-ids=0 row-size=614B cardinality=1 ---------------- Estimated Per-Host Mem: 805306368 Estimated Per-Host VCores: 1 Request Pool: root.impala_user Admission result: Admitted immediately ExecSummary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 23.999ms 23.999ms 1 1 1.56 MB 768.00 MB default.impala_table Planner Timeline Analysis finished: 9689649 Equivalence classes computed: 15036979 Single node plan created: 19349122 Runtime filters computed: 19426476 Distributed plan created: 21875782 Lineage info computed: 24636383 Planning finished: 33188255 Query Timeline Start execution: 0 Planning finished: 36000448 Submit for admission: 37000464 Completed admission: 37000464 Ready to start 1 remote fragments: 38000476 All 1 remote fragments started: 40000500 Rows available: 64000800 First row fetched: 142001772 Unregister query: 145001808 ImpalaServer - ClientFetchWaitTimer: 81001008 - InactiveTotalTime: 0 - RowMaterializationTimer: 0 - TotalTime: 0 Execution Profile 494fccf3229a725c:40217c9c3c50929a Number of filters: 0 Filter routing table: ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed ------------------------------------------------------------------------------------------------------------------- Fragment start latencies: Count: 1, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms Per Node Peak Memory Usage: server8.example.com:22000(1.57 MB) server1.example.com:22000(0) - FiltersReceived: 0 - FinalizationTimer: 0 - InactiveTotalTime: 0 - TotalTime: 27000336 Coordinator Fragment F01 - AverageThreadTokens: 0.0 - BloomFilterBytes: 0 - InactiveTotalTime: 0 - PeakMemoryUsage: 12296 - PerHostPeakMemUsage: 0 - PrepareTime: 0 - RowsProduced: 1 - TotalCpuTime: 80000996 - TotalNetworkReceiveTime: 24000300 - TotalNetworkSendTime: 0 - TotalStorageWaitTime: 0 - TotalTime: 24000300 BlockMgr - BlockWritesOutstanding: 0 - BlocksCreated: 0 - BlocksRecycled: 0 - BufferedPins: 0 - BytesWritten: 0 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 4294967296 - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 0 - TotalTime: 0 EXCHANGE_NODE (id=1) - BytesReceived: 441 - ConvertRowBatchTime: 0 - DeserializeRowBatchTimer: 0 - FirstBatchArrivalWaitTime: 24000300 - InactiveTotalTime: 24000300 - PeakMemoryUsage: 0 - RowsReturned: 1 - RowsReturnedRate: 41 - SendersBlockedTimer: 0 - SendersBlockedTotalTimer(*): 0 - TotalTime: 24000300 Averaged Fragment F00 split sizes: min: 132.88 KB, max: 132.88 KB, avg: 132.88 KB, stddev: 0 completion times: min:26.000ms max:26.000ms mean: 26.000ms stddev:0.000ns execution rates: min:4.99 MB/sec max:4.99 MB/sec mean:4.99 MB/sec stddev:0.00 /sec num instances: 1 - AverageThreadTokens: 0.0 - BloomFilterBytes: 0 - InactiveTotalTime: 0 - PeakMemoryUsage: 1644840 - PerHostPeakMemUsage: 1644840 - PrepareTime: 0 - RowsProduced: 1 - TotalCpuTime: 25999864 - TotalNetworkReceiveTime: 0 - TotalNetworkSendTime: 0 - TotalStorageWaitTime: 22999880 - TotalTime: 25999864 BlockMgr - BlockWritesOutstanding: 0 - BlocksCreated: 0 - BlocksRecycled: 0 - BufferedPins: 0 - BytesWritten: 0 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 4294967296 - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 0 - TotalTime: 0 DataStreamSender (dst_id=1) - BytesSent: 441 - InactiveTotalTime: 0 - NetworkThroughput(*): 0 - OverallThroughput: 0 - PeakMemoryUsage: 296 - RowsReturned: 1 - SerializeBatchTime: 0 - TotalTime: 0 - TransmitDataRPCTime: 0 - UncompressedRowBatchSize: 612 HDFS_SCAN_NODE (id=0) - AverageHdfsReadThreadConcurrency: 0.0 - AverageScannerThreadConcurrency: 0.0 - BytesRead: 199843 - BytesReadDataNodeCache: 0 - BytesReadLocal: 199843 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 199843 - DecompressionTime: 0 - InactiveTotalTime: 0 - MaxCompressedTextFileLength: 0 - NumColumns: 48 - NumDisksAccessed: 1 - NumRowGroups: 1 - NumScannerThreadsStarted: 1 - PeakMemoryUsage: 1636352 - PerReadThreadRawHdfsThroughput: 0 - RemoteScanRanges: 0 - RowsRead: 760 - RowsReturned: 1 - RowsReturnedRate: 41 - ScanRangesComplete: 1 - ScannerThreadsInvoluntaryContextSwitches: 0 - ScannerThreadsTotalWallClockTime: 22999880 - MaterializeTupleTime(*): 0 - ScannerThreadsSysTime: 2775000 - ScannerThreadsUserTime: 0 - ScannerThreadsVoluntaryContextSwitches: 35 - TotalRawHdfsReadTime(*): 0 - TotalReadThroughput: 0 - TotalTime: 23999876 Fragment F00 Instance 494fccf3229a725c:40217c9c3c50929c (host=server8.example.com:22000) Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB - AverageThreadTokens: 0.0 - BloomFilterBytes: 0 - InactiveTotalTime: 0 - PeakMemoryUsage: 1644840 - PerHostPeakMemUsage: 1644840 - PrepareTime: 0 - RowsProduced: 1 - TotalCpuTime: 25999864 - TotalNetworkReceiveTime: 0 - TotalNetworkSendTime: 0 - TotalStorageWaitTime: 22999880 - TotalTime: 25999864 BlockMgr - BlockWritesOutstanding: 0 - BlocksCreated: 0 - BlocksRecycled: 0 - BufferedPins: 0 - BytesWritten: 0 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 4294967296 - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 0 - TotalTime: 0 DataStreamSender (dst_id=1) - BytesSent: 441 - InactiveTotalTime: 0 - NetworkThroughput(*): 0 - OverallThroughput: 0 - PeakMemoryUsage: 296 - RowsReturned: 1 - SerializeBatchTime: 0 - TotalTime: 0 - TransmitDataRPCTime: 0 - UncompressedRowBatchSize: 612 HDFS_SCAN_NODE (id=0) ExecOption: Expr Evaluation Codegen Disabled, Codegen enabled: 0 out of 1 Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% File Formats: PARQUET/SNAPPY:48 - AverageHdfsReadThreadConcurrency: 0.0 - AverageScannerThreadConcurrency: 0.0 - BytesRead: 199843 - BytesReadDataNodeCache: 0 - BytesReadLocal: 199843 - BytesReadRemoteUnexpected: 0 - BytesReadShortCircuit: 199843 - DecompressionTime: 0 - InactiveTotalTime: 0 - MaxCompressedTextFileLength: 0 - NumColumns: 48 - NumDisksAccessed: 1 - NumRowGroups: 1 - NumScannerThreadsStarted: 1 - PeakMemoryUsage: 1636352 - PerReadThreadRawHdfsThroughput: 0 - RemoteScanRanges: 0 - RowsRead: 760 - RowsReturned: 1 - RowsReturnedRate: 41 - ScanRangesComplete: 1 - ScannerThreadsInvoluntaryContextSwitches: 0 - ScannerThreadsTotalWallClockTime: 22999880 - MaterializeTupleTime(*): 0 - ScannerThreadsSysTime: 2775000 - ScannerThreadsUserTime: 0 - ScannerThreadsVoluntaryContextSwitches: 35 - TotalRawHdfsReadTime(*): 0 - TotalReadThroughput: 0 - TotalTime: 23999876
Created 03-07-2017 02:25 AM
I can see that from JDBC:
Start Time: 2017-03-03 11:38:07.269796000 End Time: 2017-03-03 11:38:08.508511000
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 17.999ms 17.999ms 1 1 1.56 MB 768.00 MB default.impala_table
and from impala-shell
Start Time: 2017-03-03 11:40:15.660573000 End Time: 2017-03-03 11:40:15.806061000
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------- 01:EXCHANGE 1 0.000ns 0.000ns 1 1 0 -1.00 B UNPARTITIONED 00:SCAN HDFS 1 23.999ms 23.999ms 1 1 1.56 MB 768.00 MB default.impala_table
I don't see major differences here, though from JDBC in total it took just above 1 second and through impala-shell in total well below 1 second.
What was the main difference in time from JDBC connection start to finish? Queries will definitely have some overhead when running through JDBC since it is from outside of cluster. You can find more details from the TRACE log you collected.
Created on 04-19-2018 03:42 AM - edited 04-19-2018 04:26 AM
Hello,
I am having a similar problem.
We have some processing tasks to run on Impala, which require some custom functionalities. The processing consists of selecting some columns, process the value and insert the result in a new table (or column).
There are 2 options that we are looking at: UDF (Java) to be used natively in Impala, and JDBC connection from Java to Impala to select, process and insert record sets from a Java application. The latter option was introduced to reduce the complexity of the Impala queries but also to evaluate its performance.
Some initial experimentation on 1M rows showed that UDFs are dramatically faster (~10x) than the corresponding JDBC version (which however is still single-threaded).
The test environment features 2 worker nodes (2 Impala daemons), but the JDBC version uses only one coordinator.
Is there any value to spend time to investigate the performance of the JDBC version? Will it ever match the UDF one?
Thank you,
Gerasimos
Created 04-19-2018 04:27 AM
Take a look to this:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-ODBC-JDBC-bad-performance-rows-f...
Good luck.
Created 03-06-2017 02:25 AM
If the PROFILES showed the same performance, then you can turn on the debug logs for JDBC to see which part it was slow, add the following to the JDBC connection string:
;LogLevel=6;LogPath=/path/to/directory
Then examine the debug log create under the directory after you run the query.
Created on 03-06-2017 03:04 AM - edited 03-06-2017 03:17 AM
Thanks EricL.
Plenty to dig into with these logs, quick wins so far:
I'm going to dig into the additional JDBC overhead.
EDIT:
A show tables statement is executed everytime prior to the SQL execution, adding a 1 sec delay.