Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Impala JDBC 10x Slower Vs. Shell

avatar
Explorer

I ran the same query from impala-shell and DBvisualzer using the latest JDBC 4.1 (2.5.36) and it was 10x slower than the shell.

Is there anything I can do to improve the performance of the JDBC connector?

 

Is it anything to do with impala-shell using a Beeswax session type and JDBC using a HiveServer2 session type? 

1 ACCEPTED SOLUTION

avatar
Super Guru

I can see that from JDBC:

Start Time: 2017-03-03 11:38:07.269796000
    End Time: 2017-03-03 11:38:08.508511000

Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE         1   0.000ns   0.000ns      1           1         0        -1.00 B  UNPARTITIONED                  
00:SCAN HDFS        1  17.999ms  17.999ms      1           1   1.56 MB      768.00 MB  default.impala_table 
 

 

and from impala-shell

    Start Time: 2017-03-03 11:40:15.660573000
    End Time: 2017-03-03 11:40:15.806061000
Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE         1   0.000ns   0.000ns      1           1         0        -1.00 B  UNPARTITIONED                  
00:SCAN HDFS        1  23.999ms  23.999ms      1           1   1.56 MB      768.00 MB  default.impala_table 

 

I don't see major differences here, though from JDBC in total it took just above 1 second and through impala-shell in total well below 1 second.

 

What was the main difference in time from JDBC connection start to finish? Queries will definitely have some overhead when running through JDBC since it is from outside of cluster. You can find more details from the TRACE log you collected.

View solution in original post

7 REPLIES 7

avatar
Super Collaborator
Hi Lennond,

can you post the profiles for both queries?

Thanks, Lars

avatar
Explorer

I got these from Cloudera Manager.

 

JDBC:

Query (id=3e40a60f2dfc2554:bac26d70201f8d9c)
  Summary
    Session ID: f042aa249717205e:ae98b27fcfcd0783
    Session Type: HIVESERVER2
    HiveServer2 Protocol Version: V6
    Start Time: 2017-03-03 11:38:07.269796000
    End Time: 2017-03-03 11:38:08.508511000
    Query Type: QUERY
    Query State: FINISHED
    Query Status: OK
    Impala Version: impalad version 2.6.0-cdh5.8.2 RELEASE (build f25aa5b2bcdabf1eb4233747a7b04a067059ee3b)
    User: 
    Connected User: 
    Delegated User: 
    Network Address: ::ffff:1.2.3.4:50034
    Default Db: default
    Sql Statement: <QUERY>
    Coordinator: server1.example.com:22000
    Query Options (non default): 
    Plan: 
----------------
Estimated Per-Host Requirements: Memory=768.00MB VCores=1

01:EXCHANGE [UNPARTITIONED]
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=0 row-size=614B cardinality=1
|
00:SCAN HDFS [default.impala_table, RANDOM]
   partitions=1/13999 files=1 size=132.88KB
   predicates: (default.impala_table.acct_id = 660)
   table stats: 760514828 rows total
   column stats: all
   hosts=1 per-host-mem=768.00MB
   tuple-ids=0 row-size=614B cardinality=1
----------------
    Estimated Per-Host Mem: 805306368
    Estimated Per-Host VCores: 1
    Request Pool: root.default
    Admission result: Admitted immediately
    ExecSummary: 
Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE         1   0.000ns   0.000ns      1           1         0        -1.00 B  UNPARTITIONED                  
00:SCAN HDFS        1  17.999ms  17.999ms      1           1   1.56 MB      768.00 MB  default.impala_table 
    Planner Timeline
      Analysis finished: 9883819
      Equivalence classes computed: 13302816
      Single node plan created: 16281907
      Runtime filters computed: 16321655
      Distributed plan created: 18076216
      Lineage info computed: 19909717
      Planning finished: 26827613
    Query Timeline
      Start execution: 0
      Planning finished: 30000364
      Submit for admission: 31000376
      Completed admission: 32000388
      Ready to start 1 remote fragments: 33000400
      All 1 remote fragments started: 39000472
      Rows available: 54000656
      First row fetched: 759009252
      Unregister query: 1239015104
  ImpalaServer
    - ClientFetchWaitTimer: 1185014448
    - InactiveTotalTime: 0
    - RowMaterializationTimer: 0
    - TotalTime: 0
  Execution Profile 3e40a60f2dfc2554:bac26d70201f8d9c
    Number of filters: 0
    Filter routing table: 
 ID  Src. Node  Tgt. Node(s)  Targets  Target type  Partition filter  Pending (Expected)  First arrived   Completed
-------------------------------------------------------------------------------------------------------------------

    Fragment start latencies: Count: 1, 25th %-ile: 5ms, 50th %-ile: 5ms, 75th %-ile: 5ms, 90th %-ile: 5ms, 95th %-ile: 5ms, 99.9th %-ile: 5ms
    Per Node Peak Memory Usage: server8.example.com:22000(1.57 MB) server1.example.com:22000(0) 
    - FiltersReceived: 0
    - FinalizationTimer: 0
    - InactiveTotalTime: 0
    - TotalTime: 22000268
    Coordinator Fragment F01
      - AverageThreadTokens: 0.0
      - BloomFilterBytes: 0
      - InactiveTotalTime: 0
      - PeakMemoryUsage: 12296
      - PerHostPeakMemUsage: 0
      - PrepareTime: 0
      - RowsProduced: 1
      - TotalCpuTime: 712008680
      - TotalNetworkReceiveTime: 15000184
      - TotalNetworkSendTime: 0
      - TotalStorageWaitTime: 0
      - TotalTime: 16000196
      BlockMgr
        - BlockWritesOutstanding: 0
        - BlocksCreated: 0
        - BlocksRecycled: 0
        - BufferedPins: 0
        - BytesWritten: 0
        - InactiveTotalTime: 0
        - MaxBlockSize: 8388608
        - MemoryLimit: 4294967296
        - PeakMemoryUsage: 0
        - TotalBufferWaitTime: 0
        - TotalEncryptionTime: 0
        - TotalIntegrityCheckTime: 0
        - TotalReadBlockTime: 0
        - TotalTime: 0
      EXCHANGE_NODE (id=1)
        - BytesReceived: 441
        - ConvertRowBatchTime: 0
        - DeserializeRowBatchTimer: 0
        - FirstBatchArrivalWaitTime: 15000184
        - InactiveTotalTime: 15000184
        - PeakMemoryUsage: 0
        - RowsReturned: 1
        - RowsReturnedRate: 66
        - SendersBlockedTimer: 0
        - SendersBlockedTotalTimer(*): 0
        - TotalTime: 15000184
    Averaged Fragment F00
      split sizes:  min: 132.88 KB, max: 132.88 KB, avg: 132.88 KB, stddev: 0
      completion times: min:17.000ms  max:17.000ms  mean: 17.000ms  stddev:0.000ns
      execution rates: min:7.63 MB/sec  max:7.63 MB/sec  mean:7.63 MB/sec  stddev:0.00 /sec
      num instances: 1
      - AverageThreadTokens: 0.0
      - BloomFilterBytes: 0
      - InactiveTotalTime: 0
      - PeakMemoryUsage: 1644840
      - PerHostPeakMemUsage: 1644840
      - PrepareTime: 0
      - RowsProduced: 1
      - TotalCpuTime: 22999884
      - TotalNetworkReceiveTime: 0
      - TotalNetworkSendTime: 0
      - TotalStorageWaitTime: 16999908
      - TotalTime: 20999892
      BlockMgr
        - BlockWritesOutstanding: 0
        - BlocksCreated: 0
        - BlocksRecycled: 0
        - BufferedPins: 0
        - BytesWritten: 0
        - InactiveTotalTime: 0
        - MaxBlockSize: 8388608
        - MemoryLimit: 4294967296
        - PeakMemoryUsage: 0
        - TotalBufferWaitTime: 0
        - TotalEncryptionTime: 0
        - TotalIntegrityCheckTime: 0
        - TotalReadBlockTime: 0
        - TotalTime: 0
      DataStreamSender (dst_id=1)
        - BytesSent: 441
        - InactiveTotalTime: 0
        - NetworkThroughput(*): 0
        - OverallThroughput: 441001
        - PeakMemoryUsage: 296
        - RowsReturned: 1
        - SerializeBatchTime: 999996
        - TotalTime: 999996
        - TransmitDataRPCTime: 0
        - UncompressedRowBatchSize: 612
      HDFS_SCAN_NODE (id=0)
        - AverageHdfsReadThreadConcurrency: 0.0
        - AverageScannerThreadConcurrency: 0.0
        - BytesRead: 199843
        - BytesReadDataNodeCache: 0
        - BytesReadLocal: 199843
        - BytesReadRemoteUnexpected: 0
        - BytesReadShortCircuit: 199843
        - DecompressionTime: 0
        - InactiveTotalTime: 0
        - MaxCompressedTextFileLength: 0
        - NumColumns: 48
        - NumDisksAccessed: 1
        - NumRowGroups: 1
        - NumScannerThreadsStarted: 1
        - PeakMemoryUsage: 1636352
        - PerReadThreadRawHdfsThroughput: 0
        - RemoteScanRanges: 0
        - RowsRead: 760
        - RowsReturned: 1
        - RowsReturnedRate: 55
        - ScanRangesComplete: 1
        - ScannerThreadsInvoluntaryContextSwitches: 0
        - ScannerThreadsTotalWallClockTime: 18999900
          - MaterializeTupleTime(*): 0
          - ScannerThreadsSysTime: 0
          - ScannerThreadsUserTime: 2624000
        - ScannerThreadsVoluntaryContextSwitches: 30
        - TotalRawHdfsReadTime(*): 0
        - TotalReadThroughput: 0
        - TotalTime: 17999904
    Fragment F00
      Instance 3e40a60f2dfc2554:bac26d70201f8d9e (host=server8.example.com:22000)
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB 
        - AverageThreadTokens: 0.0
        - BloomFilterBytes: 0
        - InactiveTotalTime: 0
        - PeakMemoryUsage: 1644840
        - PerHostPeakMemUsage: 1644840
        - PrepareTime: 0
        - RowsProduced: 1
        - TotalCpuTime: 22999884
        - TotalNetworkReceiveTime: 0
        - TotalNetworkSendTime: 0
        - TotalStorageWaitTime: 16999908
        - TotalTime: 20999892
        BlockMgr
          - BlockWritesOutstanding: 0
          - BlocksCreated: 0
          - BlocksRecycled: 0
          - BufferedPins: 0
          - BytesWritten: 0
          - InactiveTotalTime: 0
          - MaxBlockSize: 8388608
          - MemoryLimit: 4294967296
          - PeakMemoryUsage: 0
          - TotalBufferWaitTime: 0
          - TotalEncryptionTime: 0
          - TotalIntegrityCheckTime: 0
          - TotalReadBlockTime: 0
          - TotalTime: 0
        DataStreamSender (dst_id=1)
          - BytesSent: 441
          - InactiveTotalTime: 0
          - NetworkThroughput(*): 0
          - OverallThroughput: 441001
          - PeakMemoryUsage: 296
          - RowsReturned: 1
          - SerializeBatchTime: 999996
          - TotalTime: 999996
          - TransmitDataRPCTime: 0
          - UncompressedRowBatchSize: 612
        HDFS_SCAN_NODE (id=0)
          ExecOption: Expr Evaluation Codegen Disabled, Codegen enabled: 0 out of 1
          Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB 
          Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 
          File Formats: PARQUET/SNAPPY:48 
          - AverageHdfsReadThreadConcurrency: 0.0
          - AverageScannerThreadConcurrency: 0.0
          - BytesRead: 199843
          - BytesReadDataNodeCache: 0
          - BytesReadLocal: 199843
          - BytesReadRemoteUnexpected: 0
          - BytesReadShortCircuit: 199843
          - DecompressionTime: 0
          - InactiveTotalTime: 0
          - MaxCompressedTextFileLength: 0
          - NumColumns: 48
          - NumDisksAccessed: 1
          - NumRowGroups: 1
          - NumScannerThreadsStarted: 1
          - PeakMemoryUsage: 1636352
          - PerReadThreadRawHdfsThroughput: 0
          - RemoteScanRanges: 0
          - RowsRead: 760
          - RowsReturned: 1
          - RowsReturnedRate: 55
          - ScanRangesComplete: 1
          - ScannerThreadsInvoluntaryContextSwitches: 0
          - ScannerThreadsTotalWallClockTime: 18999900
            - MaterializeTupleTime(*): 0
            - ScannerThreadsSysTime: 0
            - ScannerThreadsUserTime: 2624000
          - ScannerThreadsVoluntaryContextSwitches: 30
          - TotalRawHdfsReadTime(*): 0
          - TotalReadThroughput: 0
          - TotalTime: 17999904

Impala-shell:

Query (id=494fccf3229a725c:40217c9c3c50929a)
  Summary
    Session ID: 454fbcd23796af47:82ab459a4077e5ab
    Session Type: BEESWAX
    Start Time: 2017-03-03 11:40:15.660573000
    End Time: 2017-03-03 11:40:15.806061000
    Query Type: QUERY
    Query State: FINISHED
    Query Status: OK
    Impala Version: impalad version 2.6.0-cdh5.8.2 RELEASE (build f25aa5b2bcdabf1eb4233747a7b04a067059ee3b)
    User: impala_user
    Connected User: impala_user
    Delegated User: 
    Network Address: ::ffff:4.3.2.1:55960
    Default Db: default
    Sql Statement: <QUERY>
    Coordinator: server1.example.com:22000
    Query Options (non default): 
    Plan: 
----------------
Estimated Per-Host Requirements: Memory=768.00MB VCores=1

01:EXCHANGE [UNPARTITIONED]
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=0 row-size=614B cardinality=1
|
00:SCAN HDFS [default.impala_table, RANDOM]
   partitions=1/13999 files=1 size=132.88KB
   predicates: (default.impala_table.acct_id = 660)
   table stats: 760514828 rows total
   column stats: all
   hosts=1 per-host-mem=768.00MB
   tuple-ids=0 row-size=614B cardinality=1
----------------
    Estimated Per-Host Mem: 805306368
    Estimated Per-Host VCores: 1
    Request Pool: root.impala_user
    Admission result: Admitted immediately
    ExecSummary: 
Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE         1   0.000ns   0.000ns      1           1         0        -1.00 B  UNPARTITIONED                  
00:SCAN HDFS        1  23.999ms  23.999ms      1           1   1.56 MB      768.00 MB  default.impala_table 
    Planner Timeline
      Analysis finished: 9689649
      Equivalence classes computed: 15036979
      Single node plan created: 19349122
      Runtime filters computed: 19426476
      Distributed plan created: 21875782
      Lineage info computed: 24636383
      Planning finished: 33188255
    Query Timeline
      Start execution: 0
      Planning finished: 36000448
      Submit for admission: 37000464
      Completed admission: 37000464
      Ready to start 1 remote fragments: 38000476
      All 1 remote fragments started: 40000500
      Rows available: 64000800
      First row fetched: 142001772
      Unregister query: 145001808
  ImpalaServer
    - ClientFetchWaitTimer: 81001008
    - InactiveTotalTime: 0
    - RowMaterializationTimer: 0
    - TotalTime: 0
  Execution Profile 494fccf3229a725c:40217c9c3c50929a
    Number of filters: 0
    Filter routing table: 
 ID  Src. Node  Tgt. Node(s)  Targets  Target type  Partition filter  Pending (Expected)  First arrived   Completed
-------------------------------------------------------------------------------------------------------------------

    Fragment start latencies: Count: 1, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms
    Per Node Peak Memory Usage: server8.example.com:22000(1.57 MB) server1.example.com:22000(0) 
    - FiltersReceived: 0
    - FinalizationTimer: 0
    - InactiveTotalTime: 0
    - TotalTime: 27000336
    Coordinator Fragment F01
      - AverageThreadTokens: 0.0
      - BloomFilterBytes: 0
      - InactiveTotalTime: 0
      - PeakMemoryUsage: 12296
      - PerHostPeakMemUsage: 0
      - PrepareTime: 0
      - RowsProduced: 1
      - TotalCpuTime: 80000996
      - TotalNetworkReceiveTime: 24000300
      - TotalNetworkSendTime: 0
      - TotalStorageWaitTime: 0
      - TotalTime: 24000300
      BlockMgr
        - BlockWritesOutstanding: 0
        - BlocksCreated: 0
        - BlocksRecycled: 0
        - BufferedPins: 0
        - BytesWritten: 0
        - InactiveTotalTime: 0
        - MaxBlockSize: 8388608
        - MemoryLimit: 4294967296
        - PeakMemoryUsage: 0
        - TotalBufferWaitTime: 0
        - TotalEncryptionTime: 0
        - TotalIntegrityCheckTime: 0
        - TotalReadBlockTime: 0
        - TotalTime: 0
      EXCHANGE_NODE (id=1)
        - BytesReceived: 441
        - ConvertRowBatchTime: 0
        - DeserializeRowBatchTimer: 0
        - FirstBatchArrivalWaitTime: 24000300
        - InactiveTotalTime: 24000300
        - PeakMemoryUsage: 0
        - RowsReturned: 1
        - RowsReturnedRate: 41
        - SendersBlockedTimer: 0
        - SendersBlockedTotalTimer(*): 0
        - TotalTime: 24000300
    Averaged Fragment F00
      split sizes:  min: 132.88 KB, max: 132.88 KB, avg: 132.88 KB, stddev: 0
      completion times: min:26.000ms  max:26.000ms  mean: 26.000ms  stddev:0.000ns
      execution rates: min:4.99 MB/sec  max:4.99 MB/sec  mean:4.99 MB/sec  stddev:0.00 /sec
      num instances: 1
      - AverageThreadTokens: 0.0
      - BloomFilterBytes: 0
      - InactiveTotalTime: 0
      - PeakMemoryUsage: 1644840
      - PerHostPeakMemUsage: 1644840
      - PrepareTime: 0
      - RowsProduced: 1
      - TotalCpuTime: 25999864
      - TotalNetworkReceiveTime: 0
      - TotalNetworkSendTime: 0
      - TotalStorageWaitTime: 22999880
      - TotalTime: 25999864
      BlockMgr
        - BlockWritesOutstanding: 0
        - BlocksCreated: 0
        - BlocksRecycled: 0
        - BufferedPins: 0
        - BytesWritten: 0
        - InactiveTotalTime: 0
        - MaxBlockSize: 8388608
        - MemoryLimit: 4294967296
        - PeakMemoryUsage: 0
        - TotalBufferWaitTime: 0
        - TotalEncryptionTime: 0
        - TotalIntegrityCheckTime: 0
        - TotalReadBlockTime: 0
        - TotalTime: 0
      DataStreamSender (dst_id=1)
        - BytesSent: 441
        - InactiveTotalTime: 0
        - NetworkThroughput(*): 0
        - OverallThroughput: 0
        - PeakMemoryUsage: 296
        - RowsReturned: 1
        - SerializeBatchTime: 0
        - TotalTime: 0
        - TransmitDataRPCTime: 0
        - UncompressedRowBatchSize: 612
      HDFS_SCAN_NODE (id=0)
        - AverageHdfsReadThreadConcurrency: 0.0
        - AverageScannerThreadConcurrency: 0.0
        - BytesRead: 199843
        - BytesReadDataNodeCache: 0
        - BytesReadLocal: 199843
        - BytesReadRemoteUnexpected: 0
        - BytesReadShortCircuit: 199843
        - DecompressionTime: 0
        - InactiveTotalTime: 0
        - MaxCompressedTextFileLength: 0
        - NumColumns: 48
        - NumDisksAccessed: 1
        - NumRowGroups: 1
        - NumScannerThreadsStarted: 1
        - PeakMemoryUsage: 1636352
        - PerReadThreadRawHdfsThroughput: 0
        - RemoteScanRanges: 0
        - RowsRead: 760
        - RowsReturned: 1
        - RowsReturnedRate: 41
        - ScanRangesComplete: 1
        - ScannerThreadsInvoluntaryContextSwitches: 0
        - ScannerThreadsTotalWallClockTime: 22999880
          - MaterializeTupleTime(*): 0
          - ScannerThreadsSysTime: 2775000
          - ScannerThreadsUserTime: 0
        - ScannerThreadsVoluntaryContextSwitches: 35
        - TotalRawHdfsReadTime(*): 0
        - TotalReadThroughput: 0
        - TotalTime: 23999876
    Fragment F00
      Instance 494fccf3229a725c:40217c9c3c50929c (host=server8.example.com:22000)
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB 
        - AverageThreadTokens: 0.0
        - BloomFilterBytes: 0
        - InactiveTotalTime: 0
        - PeakMemoryUsage: 1644840
        - PerHostPeakMemUsage: 1644840
        - PrepareTime: 0
        - RowsProduced: 1
        - TotalCpuTime: 25999864
        - TotalNetworkReceiveTime: 0
        - TotalNetworkSendTime: 0
        - TotalStorageWaitTime: 22999880
        - TotalTime: 25999864
        BlockMgr
          - BlockWritesOutstanding: 0
          - BlocksCreated: 0
          - BlocksRecycled: 0
          - BufferedPins: 0
          - BytesWritten: 0
          - InactiveTotalTime: 0
          - MaxBlockSize: 8388608
          - MemoryLimit: 4294967296
          - PeakMemoryUsage: 0
          - TotalBufferWaitTime: 0
          - TotalEncryptionTime: 0
          - TotalIntegrityCheckTime: 0
          - TotalReadBlockTime: 0
          - TotalTime: 0
        DataStreamSender (dst_id=1)
          - BytesSent: 441
          - InactiveTotalTime: 0
          - NetworkThroughput(*): 0
          - OverallThroughput: 0
          - PeakMemoryUsage: 296
          - RowsReturned: 1
          - SerializeBatchTime: 0
          - TotalTime: 0
          - TransmitDataRPCTime: 0
          - UncompressedRowBatchSize: 612
        HDFS_SCAN_NODE (id=0)
          ExecOption: Expr Evaluation Codegen Disabled, Codegen enabled: 0 out of 1
          Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/132.88 KB 
          Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 
          File Formats: PARQUET/SNAPPY:48 
          - AverageHdfsReadThreadConcurrency: 0.0
          - AverageScannerThreadConcurrency: 0.0
          - BytesRead: 199843
          - BytesReadDataNodeCache: 0
          - BytesReadLocal: 199843
          - BytesReadRemoteUnexpected: 0
          - BytesReadShortCircuit: 199843
          - DecompressionTime: 0
          - InactiveTotalTime: 0
          - MaxCompressedTextFileLength: 0
          - NumColumns: 48
          - NumDisksAccessed: 1
          - NumRowGroups: 1
          - NumScannerThreadsStarted: 1
          - PeakMemoryUsage: 1636352
          - PerReadThreadRawHdfsThroughput: 0
          - RemoteScanRanges: 0
          - RowsRead: 760
          - RowsReturned: 1
          - RowsReturnedRate: 41
          - ScanRangesComplete: 1
          - ScannerThreadsInvoluntaryContextSwitches: 0
          - ScannerThreadsTotalWallClockTime: 22999880
            - MaterializeTupleTime(*): 0
            - ScannerThreadsSysTime: 2775000
            - ScannerThreadsUserTime: 0
          - ScannerThreadsVoluntaryContextSwitches: 35
          - TotalRawHdfsReadTime(*): 0
          - TotalReadThroughput: 0
          - TotalTime: 23999876

 

avatar
Super Guru

I can see that from JDBC:

Start Time: 2017-03-03 11:38:07.269796000
    End Time: 2017-03-03 11:38:08.508511000

Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE         1   0.000ns   0.000ns      1           1         0        -1.00 B  UNPARTITIONED                  
00:SCAN HDFS        1  17.999ms  17.999ms      1           1   1.56 MB      768.00 MB  default.impala_table 
 

 

and from impala-shell

    Start Time: 2017-03-03 11:40:15.660573000
    End Time: 2017-03-03 11:40:15.806061000
Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------
01:EXCHANGE         1   0.000ns   0.000ns      1           1         0        -1.00 B  UNPARTITIONED                  
00:SCAN HDFS        1  23.999ms  23.999ms      1           1   1.56 MB      768.00 MB  default.impala_table 

 

I don't see major differences here, though from JDBC in total it took just above 1 second and through impala-shell in total well below 1 second.

 

What was the main difference in time from JDBC connection start to finish? Queries will definitely have some overhead when running through JDBC since it is from outside of cluster. You can find more details from the TRACE log you collected.

avatar
Rising Star

Hello,

 

I am having a similar problem.

 

We have some processing tasks to run on Impala, which require some custom functionalities. The processing consists of selecting some columns, process the value and insert the result in a new table (or column).

 

There are 2 options that we are looking at: UDF (Java) to be used natively in Impala, and JDBC connection from Java to Impala to select, process and insert record sets from a Java application. The latter option was introduced to reduce the complexity of the Impala queries but also to evaluate its performance.

 

Some initial experimentation on 1M rows showed that UDFs are dramatically faster (~10x) than the corresponding JDBC version (which however is still single-threaded).

 

The test environment features 2 worker nodes (2 Impala daemons), but the JDBC version uses only one coordinator.

 

Is there any value to spend time to investigate the performance of the JDBC version? Will it ever match the UDF one?

 

Thank you,

Gerasimos

avatar
Master Collaborator

avatar
Super Guru

If the PROFILES showed the same performance, then you can turn on the debug logs for JDBC to see which part it was slow, add the following to the JDBC connection string:

 

;LogLevel=6;LogPath=/path/to/directory

 

Then examine the debug log create under the directory after you run the query.

avatar
Explorer

Thanks EricL.

Plenty to dig into with these logs, quick wins so far:

  • select a, b, c from database.view is slow as it runs a describe statement prior to SQL execution.
  • select a, b, c from table is slow as it runs a describe statement prior to SQL execution, despite having the database name in my connection URL.
  • select a, b, c from database.table is faster as no additional describe statement is executed, but still a good bit slower than simply executed it in Impala-Shell. 

I'm going to dig into the additional JDBC overhead.

 

EDIT:

A show tables statement is executed everytime prior to the SQL execution, adding a 1 sec delay.