Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Using Impala prepared statement - can plan portion of query be further reduced?

Using Impala prepared statement - can plan portion of query be further reduced?

New Contributor

I'm trying to get low latency queries in the range of single digit milliseconds for selects of single rows using the rowkey with Impala on Hbase. I'm using the Impala 2.5.32 JDBC driver with prepared statements. Currently, these queries are running in about 33 ms. Using the prepared statement, I didn't expect to see much time spent in the plan portion of the query. However, I'm seeing a planning time of around 15 ms. See profile of a sample query below. Aside from the prepared statement, is there some way that I can further reduce the amount of time in the plan portion of the query?

 

Query (id=4a4bc939f44c2fe0:44cc81eb3c7676b8):

  Summary:

    Session ID: b041d5b55eb4c34b:c51dd27420093cac

    Session Type: HIVESERVER2

    HiveServer2 Protocol Version: V6

    Start Time: 2016-09-09 12:02:34.016116000

    End Time: 2016-09-09 12:02:34.044441000

    Query Type: QUERY

    Query State: FINISHED

    Query Status: OK

    Impala Version: impalad version 2.5.0-cdh5.7.1 RELEASE (build 27a4325c18c2a01c7a8097681a0eccf6d4335ea1)

    User:

    Connected User:

    Delegated User:

    Network Address: (removed)

    Default Db: default

    Sql Statement: SELECT `hbase_vm_customer`.`cas_id`, `hbase_vm_customer`.`acct_org_code`, `hbase_vm_customer`.`acct_org_desc`, `hbase_vm_customer`.`acct_region_code`, `hbase_vm_customer`.`acct_region_desc`, `hbase_vm_customer`.`acct_rep_code`, `hbase_vm_customer`.`acct_rep_name`, `hbase_vm_customer`.`acct_sid`, `hbase_vm_customer`.`acct_team_code`, `hbase_vm_customer`.`acct_team_desc`, `hbase_vm_customer`.`acct_terr_code`, `hbase_vm_customer`.`acct_terr_desc`, `hbase_vm_customer`.`addr_line1`, `hbase_vm_customer`.`addr_line2`, `hbase_vm_customer`.`addr_line3`, `hbase_vm_customer`.`affiliate_code`, `hbase_vm_customer`.`affiliate_desc`, `hbase_vm_customer`.`bac_code`, `hbase_vm_customer`.`bac_name`, `hbase_vm_customer`.`bac_orgsrt_code`, `hbase_vm_customer`.`bac_orgsrt_desc`, `hbase_vm_customer`.`bac_summorg_code`, `hbase_vm_customer`.`city`, `hbase_vm_customer`.`country_code`, `hbase_vm_customer`.`country_name`, `hbase_vm_customer`.`country_sub_region`, `hbase_vm_customer`.`cust_class`, `hbase_vm_customer`.`eci_id`, `hbase_vm_customer`.`finl_ent`, `hbase_vm_customer`.`ges_desc`, `hbase_vm_customer`.`ges_ucn_id`, `hbase_vm_customer`.`ib_class`, `hbase_vm_customer`.`indust_code`, `hbase_vm_customer`.`indust_desc`, `hbase_vm_customer`.`indust_seg_code`, `hbase_vm_customer`.`indust_seg_desc`, `hbase_vm_customer`.`int_ext_ind`, `hbase_vm_customer`.`legal_name`, `hbase_vm_customer`.`line_of_business`, `hbase_vm_customer`.`line_of_business_desc`, `hbase_vm_customer`.`lob_owner_code`, `hbase_vm_customer`.`lob_owner_desc`, `hbase_vm_customer`.`long_name`, `hbase_vm_customer`.`maj_indust_code`, `hbase_vm_customer`.`maj_indust_desc`, `hbase_vm_customer`.`market_code`, `hbase_vm_customer`.`market_desc`, `hbase_vm_customer`.`naic_code`, `hbase_vm_customer`.`naic_desc`, `hbase_vm_customer`.`officer_name`, `hbase_vm_customer`.`officer_sid`, `hbase_vm_customer`.`org_code`, `hbase_vm_customer`.`org_desc`, `hbase_vm_customer`.`region_code`, `hbase_vm_customer`.`region_desc`, `hbase_vm_customer`.`rep_code`, `hbase_vm_customer`.`rep_name`, `hbase_vm_customer`.`rep_region_code`, `hbase_vm_customer`.`rep_region_desc`, `hbase_vm_customer`.`risk_rate`, `hbase_vm_customer`.`sales_sid`, `hbase_vm_customer`.`seg_code`, `hbase_vm_customer`.`seg_desc`, `hbase_vm_customer`.`service_org_code`, `hbase_vm_customer`.`service_org_desc`, `hbase_vm_customer`.`service_region_code`, `hbase_vm_customer`.`service_region_desc`, `hbase_vm_customer`.`service_rep_code`, `hbase_vm_customer`.`service_rep_name`, `hbase_vm_customer`.`service_sid`, `hbase_vm_customer`.`service_team_code`, `hbase_vm_customer`.`service_team_desc`, `hbase_vm_customer`.`service_terr_code`, `hbase_vm_customer`.`service_terr_desc`, `hbase_vm_customer`.`short_name`, `hbase_vm_customer`.`state`, `hbase_vm_customer`.`subm_code`, `hbase_vm_customer`.`subm_desc`, `hbase_vm_customer`.`tax_id`, `hbase_vm_customer`.`team_code`, `hbase_vm_customer`.`team_desc`, `hbase_vm_customer`.`terr_code`, `hbase_vm_customer`.`terr_desc`, `hbase_vm_customer`.`tier_code`, `hbase_vm_customer`.`ult_parent`, `hbase_vm_customer`.`ult_parent_eci_id`, `hbase_vm_customer`.`zip` FROM `default`.`hbase_vm_customer` WHERE (`hbase_vm_customer`.`cas_id` = '2730960950')

    Coordinator: (removed):22000

    Query Options (non default):

    Plan:

----------------

Estimated Per-Host Requirements: Memory=0B VCores=0

 

00:SCAN HBASE [default.hbase_vm_customer]

   start key: 2730960950

   stop key: 2730960950\0

   table stats: 8632631 rows total

   column stats: all

   hosts=100 per-host-mem=unavailable

   tuple-ids=0 row-size=2.13KB cardinality=1

----------------

    Estimated Per-Host Mem: 0

    Estimated Per-Host VCores: 0

    Request Pool: default-pool

    ExecSummary:

Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail

-----------------------------------------------------------------------------------------------

00:SCAN HBASE       1   5.738ms   5.738ms      1           1   4.00 KB        -1.00 B        

    Planner Timeline: 13.891ms

       - Analysis finished: 3.977ms (3.977ms)

       - Equivalence classes computed: 4.608ms (631.247us)

       - Single node plan created: 10.173ms (5.565ms)

       - Distributed plan created: 11.974ms (1.800ms)

       - Planning finished: 13.891ms (1.916ms)

    Query Timeline: 29.507ms

       - Start execution: 64.407us (64.407us)

     - Planning finished: 15.706ms (15.641ms)

       - Rows available: 21.229ms (5.523ms)

       - First row fetched: 24.529ms (3.300ms)

       - Unregister query: 28.334ms (3.805ms)

  ImpalaServer:

     - ClientFetchWaitTimer: 4.449ms

     - RowMaterializationTimer: 48.481us

  Execution Profile 4a4bc939f44c2fe0:44cc81eb3c7676b8:(Total: 7.357ms, non-child: 0.000ns, % non-child: 0.00%)

    Number of filters: 0

    Filter routing table:

 ID  Src. Node  Tgt. Node  Targets  Type   Partition filter

-----------------------------------------------------------

 

     - FiltersReceived: 0 (0)

     - FinalizationTimer: 0.000ns

    Coordinator Fragment F00:(Total: 6.551ms, non-child: 813.151us, % non-child: 12.41%)

      Hdfs split stats (<volume id>:<# splits>/<split lengths>):

       - AverageThreadTokens: 0.00

       - BloomFilterBytes: 0

       - PeakMemoryUsage: 2.39 MB (2502656)

       - PerHostPeakMemUsage: 0

       - PrepareTime: 59.354us

       - RowsProduced: 1 (1)

       - TotalCpuTime: 10.754ms

       - TotalNetworkReceiveTime: 0.000ns

       - TotalNetworkSendTime: 0.000ns

       - TotalStorageWaitTime: 0.000ns

      BlockMgr:

         - BlockWritesOutstanding: 0 (0)

         - BlocksCreated: 0 (0)

         - BlocksRecycled: 0 (0)

         - BufferedPins: 0 (0)

         - BytesWritten: 0

         - MaxBlockSize: 8.00 MB (8388608)

         - MemoryLimit: 6871947776.00 GB (7378697739434983424)

         - PeakMemoryUsage: 0

         - TotalBufferWaitTime: 0.000ns

         - TotalEncryptionTime: 0.000ns

         - TotalIntegrityCheckTime: 0.000ns

         - TotalReadBlockTime: 0.000ns

      HBASE_SCAN_NODE (id=0):(Total: 5.738ms, non-child: 5.738ms, % non-child: 100.00%)

         - BytesRead: 1.46 KB (1494)

         - HBaseTableScanner.ScanSetup: 196.357us

         - PeakMemoryUsage: 4.00 KB (4096)

         - RowsRead: 1 (1)

         - RowsReturned: 1 (1)

         - RowsReturnedRate: 174.00 /sec

         - ScannerThreadsInvoluntaryContextSwitches: 4 (4)

         - ScannerThreadsTotalWallClockTime: 2.498ms

           - MaterializeTupleTime(*): 366.678us

           - ScannerThreadsSysTime: 0.000ns

           - ScannerThreadsUserTime: 1.000ms

         - ScannerThreadsVoluntaryContextSwitches: 2 (2)

         - TotalRawHBaseReadTime(*): 1.741ms

         - TotalReadThroughput: 0.00 /sec

 

 

 

4 REPLIES 4
Highlighted

Re: Using Impala prepared statement - can plan portion of query be further reduced?

Master Collaborator

I think you're already doing most of what you can do. We don't implement plan caching so it does replan the query each time, which I believe may involve communicating with HBase to find tablet locations.

Highlighted

Re: Using Impala prepared statement - can plan portion of query be further reduced?

New Contributor

Is there any intent to add plan caching?

I've made similar observations to Cloudera re slow statement prepartation times with Impala. In some cases where it could be several 10s of seconds to complete a PreparedStatement where that cost repeats for subsequent prepares. 


Highlighted

Re: Using Impala prepared statement - can plan portion of query be further reduced?

New Contributor

Thanks for the information. The plan caching would be a nice feature to have.

Highlighted

Re: Using Impala prepared statement - can plan portion of query be further reduced?

Master Collaborator

I agree, it's definitely on our radar but it isn't on a concrete roadmap yet.

 

 

We've been putting steady effort into improving query start-up time, e.g https://issues.cloudera.org/browse/IMPALA-1599, https://issues.cloudera.org/browse/IMPALA-2550, http... , so the general issue is something that is getting attention.

Don't have an account?
Coming from Hortonworks? Activate your account here