Reply
New Contributor
Posts: 7
Registered: ‎05-03-2017

batch insert into kudu table via using impala jdbc, take long time to plan using large batch size

batch insert into kudu table via using impala jdbc, take long time to plan using large batch size

it seems using more large batch size, the performance is more lower, and time is spend almost in planning stage.

using 500 batch size planning time [1m16s (1m16s)] is more than 5 times when using 100 batch size [4s074ms (4s074ms)]

when using batch size is 500, profile show - Planning finished: 1m16s (1m16s)

INSERT INTO KUDU [default.kudu_table_236w_test]
|  hosts=1 per-host-mem=unavailable
|
00:UNION
   constant-operands=500
   hosts=1 per-host-mem=unavailable
   tuple-ids=0 row-size=646B cardinality=500
----------------
    Estimated Per-Host Mem: 1024
    Estimated Per-Host VCores: 1
    Request Pool: root.hadoop
    Admission result: Admitted immediately
    ExecSummary:
Operator   #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------
00:UNION        1  17.163ms  17.163ms    500         500   3.82 MB        -1.00 B
    Planner Timeline: 108.920ms
       - Analysis finished: 84.299ms (84.299ms)
       - Equivalence classes computed: 84.595ms (295.967us)
       - Single node plan created: 95.259ms (10.664ms)
       - Runtime filters computed: 95.263ms (3.330us)
       - Distributed plan created: 95.303ms (40.585us)
       - Planning finished: 108.920ms (13.617ms)
    Query Timeline: 1m16s
       - Query submitted: 172.482us (172.482us)
       - Planning finished: 1m16s (1m16s)
       - Submit for admission: 1m16s (37.273ms)
       - Completed admission: 1m16s (35.699us)
       - Ready to start 1 fragment instances: 1m16s (301.262us)
       - All 1 fragment instances started: 1m16s (157.390ms)
       - DML data written: 1m16s (122.260ms)
       - DML Metastore update finished: 1m16s (18.850ms)
       - Request finished: 1m16s (21.189ms)
       - Unregister query: 1m16s (1.956ms)
     - ComputeScanRangeAssignmentTimer: 0.000ns
  ImpalaServer:
     - ClientFetchWaitTimer: 1.912ms
     - MetastoreUpdateTimer: 40.034ms
     - RowMaterializationTimer: 0.000ns
  Execution Profile 9d47daa6e290e1c4:3b19583800000000:(Total: 261.448ms, non-child: 0.000ns, % non-child: 0.00%)
    Number of filters: 0
    Filter routing table:
    
    
we decrease batch size to 100, the performance increase , profile show - Planning finished: 4s074ms (4s074ms)

INSERT INTO KUDU [default.kudu_table_236w_test]
|  hosts=1 per-host-mem=unavailable
|
00:UNION
   constant-operands=100
   hosts=1 per-host-mem=unavailable
   tuple-ids=0 row-size=583B cardinality=100
----------------
    Estimated Per-Host Mem: 1024
    Estimated Per-Host VCores: 1
    Request Pool: root.hadoop
    Admission result: Admitted immediately
    ExecSummary:
Operator   #Hosts  Avg Time  Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------
00:UNION        1   4.040ms   4.040ms    100         100  856.00 KB        -1.00 B
    Planner Timeline: 20.328ms
       - Analysis finished: 15.412ms (15.412ms)
       - Equivalence classes computed: 15.688ms (275.797us)
       - Single node plan created: 17.325ms (1.636ms)
       - Runtime filters computed: 17.328ms (3.313us)
       - Distributed plan created: 17.366ms (38.581us)
       - Planning finished: 20.328ms (2.961ms)
    Query Timeline: 4s173ms
       - Query submitted: 131.432us (131.432us)
       - Planning finished: 4s074ms (4s074ms)
       - Submit for admission: 4s080ms (5.772ms)
       - Completed admission: 4s080ms (18.240us)
       - Ready to start 1 fragment instances: 4s080ms (242.580us)
       - All 1 fragment instances started: 4s114ms (34.550ms)
       - DML data written: 4s159ms (44.196ms)
       - DML Metastore update finished: 4s163ms (3.880ms)
       - Request finished: 4s166ms (3.411ms)
       - Unregister query: 4s168ms (1.628ms)
     - ComputeScanRangeAssignmentTimer: 0.000ns
  ImpalaServer:
     - ClientFetchWaitTimer: 1.582ms
     - MetastoreUpdateTimer: 7.286ms
     - RowMaterializationTimer: 0.000ns
  Execution Profile 3144a94c034dca1e:5f6adb3a00000000:(Total: 74.739ms, non-child: 0.000ns, % non-child: 0.00%)
    Number of filters: 0
    Filter routing table:

Cloudera Employee J-D
Cloudera Employee
Posts: 55
Registered: ‎07-08-2013

Re: batch insert into kudu table via using impala jdbc, take long time to plan using large batch siz

Hi,

 

This is likely to be slow, as described in this section of the documentation: http://kudu.apache.org/docs/kudu_impala_integration.html#kudu_impala_insert_bulk

 

There isn't much you can do about this, except going the "Batch Insert" route as described in the link above.

 

J-D

Cloudera Employee
Posts: 16
Registered: ‎12-19-2013

Re: batch insert into kudu table via using impala jdbc, take long time to plan using large batch siz

Hi,

 

I wasn't able to reproduce this behavior using the latest version of Impala and Kudu. I was able to insert 500 rows in 360ms and 1000 rows in 725ms -- all via JDBC and a big INSERT ... VALUES statement.

 

Can you make sure you're using the latest Impala and JDBC driver? Can you also share the schema of your table?

New Contributor
Posts: 7
Registered: ‎05-03-2017

Re: batch insert into kudu table via using impala jdbc, take long time to plan using large batch siz

we test using 110 columns. how many columns you do the test.

New Contributor
Posts: 7
Registered: ‎05-03-2017

Re: batch insert into kudu table via using impala jdbc, take long time to plan using large batch siz

hello Matt Jacobs

 

we using CDH 5.10.1 and jdbc version is impala_jdbc_2.5.37.1057

 

table schema is

CREATE TABLE kudu_table_236w_test ( `rowkey` string ,  `CIRCUIT_ID` string ,  `A_PORT_ID` string ,  `Z_PORT_ID` string ,  `A_EQP_ID` string ,  `Z_EQP_ID` string ,  `A_EQP_RES_TYPE_ID` string ,  `Z_EQP_RES_TYPE_ID` string ,  `AREA_ID` string ,  `A_USER_ROOM_ID` string ,  `Z_USER_ROOM_ID` string ,  `OLD_ID_EQP` string ,  `OLD_SP` string ,  `VERSION` string ,  `PROTECTIONTYPE` string ,  `PROTECTIONSTATUS` string ,  `NOTES` string ,  `CREATE_OP` string ,  `CIRCUIT_NO` string ,  `CIRCUITTYPE_ID` string ,  `CIRCUIT_NAME` string ,  `CIRCUIT_RATE` string ,  `DIRECTION` string ,  `WORKTYPE` string ,  `FUNCTIONSTATUS_ID` string ,  `GROUPCODE` string ,  `GROUPNAME` string ,  `FIBER_AVAILABLE` string ,  `MODIFY_OP` string ,  `CREATE_DATE` string ,  `MODIFY_TIME` string ,  `ROUTESTATUS` string ,  `ASSEMBLENAME` string ,  `ASSEMBLECODE` string ,  `A_D_CARD_ID` string ,  `A_M_CARD_ID` string ,  `A_D_E_CARD_ID` string ,  `A_M_E_CARD_ID` string ,  `PARENTCIRCUIT_ID` string ,  `CIRCUITDEF_ID` string ,  `OPR_STATE_ID` string ,  `Z_D_CARD_ID` string ,  `Z_M_CARD_ID` string ,  `Z_D_E_CARD_ID` string ,  `Z_M_E_CARD_ID` string ,  `SUBTYPE` string ,  `FIBER_NUM` string ,  `APORT_P_ID` string ,  `CONNECTPOSITION` string ,  `FROZENSTATUS` string ,  `ISCREATEINCONFIG` string ,  `ISHISTORY` string ,  `LIFECYCLE` string ,  `OWNER_ID` string ,  `PHYSICALSTATUS` string ,  `PROVISIONSTATUS_ID` string ,  `RELEASETIME` string ,  `RESERVELEVEL` string ,  `STATUS` string ,  `ZPORT_P_ID` string ,  `Z_NODE_ID` string ,  `Z_PORT_NO` string ,  `END_REGION_ID` string ,  `CAR` string ,  `LPT` string ,  `MSTPCIRCUITTYPE` string ,  `QOS` string ,  `SLC` string ,  `VPNNO` string ,  `MEDIA_ID` string ,  `ATTENTYPE` string ,  `ISSINGLEFIBER` string ,  `THIRD_PARTY_OPERATORS` string ,  `USERLABEL` string ,  `CIRCUIT_LEVEL` string ,  `PURPOSE` string ,  `SERVICECLASS` string ,  `Z_USER_PORT_DUAL_MODE` string ,  `A_EQUIP_MAINTAIN_DEPT` string ,  `A_EQUIP_MAINTAIN_TEL` string ,  `A_EQUIP_MAINTAIN_CONTACT` string ,  `A_MAINTAININFO` string ,  `A_INTERFACE_TYPE` string ,  `CUST_CIRCUIT_GRADE` string ,  `LINKTYPE` string ,  `FLAG` string ,  `SLTYPE` string ,  `ISCHARGE` string ,  `NET_LEVEL` string ,  `CIRCUIT_GRADE` string ,  `Z_EQUIP_MAINTAIN_DEPT` string ,  `Z_EQUIP_MAINTAIN_TEL` string ,  `Z_EQUIP_MAINTAIN_CONTACT` string ,  `Z_INTERFACE_TYPE` string ,  `EIPNO` string ,  `ISRELAY` string ,  `MAINORMINOR` string ,  `TOTALACTUALATTEN` double ,  `TOTALACTUALLENGTH` double ,  `A_MAINTAIN_POINT` string ,  `A_TRANSFORMER` string ,  `A_USER_PORT_DUAL_MODE` string ,  `A_WQ_LENGTH` string ,  `CONFIGMODE` string ,  `LOID` string ,  `PARTENER` string ,  `SERVICE_GRADE` string ,  `Z_MAINTAIN_POINT` string ,  `Z_MAINTAININFO` string ,  `Z_TRANSFORMER` string ,  `CUSTOMERDEMANDLATENCYEND` string ,  `CUSTOMERDEMANDLATENCYPOP` string ,  `ISLOWTIMELAGGING` string ,  `MSROUPROSWITCHREPLYWAY` string ,  `TIMEDELAYTYPE` string ,  `Z_WQ_LENGTH` string,  PRIMARY KEY(`rowkey`)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU

 

this table is more than 100 colums;

Announcements