Hi,
I am using the latest Cloudera VM as of 17 June 2019, with Impala version 2.10.0. I observed that INSERT INTO is taking far too long for a table that consists of 31 columns. Most of the time is spent waiting for planning.
I am posting my query profile:
Query Info
  Query ID: b746ed90a692cef9:d7c51dd400000000
  User:
  Database: default
  Coordinator: quickstart.cloudera
  Query Type: DML
  Query State: FINISHED
  Start Time: Jun 19, 2019 1:07:23 PM
  End Time: Jun 19, 2019 1:08:07 PM
  Duration: 43.8s
  Rows Produced: 1,000
  Admission Result: Admitted immediately
  Admission Wait Time: 0ms
  Aggregate Peak Memory Usage: 389.0 KiB
  Client Fetch Wait Time: 25ms
  Client Fetch Wait Time Percentage: 0
  Estimated per Node Peak Memory: 10.0 MiB
  File Formats:
  HDFS Bytes Written: 160.0 KiB
  Impala Version: impalad version 2.10.0-cdh5.13.0 RELEASE (build 2511805f1eaa991df1460276c7e9f19d819cd4e4)
  Memory Accrual: 199,168 byte seconds
  Network Address: 172.16.10.72:49084
  Node with Peak Memory Usage: quickstart.cloudera:22000
  Out of Memory: false
  Per Node Peak Memory Usage: 389.0 KiB
  Planning Wait Time: 42.45s
  Planning Wait Time Percentage: 97
  Pool: root.default
  Query Status: OK
  Rows Inserted: 1,000
  Session ID: 794f64ac9b4b1553:14542a16b4511ba2
  Session Type: HIVESERVER2
  Statistics Corrupt: false
  Statistics Missing: false
  Threads: CPU Time: 108ms
  Threads: CPU Time Percentage: 100
  Threads: Network Receive Wait Time: 0ms
  Threads: Network Receive Wait Time Percentage: 0
  Threads: Network Send Wait Time: 0ms
  Threads: Network Send Wait Time Percentage: 0
  Threads: Storage Wait Time: 0ms
  Threads: Storage Wait Time Percentage: 0
  Threads: Total Time: 108ms

Query Timeline
  Query submitted: 86.45us (86.45us)
  Planning finished: 42.45s (42.45s)
  Submit for admission: 42.47s (21ms)
  Completed admission: 42.47s (135.69us)
  Ready to start on 1 backends: 42.48s (1ms)
  All 1 execution backends (1 fragment instances) started: 42.55s (74ms)
  DML data written: 43.37s (822ms)
  DML Metastore update finished: 43.77s (395ms)
  Request finished: 43.79s (20ms)
  Unregister query: 43.81s (25ms)

Planner Timeline
  Analysis finished: 41.71s (41.71s)
  Equivalence classes computed: 41.72s (6ms)
  Single node plan created: 41.87s (155ms)
  Runtime filters computed: 41.87s (107.52us)
  Distributed plan created: 41.88s (8ms)
  Lineage info computed: 42.09s (214ms)
  Planning finished: 42.27s (179ms)
What could be the reason, and how can I solve it?
Thanks
Created 06-19-2019 10:09 AM
Do you have an example of the query? Does it have a lot of VALUES or a lot of columns?
Created 06-19-2019 11:04 PM
It has only 31 columns, with data types int, string, and timestamp. I created a batch of a mere 1,000 records using a PreparedStatement in Java, and I am using the latest JDBC 4.1 driver.
The following query is what I used to create the table:
create TABLE IF NOT EXISTS tbl_mindarray (
    source_ip string,
    destination_ip string,
    protocol_number int,
    source_port int,
    destination_port int,
    packet bigint,
    volume bigint,
    duration bigint,
    pps bigint,
    bps bigint,
    bpp bigint,
    source_latitude string,
    source_longitude string,
    source_city string,
    source_country string,
    destination_latitude string,
    destination_longitude string,
    destination_city string,
    destination_country string,
    ingress_volume bigint,
    egress_volume bigint,
    ingress_packet bigint,
    egress_packet bigint,
    source_if_index int,
    destination_if_index int,
    source_host string,
    event_date timestamp,
    event_time timestamp,
    _time int,
    flow bigint
) partitioned by (year int) stored as parquet;
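One thing I suspect (my assumption, not something taken from the profile): since the HiveServer2 protocol has no server-side parameterized execution, the JDBC driver may emulate executeBatch() by inlining the bound values into a single multi-row VALUES statement. A hypothetical sketch of the scale that would imply (BatchExpansionSketch and the abbreviated column list are for illustration only, not the driver's actual code):

    // Hypothetical sketch, NOT the driver's real source: if executeBatch() is
    // emulated by inlining bound values client-side, a 1,000-row batch reaches
    // the coordinator as one statement with 1,000 x 31 = 31,000 literal
    // expressions, and "Analysis finished: 41.71s" in the Planner Timeline
    // would be the cost of analyzing every one of them.
    public class BatchExpansionSketch {
        public static void main(String[] args) {
            StringBuilder sql = new StringBuilder(
                    "INSERT INTO tbl_mindarray (source_ip, destination_ip, /* ...29 more columns... */ year) VALUES ");
            for (int row = 0; row < 1000; row++) {
                if (row > 0) sql.append(", ");
                sql.append("(");
                for (int col = 0; col < 31; col++) {
                    sql.append(col > 0 ? ", 'v'" : "'v'"); // each ? becomes an inlined literal
                }
                sql.append(")");
            }
            System.out.println("Statement length: " + sql.length() + " characters");
        }
    }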
Created 06-19-2019 11:33 PM
Hi,
I am posting my entire Java code here.
import com.cloudera.impala.jdbc41.internal.com.cloudera.altus.shaded.org.apache.commons.lang3.RandomStringUtils;

import java.sql.*;
import java.util.Random;

class impala12 {

    private static final String CONNECTION_URL = "jdbc:impala://172.16.8.177:21050;SCRATCH_LIMIT=-1";
    private static final String sqlStatementCreate =
            "CREATE TABLE if not exists helloworld (message String) STORED AS PARQUET";
    private static final String sqlStatementInsert =
            "INSERT INTO helloworld VALUES (\"helloworld\")";
    private static final String sqlCompiledQuery =
            "INSERT INTO tbl_mindarray (source_ip,destination_ip,protocol_number,"
            + "source_port,destination_port,packet,volume,duration,pps,bps,bpp,source_latitude,source_longitude,"
            + "source_city,source_country,destination_latitude,destination_longitude,destination_city,destination_country,"
            + "ingress_volume,egress_volume,ingress_packet,egress_packet,source_if_index,destination_if_index,"
            + "source_host,event_date,event_time,_time,flow,year)"
            + " VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";

    // Shared generator: creating a new Random on every call inside the loop
    // is wasteful in a tight loop.
    private static final Random RANDOM = new Random();

    public static void main(String[] args) {
        System.out.println("Impala using Java");
        writeInABatchWithCompiledQuery(1000);
        System.out.println("Done");
    }

    private static Connection connectViaDS() throws Exception {
        Class.forName("com.cloudera.impala.jdbc41.Driver");
        return DriverManager.getConnection(CONNECTION_URL);
    }

    private static void writeInABatchWithCompiledQuery(int records) {
        // Epoch values in seconds; java.sql.Timestamp expects milliseconds,
        // so these resolve to dates in January 1970.
        long e_date = 1275822966, e_time = 1370517366;
        // int total = 1000000 * 1000;
        int total = 1000 * 1000;
        int counter = 0;
        Connection connection = null;
        try {
            connection = connectViaDS();
            connection.setAutoCommit(false);
            PreparedStatement preparedStatement = connection.prepareStatement(sqlCompiledQuery);
            Timestamp ed = new Timestamp(e_date);
            Timestamp et = new Timestamp(e_time);
            while (counter < total) {
                for (int index = 1; index <= records; index++) {
                    counter++;
                    preparedStatement.setString(1, ranStr());    // source_ip
                    preparedStatement.setString(2, ranStr());    // destination_ip
                    preparedStatement.setInt(3, ranInt(1, 10));  // protocol_number
                    preparedStatement.setInt(4, ranInt(1, 10));  // source_port
                    preparedStatement.setInt(5, ranInt(1, 10));  // destination_port
                    preparedStatement.setInt(6, ranInt(1, 10));  // packet
                    preparedStatement.setInt(7, ranInt(1, 10));  // volume
                    preparedStatement.setInt(8, ranInt(1, 10));  // duration
                    preparedStatement.setInt(9, ranInt(1, 10));  // pps
                    preparedStatement.setInt(10, ranInt(1, 10)); // bps
                    preparedStatement.setInt(11, ranInt(1, 10)); // bpp
                    preparedStatement.setString(12, ranStr());   // source_latitude
                    preparedStatement.setString(13, ranStr());   // source_longitude
                    preparedStatement.setString(14, ranStr());   // source_city
                    preparedStatement.setString(15, ranStr());   // source_country
                    preparedStatement.setString(16, ranStr());   // destination_latitude
                    preparedStatement.setString(17, ranStr());   // destination_longitude
                    preparedStatement.setString(18, ranStr());   // destination_city
                    preparedStatement.setString(19, ranStr());   // destination_country
                    preparedStatement.setInt(20, ranInt(1, 10)); // ingress_volume
                    preparedStatement.setInt(21, ranInt(1, 10)); // egress_volume
                    preparedStatement.setInt(22, ranInt(1, 10)); // ingress_packet
                    preparedStatement.setInt(23, ranInt(1, 10)); // egress_packet
                    preparedStatement.setInt(24, ranInt(1, 10)); // source_if_index
                    preparedStatement.setInt(25, ranInt(1, 10)); // destination_if_index
                    preparedStatement.setString(26, ranStr());   // source_host
                    preparedStatement.setTimestamp(27, ed);      // event_date
                    preparedStatement.setTimestamp(28, et);      // event_time
                    preparedStatement.setInt(29, ranInt(1, 10)); // _time
                    preparedStatement.setInt(30, ranInt(1, 10)); // flow
                    preparedStatement.setInt(31, ranInt(1, 10)); // year (partition column)
                    preparedStatement.addBatch();
                }
                preparedStatement.executeBatch();
                preparedStatement.clearBatch();
                System.out.println("Counter = " + counter);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                connection.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }

    private static String ranStr() {
        return RandomStringUtils.randomAlphabetic(5);
    }

    private static int ranInt(int min, int max) {
        return RANDOM.ints(min, max + 1).limit(1).findFirst().getAsInt();
    }
}
I have used random functions to generate the data. To check where the time is going, I am thinking of timing each executeBatch() round trip, roughly as sketched below.
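This is a minimal, self-contained sketch; the probe table batch_timing_probe and the plain connection URL are placeholder names I made up. If the per-batch elapsed time grows with the number of batched rows and roughly matches the Planning Wait Time in the profile, the time is going into statement analysis rather than into writing data:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Time one batched round trip at several batch sizes against a throwaway
    // single-column table, to see whether cost scales with batched row count.
    public class BatchTiming {
        public static void main(String[] args) throws Exception {
            Class.forName("com.cloudera.impala.jdbc41.Driver");
            try (Connection c = DriverManager.getConnection("jdbc:impala://172.16.8.177:21050")) {
                c.createStatement().execute(
                        "CREATE TABLE IF NOT EXISTS batch_timing_probe (msg string)");
                for (int batchSize : new int[] {10, 100, 1000}) {
                    try (PreparedStatement ps = c.prepareStatement(
                            "INSERT INTO batch_timing_probe VALUES (?)")) {
                        for (int i = 0; i < batchSize; i++) {
                            ps.setString(1, "row-" + i);
                            ps.addBatch();
                        }
                        long t0 = System.nanoTime();
                        ps.executeBatch();
                        System.out.println(batchSize + " rows: "
                                + (System.nanoTime() - t0) / 1_000_000 + " ms");
                    }
                }
            }
        }
    }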
Thanks
Created on 06-20-2019 12:30 AM - edited 06-20-2019 12:31 AM
Hi @EricL,
I don't know how this will work. I will be getting continuous data from the network; the data is related to NMS. So you mean that when I receive data from the network, I should first write it into a file, then copy that file into HDFS, and then use LOAD to load the data from that file into my table, is that right?
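In the meantime, here is how I picture that workflow as a rough, hypothetical sketch. The staging table tbl_mindarray_staging, the file paths, and the use of the hdfs CLI are my own assumptions, not confirmed anywhere in this thread. Since tbl_mindarray is stored as Parquet, my understanding is that the text file should go into a text-format staging table first, and then a single INSERT ... SELECT converts it to Parquet:

    import java.io.FileWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class FileLoadSketch {
        public static void main(String[] args) throws Exception {
            // 1. Buffer incoming records into a local delimited file
            //    (abbreviated: a real row would carry all 31 columns).
            try (FileWriter w = new FileWriter("/tmp/flows.csv")) {
                w.write("1.2.3.4,5.6.7.8,6\n");
            }

            // 2. Copy the file into HDFS (assumes the hdfs CLI is on the PATH).
            new ProcessBuilder("hdfs", "dfs", "-put", "-f",
                    "/tmp/flows.csv", "/user/impala/staging/flows.csv")
                    .inheritIO().start().waitFor();

            // 3. LOAD the file into a text-format staging table assumed to
            //    mirror tbl_mindarray's columns with year last, then convert
            //    to Parquet with one dynamically partitioned INSERT ... SELECT.
            Class.forName("com.cloudera.impala.jdbc41.Driver");
            try (Connection c = DriverManager.getConnection("jdbc:impala://172.16.8.177:21050");
                 Statement s = c.createStatement()) {
                s.execute("LOAD DATA INPATH '/user/impala/staging/flows.csv' "
                        + "INTO TABLE tbl_mindarray_staging");
                s.execute("INSERT INTO tbl_mindarray PARTITION (year) "
                        + "SELECT * FROM tbl_mindarray_staging");
            }
        }
    }

Would that be the right shape, or should the Parquet conversion happen differently?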
Created 06-19-2019 11:24 PM
Yes,
I was unable to paste it directly because of the character limit, so I am sharing a Google Drive link.
https://drive.google.com/drive/folders/1AMF1SzRh4AuHcosfvFxXjSo0W0mdzKAN?usp=sharing
Please let me know if more information is required.