Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

When inserting data query planning takes long time.

When inserting data query planning takes long time.

Explorer

Hi,

I am using latest Cloudera VM as on 17June2019, impala version 2.10.0 . I observed INSERT INTO is taking too much time for table which consist of 31 cloumns. Its taking more time in plan waiting .

 

I am posting my query profile-

 

 Query Info

    Query ID: b746ed90a692cef9:d7c51dd400000000
    User:
    Database: default
    Coordinator: quickstart.cloudera
    Query Type: DML
    Query State: FINISHED
    Start Time: Jun 19, 2019 1:07:23 PM
    End Time: Jun 19, 2019 1:08:07 PM
    Duration: 43.8s
    Rows Produced: 1,000
    Admission Result: Admitted immediately
    Admission Wait Time: 0ms
    Aggregate Peak Memory Usage: 389.0 KiB
    Client Fetch Wait Time: 25ms
    Client Fetch Wait Time Percentage: 0
    Estimated per Node Peak Memory: 10.0 MiB
    File Formats:
    HDFS Bytes Written: 160.0 KiB
    Impala Version: impalad version 2.10.0-cdh5.13.0 RELEASE (build 2511805f1eaa991df1460276c7e9f19d819cd4e4)
    Memory Accrual: 199,168 byte seconds
    Network Address: 172.16.10.72:49084
    Node with Peak Memory Usage: quickstart.cloudera:22000
    Out of Memory: false
    Per Node Peak Memory Usage: 389.0 KiB
    Planning Wait Time: 42.45s
    Planning Wait Time Percentage: 97
    Pool: root.default
    Query Status: OK
    Rows Inserted: 1,000
    Session ID: 794f64ac9b4b1553:14542a16b4511ba2
    Session Type: HIVESERVER2
    Statistics Corrupt: false
    Statistics Missing: false
    Threads: CPU Time: 108ms
    Threads: CPU Time Percentage: 100
    Threads: Network Receive Wait Time: 0ms
    Threads: Network Receive Wait Time Percentage: 0
    Threads: Network Send Wait Time: 0ms
    Threads: Network Send Wait Time Percentage: 0
    Threads: Storage Wait Time: 0ms
    Threads: Storage Wait Time Percentage: 0
    Threads: Total Time: 108ms




 Query Timeline

    Query submitted: 86.45us (86.45us)
    Planning finished: 42.45s (42.45s)
    Submit for admission: 42.47s (21ms)
    Completed admission: 42.47s (135.69us)
    Ready to start on 1 backends: 42.48s (1ms)
    All 1 execution backends (1 fragment instances) started: 42.55s (74ms)
    DML data written: 43.37s (822ms)
    DML Metastore update finished: 43.77s (395ms)
    Request finished: 43.79s (20ms)
    Unregister query: 43.81s (25ms)

Planner Timeline

    Analysis finished: 41.71s (41.71s)
    Equivalence classes computed: 41.72s (6ms)
    Single node plan created: 41.87s (155ms)
    Runtime filters computed: 41.87s (107.52us)
    Distributed plan created: 41.88s (8ms)
    Lineage info computed: 42.09s (214ms)
    Planning finished: 42.27s (179ms)



What could be the reason and how can I solve this ?

Thanks

8 REPLIES 8

Re: When inserting data query planning takes long time.

Master Collaborator

Do you have an example of the query? Does it have a lot of VALUES or a lot of columns?

Re: When inserting data query planning takes long time.

Explorer

It has only 31 columns consisting datatype int, string and timestamp. I had created batch of mere 1k records using Prepared Statement in java, I am using latest jdbcdriver41. 

Following query I have used to create table-

 

create TABLE IF NOT EXISTS tbl_mindarray (source_ip string,destination_ip string,protocol_number int,source_port int,destination_port int,packet bigint,volume bigint,duration bigint,pps bigint,bps bigint,bpp bigint,source_latitude string,source_longitude string,source_city string,source_country string,destination_latitude string,destination_longitude string,destination_city string,destination_country string,ingress_volume bigint,egress_volume bigint,ingress_packet bigint,egress_packet bigint,source_if_index int,destination_if_index int,source_host string,event_date timestamp,event_time timestamp,_time int,flow bigint) partitioned by (year int) stored as parquet;

 

Re: When inserting data query planning takes long time.

Guru
So you use Prepare statement in Java code to write data, can you please share the full INSERT query that was actually run against Impala? This can be found from the profile.

Like Tim mentioned earlier, he wanted to see if the query generated has 1K VALUES statement in there.

Cheers
Eric

Re: When inserting data query planning takes long time.

Explorer

Hi,

I am posting my entire java code here.

 

import com.cloudera.impala.jdbc41.internal.com.cloudera.altus.shaded.org.apache.commons.lang3.RandomStringUtils;

import java.sql.*;
import java.util.Random;

class impala12 {

    private static final String CONNECTION_URL = "jdbc:impala://172.16.8.177:21050;SCRATCH_LIMIT=-1" ;
    private static final String sqlStatementCreate = "CREATE TABLE if not exists helloworld (message String) STORED AS PARQUET";
    private static final String sqlStatementInsert = "INSERT INTO helloworld VALUES (\"helloworld\")";

    private static final String sqlCompiledQuery = "INSERT INTO tbl_mindarray (source_ip,destination_ip,protocol_number," +
            "source_port,destination_port,packet,volume,duration,pps,bps,bpp,source_latitude,source_longitude," +
            "source_city,source_country,destination_latitude,destination_longitude ,destination_city ,destination_country ," +
            "ingress_volume ,egress_volume ,ingress_packet ,egress_packet ,source_if_index ,destination_if_index," +
            "source_host,event_date,event_time,_time,flow,year)" + " VALUES" +
            "(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
            ;
 
    public static void main(String[] args) {
        System.out.println("Impala using Java");

        writeInABatchWithCompiledQuery(1000);

        System.out.println("Done");

    }


    private static Connection connectViaDS() throws Exception {
        Connection connection = null;

        Class.forName("com.cloudera.impala.jdbc41.Driver");

        connection = DriverManager.getConnection(CONNECTION_URL);

        return connection;

    }

    private static void writeInABatchWithCompiledQuery(int records) {

        long e_date= 1275822966, e_time= 1370517366;

        PreparedStatement preparedStatement;

        // int total = 1000000*1000;

        int total = 1000*1000;

        int counter =0;

        Connection connection = null;
        try {
            connection = connectViaDS();

            connection.setAutoCommit(false);

            preparedStatement = connection.prepareStatement(sqlCompiledQuery);


            Timestamp ed = new Timestamp(e_date);
            Timestamp et = new Timestamp(e_time);

            while(counter <total) {

                for (int index = 1; index <= records; index++) {

                    counter++;

                    preparedStatement.setString(1, ranStr());
                    //   System.out.println(1);
                    preparedStatement.setString(2, ranStr());
                    //   System.out.println(2);
                    preparedStatement.setInt(3, ranInt(1,10));
                    //   System.out.println(3);
                    preparedStatement.setInt(4, ranInt(1,10));
                    //  System.out.println(4);
                    preparedStatement.setInt(5, ranInt(1,10));
                    //  System.out.println(5);
                    preparedStatement.setInt(6, ranInt(1,10));
                    //  System.out.println(6);
                    preparedStatement.setInt(7, ranInt(1,10));
                    //  System.out.println(7);
                    preparedStatement.setInt(8, ranInt(1,10));
                    //   System.out.println(8);
                    preparedStatement.setInt(9, ranInt(1,10));
                    //  System.out.println(9);
                    preparedStatement.setInt(10, ranInt(1,10));
                    //  System.out.println(10);
                    preparedStatement.setInt(11, ranInt(1,10));
                    //   System.out.println(11);
                    preparedStatement.setString(12, ranStr());
                    //  System.out.println(12);
                    preparedStatement.setString(13, ranStr());
                    //  System.out.println(13);
                    preparedStatement.setString(14, ranStr());
                    //  System.out.println(14);
                    preparedStatement.setString(15, ranStr());
                    //  System.out.println(15);
                    preparedStatement.setString(16, ranStr());
                    //   System.out.println(16);
                    preparedStatement.setString(17, ranStr());
                    //   System.out.println(17);
                    preparedStatement.setString(18, ranStr());
                    //   System.out.println(18);
                    preparedStatement.setString(19, ranStr());
                    //   System.out.println(19);
                    preparedStatement.setInt(20, ranInt(1,10));
                    //   System.out.println(20);
                    preparedStatement.setInt(21, ranInt(1,10));
                    //   System.out.println(21);
                    preparedStatement.setInt(22, ranInt(1,10));
                    //   System.out.println(22);
                    preparedStatement.setInt(23,ranInt(1,10));
                    //   System.out.println(23);
                    preparedStatement.setInt(24, ranInt(1,10));
                    //   System.out.println(24);
                    preparedStatement.setInt(25, ranInt(1,10));
                    //    System.out.println(25);
                    preparedStatement.setString(26, ranStr());
                    //   System.out.println(26);
                    preparedStatement.setTimestamp(27, ed);
                    //   System.out.println(27);
                    preparedStatement.setTimestamp(28, et);
                    //   System.out.println(28);
                    preparedStatement.setInt(29, ranInt(1,10));
                    //   System.out.println(29);
                    preparedStatement.setInt(30, ranInt(1,10));

                    preparedStatement.setInt(31,ranInt(1,10));
                    //   System.out.println(30);
                    // System.out.println(index);
                    preparedStatement.addBatch();

                }
                preparedStatement.executeBatch();

                preparedStatement.clearBatch();

                System.out.println("Counter = "+counter);


            }


        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                connection.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }

    }

    private static String ranStr() {
        String generatedString = RandomStringUtils.randomAlphabetic(5);

        return generatedString;
    }

    private static int ranInt(int min, int max) {

        Random r = new Random();
        return r.ints(min, (max + 1)).limit(1).findFirst().getAsInt();

    }

}


I have used random function to generate data.

Thanks

Re: When inserting data query planning takes long time.

Guru
@punshi,

Thanks for sharing the profile, I can see your query has 394043 characters. This will take sometime to parse. It will be much more efficient if you generate the data in your application into a file, load it into HDFS and then create an Impala table on top of it.

With such large query string, it is expected to take sometime to analyze.

Cheers
Eric

Re: When inserting data query planning takes long time.

Explorer

Hi @EricL ,

I don't know how this will work, I will get continuous data from network, the data is related to NMS. So you mean when I receive data from network , first I write it into file, then copy that file into hdfs. Then use LOAD to load data from that file into my table, is it ?

Re: When inserting data query planning takes long time.

Guru
Please also share the query planning section from PROFILE as well in case it might be useful.

Cheers
Eric

Re: When inserting data query planning takes long time.

Explorer

Yes, 

 

I was unable to paste directly because of character limitation so I am sharing google drive link.

 

https://drive.google.com/drive/folders/1AMF1SzRh4AuHcosfvFxXjSo0W0mdzKAN?usp=sharing

 

Please let me know if more information is required.

Don't have an account?
Coming from Hortonworks? Activate your account here