Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive insert statement is very slow

Highlighted

Hive insert statement is very slow

Rising Star

Very basic insert statement in Hive

insert into car(model) 
 values ('Honda'), ('Toyota');

is taking 2-3 minutes to complete. Is it a normal speed for Hive or is it too slow?

I tried both Tez and MapReduce as execution engines, results were almost the same.

Should I expect much better performance (like 2-3 seconds instead of minutes) if increase computer memory? The example below was executed on a single node cluster where computer had 6 Gb.

6 REPLIES 6
Highlighted

Re: Hive insert statement is very slow

@Dmitry Otblesk could you please try running the query again int tez mode after setting this parameter in hive shell

set hive.tez.exec.print.summary=true;

and share the summary of execution.

Re: Hive insert statement is very slow

Rising Star

@Rajkumar Singh

I did what you suggested. Below is the output

INFO : Tez session hasn't been created yet. Opening session INFO : Dag name: insert into car(model) values (...('Toyota')(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1482331426058_0001) INFO : Map 1: -/- INFO : Map 1: 0/1 INFO : Map 1: 0/1 INFO : Map 1: 0/1 INFO : Map 1: 0/1 INFO : Map 1: 0/1 INFO : Map 1: 0/1 INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Status: DAG finished successfully in 34.31 seconds INFO : INFO : METHOD DURATION(ms) INFO : parse 45 INFO : semanticAnalyze 3,573 INFO : TezBuildDag 1,482 INFO : TezSubmitToRunningDag 1,433 INFO : TotalPrepTime 51,046 INFO : VERTICES TOTAL_TASKS FAILED_ATTEMPTS KILLED_TASKS DURATION_SECONDS CPU_TIME_MILLIS GC_TIME_MILLIS INPUT_RECORDS OUTPUT_RECORDS INFO : Map 1 1 0 0 14.26 9,290 226 2 0 INFO : org.apache.tez.common.counters.DAGCounter: INFO : NUM_SUCCEEDED_TASKS: 1 INFO : TOTAL_LAUNCHED_TASKS: 1 INFO : DATA_LOCAL_TASKS: 1 INFO : AM_CPU_MILLISECONDS: 7040 INFO : AM_GC_TIME_MILLIS: 19 INFO : File System Counters: INFO : HDFS_BYTES_READ: 13 INFO : HDFS_BYTES_WRITTEN: 80 INFO : HDFS_READ_OPS: 4 INFO : HDFS_WRITE_OPS: 3 INFO : HDFS_OP_CREATE: 2 INFO : HDFS_OP_GET_FILE_STATUS: 3 INFO : HDFS_OP_OPEN: 1 INFO : HDFS_OP_RENAME: 1 INFO : org.apache.tez.common.counters.TaskCounter: INFO : GC_TIME_MILLIS: 226 INFO : CPU_MILLISECONDS: 9290 INFO : PHYSICAL_MEMORY_BYTES: 79691776 INFO : VIRTUAL_MEMORY_BYTES: 2038177792 INFO : COMMITTED_HEAP_BYTES: 79691776 INFO : INPUT_RECORDS_PROCESSED: 2 INFO : INPUT_SPLIT_LENGTH_BYTES: 13 INFO : OUTPUT_RECORDS: 0 INFO : HIVE: INFO : CREATED_FILES: 1 INFO : DESERIALIZE_ERRORS: 0 INFO : RECORDS_IN_Map_1: 2 INFO : RECORDS_OUT_1_default.car: 2 INFO : TaskCounter_Map_1_INPUT_values__tmp__table__1: INFO : INPUT_RECORDS_PROCESSED: 2 INFO : INPUT_SPLIT_LENGTH_BYTES: 13 INFO : TaskCounter_Map_1_OUTPUT_out_Map_1: INFO : OUTPUT_RECORDS: 0 INFO : Loading data to table default.car from hdfs://lenu.dom.hdp:8020/apps/hive/warehouse/car/.hive-staging_hive_2016-12-21_11-19-49_470_4335548459844470170-1/-ext-10000 INFO : Table default.car stats: [numFiles=6, numRows=15, totalSize=103, rawDataSize=88] No rows affected (89.956 seconds)

Highlighted

Re: Hive insert statement is very slow

the counter suggest that dag submission took 34 secs while whole execution time was 51 sec, see if there was resource issue at RM side.

Highlighted

Re: Hive insert statement is very slow

Rising Star

@Rajkumar Singh

>see if there was resource issue at RM side.

How can I check that? What exactly should I look at?

Highlighted

Re: Hive insert statement is very slow

@Dmitry Otblesk please look into the RM UI after job submission.

Highlighted

Re: Hive insert statement is very slow

@Dmitry Otblesk

Login to Ambari UI first then click on YARN link on the left nav bar then on the QuickLinks and chose Resource Manager UI link. You could also go directly to your Resource Manager UI if you know the host where the Resource Manager service runs also the port.

You should also take advantage of Hive Tez View to see all the tasks executed and time needed for each. While you execute the query watch the execution in Resource Manager UI to understand number of containers per task, resource utilization etc. If you see that you have low degree of parallelism and still resources enough to instantiate more containers then you have an opportunity to adjust the query to allow more parallelism.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/ch_query_optim...

Change the path from the link to match your version of HDP.

Don't have an account?
Coming from Hortonworks? Activate your account here