Master
Posts: 377
Registered: ‎07-01-2015

Slow inserts in Impala - DML Metastore update

Hi,

We have a very slow DML Metastore update on simple insert queries into an HDFS table (INSERT statements that select constants). I am not sure whether we are hitting https://issues.apache.org/jira/browse/IMPALA-1480, because the table is not partitioned; however, I suspect that the number of files under the table (100k+) is causing the issue.
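
One way to check the file-count suspicion is to count the files under the table's HDFS directory. This is a minimal sketch, assuming the `hdfs` CLI is on PATH and that you know the table's location (for example from `DESCRIBE FORMATTED base.import_processed`); the warehouse path in the example is hypothetical. Impala's `SHOW TABLE STATS base.import_processed` should also report a `#Files` figure.

```python
import subprocess
from typing import Dict, Union


def parse_hdfs_count(line: str) -> Dict[str, Union[int, str]]:
    """Parse one line of `hdfs dfs -count <path>` output.

    The default columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME.
    split(None, 3) keeps a path containing spaces in one piece.
    """
    dirs, files, size, path = line.split(None, 3)
    return {"dirs": int(dirs), "files": int(files), "bytes": int(size), "path": path}


def avg_file_bytes(stats: Dict[str, Union[int, str]]) -> float:
    """Average data file size; a tiny value points at a small-files problem."""
    return stats["bytes"] / stats["files"] if stats["files"] else 0.0


def count_table_files(table_dir: str) -> Dict[str, Union[int, str]]:
    """Run `hdfs dfs -count` on the table directory (needs a Hadoop client)."""
    out = subprocess.run(
        ["hdfs", "dfs", "-count", table_dir],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_hdfs_count(out.strip().splitlines()[-1])
```

For example, `count_table_files("/user/hive/warehouse/base.db/import_processed")` (hypothetical path) returns the file count together with the total size, from which `avg_file_bytes` gives the average file size.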

After rebuilding the table, the DML queries run fine. Is this a known limitation of Impala, or a bug?
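
The post does not say how the table was rebuilt. A common approach, assumed here, is to copy the data into a fresh table so it lands in a few larger files, then swap the names. This sketch only generates the Impala SQL; the `_compact` and `_old` suffixes are hypothetical. `SET NUM_NODES=1` asks Impala to run the INSERT on a single node so fewer files are written; check your version's documentation before relying on it.

```python
import re
from typing import List

# Only accept plain db.table identifiers to avoid building odd SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$")


def compaction_statements(table: str, single_node: bool = True) -> List[str]:
    """Return Impala SQL that rewrites `table` into fewer files and swaps it in."""
    if not _IDENT.match(table):
        raise ValueError(f"expected db.table, got {table!r}")
    tmp = f"{table}_compact"  # hypothetical staging table name
    old = f"{table}_old"      # hypothetical backup name; drop it once verified
    stmts: List[str] = []
    if single_node:
        stmts.append("SET NUM_NODES=1;")
    stmts += [
        f"CREATE TABLE {tmp} LIKE {table};",
        f"INSERT OVERWRITE TABLE {tmp} SELECT * FROM {table};",
        f"ALTER TABLE {table} RENAME TO {old};",
        f"ALTER TABLE {tmp} RENAME TO {table};",
    ]
    if single_node:
        stmts.append("SET NUM_NODES=0;")  # restore the default for the session
    return stmts
```

Writers to the table should be paused between the INSERT and the renames, otherwise rows inserted in that window end up in the `_old` table.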

Thanks

----------------
Max Per-Host Resource Reservation: Memory=0B
Per-Host Resource Estimates: Memory=10.00MB
Codegen disabled by planner

F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=48B mem-reservation=0B
WRITE TO HDFS [base.import_processed, OVERWRITE=false]
|  partitions=1
|  mem-estimate=48B mem-reservation=0B
|
00:UNION
   constant-operands=1
   mem-estimate=0B mem-reservation=0B
   tuple-ids=0 row-size=48B cardinality=1
----------------

Query Timeline

  1. Query submitted: 0ns (0ns)
  2. Planning finished: 1ms (1ms)
  3. Submit for admission: 2ms (1ms)
  4. Queued: 2ms (0ns)
  5. Completed admission: 28.64s (28.64s)
  6. Ready to start on 1 backends: 28.64s (1ms)
  7. All 1 execution backends (1 fragment instances) started: 28.66s (12ms)
  8. DML data written: 28.88s (221ms)
  9. DML Metastore update finished: 3.8m (3.3m)
  10. Request finished: 3.8m (0ns)
  11. Unregister query: 3.8m (28ms)
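
Each timeline line shows the cumulative time and, in parentheses, the time since the previous event, so the slowest phase is the largest parenthesized value. A minimal sketch for pulling that out of a pasted timeline, assuming durations use the units seen here (ns, ms, s, m) or compounds such as `1m2s`:

```python
import re
from typing import List, Tuple

# Seconds per unit; "ms" is listed before "m" and "s" so "12ms" is not read as minutes.
_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
# "  9. DML Metastore update finished: 3.8m (3.3m)" -> label, cumulative, delta
_STEP = re.compile(r"^\s*\d+\.\s+(.*?):\s+(\S+)\s+\((\S+)\)\s*$")


def parse_duration(text: str) -> float:
    """Convert a profile duration such as '28.64s', '3.3m' or '1m2s' to seconds."""
    tokens = _TOKEN.findall(text)
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(n) * _UNITS[u] for n, u in tokens)


def parse_timeline(text: str) -> List[Tuple[str, float]]:
    """Return (event label, seconds since the previous event) per timeline line."""
    steps = []
    for line in text.splitlines():
        m = _STEP.match(line)
        if m:
            label, _cumulative, delta = m.groups()
            steps.append((label, parse_duration(delta)))
    return steps


def slowest_step(text: str) -> Tuple[str, float]:
    """Pick the event with the largest gap before it."""
    return max(parse_timeline(text), key=lambda step: step[1])
```

On the timeline above this picks "DML Metastore update finished" at roughly 198 s (3.3m), with "Completed admission" (28.64s) a distant second. The profile rounds its values, so the deltas are approximate.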
Posts: 519
Topics: 14
Kudos: 90
Solutions: 45
Registered: ‎09-02-2016

Re: Slow inserts in Impala - DML Metastore update

@Tomas79

Please increase the value of the following parameter as needed and try again; it may help:

Java Heap Size of Catalog Server in Bytes

Master
Posts: 377
Registered: ‎07-01-2015

Re: Slow inserts in Impala - DML Metastore update

The Java heap size of the Catalog Server was not the issue, so I am not sure about marking this as a solution. Unfortunately, I don't have time to reproduce it now, because that would require creating a table with 100k+ files, each containing just one row. (This is a logging table in our environment.)
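
For a reproduction, one way to build such a table is a script of many single-row `INSERT ... VALUES` statements, since Impala writes a separate data file for each of them in an HDFS table. This is a minimal sketch; the database, table, and column names are hypothetical and do not reflect the poster's logging table. Running 100k statements through `impala-shell -f repro.sql` will itself be slow, but it lets you watch the per-statement DML Metastore update time as the file count grows.

```python
from typing import Iterator


def repro_statements(table: str = "test_db.many_small_files",
                     n_files: int = 100_000) -> Iterator[str]:
    """Yield Impala SQL that creates a table with roughly n_files one-row files."""
    db = table.split(".", 1)[0]
    yield f"CREATE DATABASE IF NOT EXISTS {db};"
    # Hypothetical schema for the test only.
    yield f"CREATE TABLE IF NOT EXISTS {table} (id BIGINT, msg STRING) STORED AS TEXTFILE;"
    for i in range(n_files):
        # Each INSERT ... VALUES statement produces its own small data file.
        yield f"INSERT INTO {table} VALUES ({i}, 'row {i}');"


def write_script(path: str, **kwargs) -> None:
    """Write the statements to a file, one per line, for `impala-shell -f`."""
    with open(path, "w", encoding="utf-8") as f:
        for stmt in repro_statements(**kwargs):
            f.write(stmt + "\n")
```

Calling `write_script("repro.sql")` produces the script; comparing the Query Timeline of an early insert against a late one should show whether the metastore update grows with the number of files.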