Explorer
Posts: 13
Registered: ‎08-08-2017
Accepted Solution

[Impala] - GC overhead limit exceeded error in Impala while loading big amount of data

Hello!!

 

I'm running an ETL process with Pentaho DI and loading data into my Cloudera Impala cluster. When I load a large amount of data (a little over 34K rows) I get this GC error. Earlier I tried the load with fake data (10K rows) and it worked without any problem. The error I get in Pentaho DI is the following:

2017/08/30 07:03:28 - Load to Impala Person.0 - 	at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:385)
2017/08/30 07:03:28 - Load to Impala Person.0 - 	at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2017/08/30 07:03:28 - Load to Impala Person.0 - 	at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2017/08/30 07:03:28 - Load to Impala Person.0 - 	at java.lang.Thread.run(Thread.java:748)
2017/08/30 07:03:28 - Load to Impala Person.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException: 
2017/08/30 07:03:28 - Load to Impala Person.0 - Error inserting/updating row
2017/08/30 07:03:28 - Load to Impala Person.0 - OutOfMemoryError: GC overhead limit exceeded

 

In the impalad logs I have both an impalad.ERROR and an impalad.WARNING file. Their contents are as follows:

 

impalad.WARNING

 

W0829 18:06:21.070214 22994 DFSOutputStream.java:954] Caught exception 
Java exception follows:
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1252)
	at java.lang.Thread.join(Thread.java:1326)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
E0830 07:03:29.317476 19425 client-request-state.cc:940] ERROR Finalizing DML: OutOfMemoryError: GC overhead limit exceeded

impalad.ERROR

Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0829 13:49:00.567322  3084 logging.cc:124] stderr will be logged to this file.
E0830 07:03:29.317476 19425 client-request-state.cc:940] ERROR Finalizing DML: OutOfMemoryError: GC overhead limit exceeded
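
The "Log line format" header above describes how each impalad line is laid out, which makes it easy to pull out just the error entries. The following is only a minimal sketch of such a parser: the class name `GlogLine` and its fields are hypothetical, and the regular expression is an assumption derived from that header, not an official Impala or glog API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one impalad log line into its parts, following
// the header "[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg".
public class GlogLine {
    // Assumed pattern: severity letter, month, day, time, thread id, file, line, message.
    private static final Pattern LINE = Pattern.compile(
        "^([IWEF])(\\d{2})(\\d{2}) (\\d{2}:\\d{2}:\\d{2}\\.\\d{6})\\s+(\\d+) ([^\\s:]+):(\\d+)\\] (.*)$");

    public final char severity;   // I, W, E or F
    public final int month;
    public final int day;
    public final String time;     // hh:mm:ss.uuuuuu
    public final long threadId;
    public final String file;
    public final int line;
    public final String message;

    private GlogLine(Matcher m) {
        severity = m.group(1).charAt(0);
        month = Integer.parseInt(m.group(2));
        day = Integer.parseInt(m.group(3));
        time = m.group(4);
        threadId = Long.parseLong(m.group(5));
        file = m.group(6);
        line = Integer.parseInt(m.group(7));
        message = m.group(8);
    }

    // Returns the parsed line, or null for lines that do not match the format
    // (for example the Java stack trace lines that follow a warning).
    public static GlogLine parse(String text) {
        Matcher m = LINE.matcher(text);
        return m.matches() ? new GlogLine(m) : null;
    }

    public static void main(String[] args) {
        String sample = "E0830 07:03:29.317476 19425 client-request-state.cc:940] "
            + "ERROR Finalizing DML: OutOfMemoryError: GC overhead limit exceeded";
        GlogLine parsed = GlogLine.parse(sample);
        if (parsed != null && parsed.severity == 'E') {
            System.out.println(parsed.file + ":" + parsed.line + " -> " + parsed.message);
        }
    }
}
```

Filtering on `severity == 'E'` and on the message text is one way to check whether the OutOfMemoryError shows up in the impalad log at the same timestamp as the Pentaho failure.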

 

Does anybody have an idea of what's happening, or has anyone seen a similar error? Any help would be much appreciated.

Thank you so much in advance.


Jose.

Explorer
Posts: 24
Registered: ‎06-13-2017

Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data

Hard to tell based on the information you provided, but see if you can increase Pentaho's memory settings (edit spoon.bat).

 

If that doesn't work, check Impala's catalogd memory setting.

 

Hope this helps.

 

 

Explorer
Posts: 13
Registered: ‎08-08-2017

Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data

Thanks for your reply @vanhalen. Do you know where I could find more information about this? If needed, I can provide more information.

 

For the moment, what I have tried (based on what I saw in another similar topic) is increasing Impala's heap size, and I'm running the ETL again. If this doesn't work I will try increasing Pentaho's memory settings, but I think the problem is in Impala rather than Pentaho.


Thank you so much once again.

Champion
Posts: 543
Registered: ‎05-16-2016

Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data

@josholsan

 

I'm no expert in this area, but I know that you can change them in the startup script.

Could you refer to the link below?

 

 

PENTAHO_DI_JAVA_OPTIONS

 

https://help.pentaho.com/Documentation/5.2/0P0/100/090/040
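
One way to confirm that a heap setting passed through `PENTAHO_DI_JAVA_OPTIONS` actually reaches a JVM is to start a tiny program with the same options and print what the JVM reports. This is only a sketch: the class `HeapCheck` is hypothetical and is not part of Pentaho, and it checks a standalone JVM rather than the running Spoon process.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: prints the maximum heap and the -Xmx flag of the JVM it runs in.
public class HeapCheck {
    // Convert a byte count to whole mebibytes.
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Return the last -Xmx argument in the list, or null if none was passed.
    public static String findXmx(List<String> jvmArgs) {
        String found = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx")) {
                found = arg;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Maximum amount of memory the JVM will attempt to use for the heap.
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Flags given to the JVM itself (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Max heap (MiB): " + toMiB(maxHeap));
        System.out.println("-Xmx argument: " + findXmx(jvmArgs));
    }
}
```

After compiling with `javac HeapCheck.java`, it could be run as, for example, `java -Xmx2g HeapCheck`, using whatever value you put in `PENTAHO_DI_JAVA_OPTIONS`. The `-Xmx2g` here is only an illustrative value, not a recommendation from this thread.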

Explorer
Posts: 13
Registered: ‎08-08-2017

Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data

Thank you so much too, @csguna. That looks interesting and may be a possible solution. When my current execution finishes, I will try this.

 

I will come back and report my results!

 

Champion
Posts: 543
Registered: ‎05-16-2016

Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data

@josholsan Sure thing :)

Explorer
Posts: 13
Registered: ‎08-08-2017

Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data

In the end I tested your solution and it worked for me!

I'm going to mark your answer as the solution.

 

Thank you so much :D

 

Jose.

 

 
