<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59418#M67442</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/23296"&gt;@josholsan&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not an expert in this area, but I know that you can change the memory settings in the startup script.&lt;/P&gt;&lt;P&gt;Please refer to the link below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;PENTAHO_DI_JAVA_OPTIONS&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://help.pentaho.com/Documentation/5.2/0P0/100/090/040" target="_self"&gt;https://help.pentaho.com/Documentation/5.2/0P0/100/090/040&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 30 Aug 2017 08:39:56 GMT</pubDate>
    <dc:creator>csguna</dc:creator>
    <dc:date>2017-08-30T08:39:56Z</dc:date>
    <item>
      <title>[Impala] - GC overhead limit exceeded error in Impala while loading big amount of data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59408#M67439</link>
      <description>&lt;P&gt;Hello!!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm running an ETL process with Pentaho DI that loads data into my Cloudera Impala cluster. When I load a large amount of data (a little more than 34K rows), I get this GC error. I previously tried the load with fake data (10K rows) and it worked without any problem. The error I get in Pentaho DI is the following:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;2017/08/30 07:03:28 - Load to Impala Person.0 - 	at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:385)
2017/08/30 07:03:28 - Load to Impala Person.0 - 	at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2017/08/30 07:03:28 - Load to Impala Person.0 - 	at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2017/08/30 07:03:28 - Load to Impala Person.0 - 	at java.lang.Thread.run(Thread.java:748)
2017/08/30 07:03:28 - Load to Impala Person.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException: 
2017/08/30 07:03:28 - Load to Impala Person.0 - Error inserting/updating row
2017/08/30 07:03:28 - Load to Impala Person.0 - OutOfMemoryError: GC overhead limit exceeded&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the impalad logs, I have both impalad.ERROR and impalad.WARNING. Their contents are the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;impalad.WARNING&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;W0829 18:06:21.070214 22994 DFSOutputStream.java:954] Caught exception 
Java exception follows:
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1252)
	at java.lang.Thread.join(Thread.java:1326)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
E0830 07:03:29.317476 19425 client-request-state.cc:940] ERROR Finalizing DML: OutOfMemoryError: GC overhead limit exceeded&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;impalad.ERROR&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0829 13:49:00.567322  3084 logging.cc:124] stderr will be logged to this file.
E0830 07:03:29.317476 19425 client-request-state.cc:940] ERROR Finalizing DML: OutOfMemoryError: GC overhead limit exceeded&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does anybody have an idea of what's happening, or has anyone seen a similar error? Any help would be much appreciated.&lt;BR /&gt;&lt;BR /&gt;Thank you so much in advance.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Jose.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 13:45:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59408#M67439</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2026-04-21T13:45:52Z</dc:date>
    </item>
    <item>
      <title>Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59415#M67440</link>
      <description>&lt;P&gt;It's hard to tell based on the information you provided, but see if you can increase Pentaho's memory settings (edit spoon.bat).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If that doesn't work, check Impala's catalogd memory setting.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 07:28:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59415#M67440</guid>
      <dc:creator>vanhalen</dc:creator>
      <dc:date>2017-08-30T07:28:53Z</dc:date>
    </item>
    <item>
      <title>Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59416#M67441</link>
      <description>&lt;P&gt;Thanks for your reply &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/22513"&gt;@vanhalen&lt;/a&gt;. Do you know where I could find more information about this? If you tell me what you need, I can provide more information.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For the moment, what I have tried (based on what I saw in another similar topic) is increasing Impala's heap size, and I'm running the ETL again. If this doesn't work, I will try increasing Pentaho's memory setting, but I think the problem is in Impala rather than Pentaho.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Thank you so much once again.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 07:33:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59416#M67441</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2017-08-30T07:33:18Z</dc:date>
    </item>
    <item>
      <title>Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59418#M67442</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/23296"&gt;@josholsan&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not an expert in this area, but I know that you can change the memory settings in the startup script.&lt;/P&gt;&lt;P&gt;Please refer to the link below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;PENTAHO_DI_JAVA_OPTIONS&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://help.pentaho.com/Documentation/5.2/0P0/100/090/040" target="_self"&gt;https://help.pentaho.com/Documentation/5.2/0P0/100/090/040&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 08:39:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59418#M67442</guid>
      <dc:creator>csguna</dc:creator>
      <dc:date>2017-08-30T08:39:56Z</dc:date>
    </item>
    <item>
      <title>Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59419#M67443</link>
      <description>&lt;P&gt;Thank you very much too, &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/16544"&gt;@csguna&lt;/a&gt;. That looks interesting and may be a possible solution. When my current execution finishes, I will try it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I will come back and report my results!!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 08:47:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59419#M67443</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2017-08-30T08:47:30Z</dc:date>
    </item>
    <item>
      <title>Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59420#M67444</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/23296"&gt;@josholsan&lt;/a&gt;&amp;nbsp;Sure thing &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 08:54:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59420#M67444</guid>
      <dc:creator>csguna</dc:creator>
      <dc:date>2017-08-30T08:54:11Z</dc:date>
    </item>
    <item>
      <title>Re: [Impala] - GC overhead limit exceeded error in Impala while loading big amount of data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59495#M67445</link>
      <description>&lt;P&gt;I finally tested your solution and it worked for me!&lt;/P&gt;&lt;P&gt;I'm going to mark your answer as the solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you so much &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Jose.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Sep 2017 09:26:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-GC-overhead-limit-exceeded-error-in-Impala-while/m-p/59495#M67445</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2017-09-01T09:26:29Z</dc:date>
    </item>
  </channel>
</rss>

