<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to set orc.stripe.size value in hive table which stores hundreds of millions? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357430#M237587</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/101716"&gt;@octor&lt;/a&gt;&amp;nbsp;You can tweak the split sizes if you are using Tez.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;set tez.grouping.min-size=16777216;--16 MB min split&lt;/LI&gt;&lt;LI&gt;set tez.grouping.max-size=64000000;--64 GB max split&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Fri, 11 Nov 2022 08:40:44 GMT</pubDate>
    <dc:creator>asish</dc:creator>
    <dc:date>2022-11-11T08:40:44Z</dc:date>
    <item>
      <title>How to set orc.stripe.size value in hive table which stores hundreds of millions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357314#M237549</link>
      <description>&lt;P&gt;I am trying to create several tables with hundreds of millions of rows.&lt;/P&gt;&lt;P&gt;I create the tables and load data with a different &lt;STRONG&gt;orc.stripe.size&lt;/STRONG&gt; value (64MB or 256MB) for each table, because the tables differ in size. (For tables with a lot of data, an error occurred at 64MB, so I increased the value to 256MB.)&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;However, if &lt;STRONG&gt;orc.stripe.size&lt;/STRONG&gt; is set to a large value (e.g. 256MB) for a table with a relatively small amount of data, the following error occurs.&lt;BR /&gt;I don't know how to choose this value when creating a table. Is there a way to set the value according to the table size?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class="line number2 index1 alt1"&gt;SessionState: Vertex failed, vertexName=Map&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;1, vertexId=vertex_1666932355626_0180_236_01, diagnostics=[Task failed, taskId=task_1666932355626_0180_236_01_000000, diagnostics=[TaskAttempt&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;0&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;failed, info=[Error: Error&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;while&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;running task ( failure ) : attempt_1666932355626_0180_236_01_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.EOFException: Can't finish&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;byte&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;read from uncompressed stream DATA position:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;81920&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;length:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;81920&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;range:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;4&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;offset:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;65536&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;position:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;16384&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;limit:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;16384&lt;/DIV&gt;&lt;DIV
class="line number3 index2 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)&lt;/DIV&gt;&lt;DIV class="line number4 index3 alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)&lt;/DIV&gt;&lt;DIV class="line number5 index4 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)&lt;/DIV&gt;&lt;DIV class="line number6 index5 alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)&lt;/DIV&gt;&lt;DIV class="line number7 index6 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)&lt;/DIV&gt;&lt;DIV class="line number8 index7 alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at java.security.AccessController.doPrivileged(Native Method)&lt;/DIV&gt;&lt;DIV class="line number9 index8 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at javax.security.auth.Subject.doAs(Subject.java:422)&lt;/DIV&gt;&lt;DIV class="line number10 index9 alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)&lt;/DIV&gt;&lt;DIV class="line number11 index10 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)&lt;/DIV&gt;&lt;DIV class="line number12 index11 alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)&lt;/DIV&gt;&lt;DIV class="line number13 index12 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)&lt;/DIV&gt;&lt;DIV class="line number14 index13 
alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)&lt;/DIV&gt;&lt;DIV class="line number15 index14 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;/DIV&gt;&lt;DIV class="line number16 index15 alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)&lt;/DIV&gt;&lt;DIV class="line number17 index16 alt2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)&lt;/DIV&gt;&lt;DIV class="line number18 index17 alt1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at java.lang.Thread.run(Thread.java:748)&lt;/DIV&gt;</description>
      <pubDate>Thu, 10 Nov 2022 04:38:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357314#M237549</guid>
      <dc:creator>octor</dc:creator>
      <dc:date>2022-11-10T04:38:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to set orc.stripe.size value in hive table which stores hundreds of millions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357359#M237565</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/101716"&gt;@octor&lt;/a&gt;&amp;nbsp;I see you have already raised&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/HIVE-26720" target="_blank"&gt;https://issues.apache.org/jira/browse/HIVE-26720&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;May I know why you want to change the stripe size?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Nov 2022 13:45:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357359#M237565</guid>
      <dc:creator>asish</dc:creator>
      <dc:date>2022-11-10T13:45:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to set orc.stripe.size value in hive table which stores hundreds of millions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357406#M237579</link>
      <description>&lt;P&gt;When a table holds a large amount of data (billions of rows or more), a different error occurred when I created it with the default stripe size of 64MB. After I increased the stripe size to 256MB, I confirmed that it works normally.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2022 01:46:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357406#M237579</guid>
      <dc:creator>octor</dc:creator>
      <dc:date>2022-11-11T01:46:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to set orc.stripe.size value in hive table which stores hundreds of millions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357430#M237587</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/101716"&gt;@octor&lt;/a&gt;&amp;nbsp;You can tweak the split sizes if you are using Tez.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;set tez.grouping.min-size=16777216;--16 MB min split&lt;/LI&gt;&lt;LI&gt;set tez.grouping.max-size=64000000;--64 GB max split&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Fri, 11 Nov 2022 08:40:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357430#M237587</guid>
      <dc:creator>asish</dc:creator>
      <dc:date>2022-11-11T08:40:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to set orc.stripe.size value in hive table which stores hundreds of millions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357435#M237592</link>
      <description>&lt;P&gt;The current Tez settings are as follows.&lt;/P&gt;&lt;P&gt;- tez.grouping.min-size=52428800; -- 50MB&lt;BR /&gt;- tez.grouping.max-size=1073741824; -- 1GB&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think the max-size you mentioned is 64MB, not 64GB.&lt;/P&gt;&lt;P&gt;set tez.grouping.min-size=16777216;--16 MB min split&lt;BR /&gt;set tez.grouping.max-size=64000000;--64 GB max split -&amp;gt; 64MB&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From my experience creating tables and loading data, I can create tables and add data smoothly when orc.stripe.size is set between 64MB and 256MB. Can you tell me roughly what range I should use for tez.grouping.min-size and tez.grouping.max-size?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2022 09:13:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357435#M237592</guid>
      <dc:creator>octor</dc:creator>
      <dc:date>2022-11-11T09:13:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to set orc.stripe.size value in hive table which stores hundreds of millions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357447#M237597</link>
      <description>&lt;P&gt;You can set&amp;nbsp;&lt;SPAN&gt;tez.grouping.max-size to 1 GB.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Please also increase the settings below:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;set hive.tez.container.size=10240;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;set tez.runtime.io.sort.mb=4096; (40% of container size)&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2022 11:18:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-set-orc-stripe-size-value-in-hive-table-which-stores/m-p/357447#M237597</guid>
      <dc:creator>asish</dc:creator>
      <dc:date>2022-11-11T11:18:40Z</dc:date>
    </item>
  </channel>
</rss>

