<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question The best approach to the thousands of small partitions in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134253#M96915</link>
    <description>&lt;P&gt;There is a lot of advice on why small files and a large number of partitions should be avoided in Hive. But what if I can’t avoid them?&lt;/P&gt;&lt;P&gt;I have to store a Hive table with 10 years of historical data. It currently contains 3710 daily partitions. Every partition is really small, from 80 to 15000 records. In CSV format the partitions range from 25 KB to 10 MB; in ORC format, from 10 KB to 2 MB, though I doubt that ORC is effective at such small sizes. Queries against this table usually filter on a date or a date range, so daily partitioning is preferred.&lt;/P&gt;&lt;P&gt;What would be the optimal approach (in terms of performance) for such a large number of small partitions?&lt;/P&gt;</description>
    <pubDate>Mon, 17 Oct 2016 15:14:00 GMT</pubDate>
    <dc:creator>aloha</dc:creator>
    <dc:date>2016-10-17T15:14:00Z</dc:date>
    <item>
      <title>The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134253#M96915</link>
      <description>&lt;P&gt;There is a lot of advice on why small files and a large number of partitions should be avoided in Hive. But what if I can’t avoid them?&lt;/P&gt;&lt;P&gt;I have to store a Hive table with 10 years of historical data. It currently contains 3710 daily partitions. Every partition is really small, from 80 to 15000 records. In CSV format the partitions range from 25 KB to 10 MB; in ORC format, from 10 KB to 2 MB, though I doubt that ORC is effective at such small sizes. Queries against this table usually filter on a date or a date range, so daily partitioning is preferred.&lt;/P&gt;&lt;P&gt;What would be the optimal approach (in terms of performance) for such a large number of small partitions?&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 15:14:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134253#M96915</guid>
      <dc:creator>aloha</dc:creator>
      <dc:date>2016-10-17T15:14:00Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134254#M96916</link>
      <description>&lt;P&gt;@Alena Melnikova,&lt;/P&gt;&lt;P&gt;The following links may help:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/2517/maximum-hive-table-partitions-allowed-recommended.html" target="_blank"&gt;https://community.hortonworks.com/questions/2517/maximum-hive-table-partitions-allowed-recommended.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/29031/best-pratices-for-hive-partitioning-especially-by.html" target="_blank"&gt;https://community.hortonworks.com/questions/29031/best-pratices-for-hive-partitioning-especially-by.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data" target="_blank"&gt;http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 17:21:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134254#M96916</guid>
      <dc:creator>Jagatheeshr</dc:creator>
      <dc:date>2016-10-17T17:21:24Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134255#M96917</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/189/jramakrishnan.html" nodeid="189"&gt;@jramakrishnan&lt;/A&gt; Thanks.&lt;/P&gt;&lt;P&gt;I have already read these links, but they give no clear answer. What would be a good file format for small partitions: CSV, ORC, or something else? HBase as an alternative metastore sounds fine, but my Hive 1.2.1 still uses MySQL. There was also an idea about generating a hash from the date. I would be glad if someone could explain this idea in detail.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 18:36:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134255#M96917</guid>
      <dc:creator>aloha</dc:creator>
      <dc:date>2016-10-17T18:36:25Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134256#M96918</link>
      <description>&lt;P&gt;The whole goal of having partitions is to allow Hive to limit the files it has to look at in order to fulfill the SQL request you send it. On the other hand, you also clearly understand that having too many small files to look at is a performance/scalability drag. With so few records for each day, I'd suggest partitioning at the month level (as a single string, such as &lt;A rel="user" href="https://community.cloudera.com/users/194/jniemiec.html" nodeid="194"&gt;@Joseph Niemiec&lt;/A&gt; and &lt;A rel="user" href="https://community.cloudera.com/users/235/bpreachuk.html" nodeid="235"&gt;@bpreachuk&lt;/A&gt; suggest in their answers to &lt;A href="https://community.hortonworks.com/questions/29031/best-pratices-for-hive-partitioning-especially-by.html" target="_blank"&gt;https://community.hortonworks.com/questions/29031/best-pratices-for-hive-partitioning-especially-by.html&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;This will allow you to keep your "original" dates as a column and let the partition month be a new virtual column. Of course, you'll need to explain to your query writers the benefit of using this virtual partition column in their queries, but you will then get the value of partitioning while having 1/30th of the files, each of them 30x bigger.&lt;/P&gt;&lt;P&gt;Good luck!&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 18:54:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134256#M96918</guid>
      <dc:creator>LesterMartin</dc:creator>
      <dc:date>2016-10-17T18:54:25Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134257#M96919</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/122/lmartin.html" nodeid="122"&gt;@Lester Martin&lt;/A&gt; Thank you,&lt;/P&gt;&lt;P&gt;I'm keeping the monthly partition option (YYYY-MM) in reserve. It complicates queries, but if it's the only way, I'll have to use it.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 20:45:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134257#M96919</guid>
      <dc:creator>aloha</dc:creator>
      <dc:date>2016-10-17T20:45:38Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134258#M96920</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2027/aloha.html" nodeid="2027"&gt;@Alena Melnikova&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Disclaimer: I am answering without any knowledge of the data.&lt;/P&gt;&lt;P&gt;STEPS:&lt;/P&gt;&lt;P&gt;1. For such small sets of data I would partition by YEAR.&lt;/P&gt;&lt;P&gt;2. I would insert the data &lt;STRONG&gt;ordered by timestamp. (Use Pig if Hive takes too long.)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;3. Table structure:&lt;/P&gt;&lt;PRE&gt;....
PARTITIONED BY (
  year STRING COMMENT '')
STORED AS ORC tblproperties(
  "orc.compress"="ZLIB",
  "orc.bloom.filter.columns"="time_stamp",
  "orc.create.index"="true",
  "orc.stripe.size"="268435456",
  "orc.row.index.stride"="12000",
  "orc.compress.size"="262144"
);&lt;/PRE&gt;&lt;P&gt;4. &lt;STRONG&gt;Collect statistics&lt;/STRONG&gt; on the table.&lt;/P&gt;&lt;P&gt;5. Set a few config parameters in Hive:&lt;/P&gt;&lt;PRE&gt;set hive.optimize.index.filter=true;
set hive.exec.orc.skip.corrupt.data=true;
set hive.vectorized.execution.enabled=true;
set hive.exec.compress.output=true;
set hive.execution.engine=tez;
set tez.am.container.reuse.enabled=true;
set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.cbo.enable=true;
set hive.optimize.sort.dynamic.partition=true;
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;
set hive.merge.tezfiles=true;
set hive.hadoop.supports.splittable.combineinputformat=true;
set mapreduce.map.speculative=true;&lt;/PRE&gt;&lt;P&gt;6. Query with both the YEAR extracted from the timestamp (try the regexp_replace function in Hive) and the TIMESTAMP itself.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Do not skip&lt;/STRONG&gt; any of the steps above, and let us know about the awesome results you get :)&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2016 01:48:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134258#M96920</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-10-18T01:48:26Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134259#M96921</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2027/aloha.html" nodeid="2027"&gt;@Alena Melnikova&lt;/A&gt; Taking a second look at my answer, I think you would not need any partitions at all, provided you can compact each year's data into 1 file, so 10 files in total. It takes a few more steps but should work as gracefully as the yearly partitioning.&lt;/P&gt;&lt;P&gt;So try out solution 1 and, when it works, try out solution 2 and then pick one :)&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2016 01:54:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134259#M96921</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-10-18T01:54:30Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134260#M96922</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/61877/the-best-approach-to-the-thousands-of-small-partit.html#"&gt;@rbiswas&lt;/A&gt; Thank you! It's an interesting idea. I'll test partitions by year (YYYY), by year and month (YYYY-MM) as &lt;A href="https://community.hortonworks.com/questions/61877/the-best-approach-to-the-thousands-of-small-partit.html#"&gt;@Lester Martin&lt;/A&gt; suggested, and by day (YYYY-MM-DD). I'll share the results here.&lt;/P&gt;&lt;P&gt;By the way, what would be the difference between your two approaches? Partitioning by year and compacting each year into one file both give the same 10 files.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2016 17:58:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134260#M96922</guid>
      <dc:creator>aloha</dc:creator>
      <dc:date>2016-10-18T17:58:46Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134261#M96923</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2027/aloha.html" nodeid="2027"&gt;@Alena Melnikova&lt;/A&gt; You got it: there is no difference apart from a very subtle one. Approach 1 still depends somewhat on partition pruning, while approach 2 depends entirely on the ordering of the data via the ORC index.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2016 21:04:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134261#M96923</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-10-18T21:04:13Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134262#M96924</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3902/rbiswas.html" nodeid="3902" target="_blank"&gt;@rbiswas&lt;/A&gt;, &lt;A rel="user" href="https://community.cloudera.com/users/122/lmartin.html" nodeid="122" target="_blank"&gt;@Lester Martin&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I tested 4 variants of partitioning on 6 queries:&lt;/P&gt;&lt;PRE&gt;Daily partitions (calday=2016-10-20)
Year-month partitions (year_month=2016-10)
Year partitions (year=2016)
No partitions (but 10 files with yearly data)&lt;/PRE&gt;&lt;P&gt;I created 4 tables following &lt;A rel="user" href="https://community.cloudera.com/users/3902/rbiswas.html" nodeid="3902" target="_blank"&gt;@rbiswas&lt;/A&gt;'s recommendations.&lt;/P&gt;&lt;P&gt;Here is yearly aggregate information about the data, just to give you an idea of its scale.&lt;/P&gt;&lt;TABLE&gt;
 &lt;TBODY&gt;&lt;TR&gt;
  &lt;TD&gt;&lt;STRONG&gt;partition&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;size&lt;/STRONG&gt;&lt;/TD&gt;
  &lt;TD&gt;&lt;STRONG&gt;records&lt;/STRONG&gt;&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2006&lt;/TD&gt;&lt;TD&gt;539.4 KB&lt;/TD&gt;&lt;TD&gt;12,217&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2007&lt;/TD&gt;&lt;TD&gt;2.8 MB&lt;/TD&gt;&lt;TD&gt;75,584&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2008&lt;/TD&gt;&lt;TD&gt;6.4 MB&lt;/TD&gt;&lt;TD&gt;155,850&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2009&lt;/TD&gt;&lt;TD&gt;9.1 MB&lt;/TD&gt;&lt;TD&gt;228,247&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2010&lt;/TD&gt;&lt;TD&gt;9.3 MB&lt;/TD&gt;&lt;TD&gt;225,357&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2011&lt;/TD&gt;&lt;TD&gt;8.5 MB&lt;/TD&gt;&lt;TD&gt;196,280&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2012&lt;/TD&gt;&lt;TD&gt;19.5 MB&lt;/TD&gt;&lt;TD&gt;448,145&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2013&lt;/TD&gt;&lt;TD&gt;113.4 MB&lt;/TD&gt;&lt;TD&gt;2,494,787&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2014&lt;/TD&gt;&lt;TD&gt;196.7 MB&lt;/TD&gt;&lt;TD&gt;4,038,632&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2015&lt;/TD&gt;&lt;TD&gt;204.3 MB&lt;/TD&gt;&lt;TD&gt;4,047,002&lt;/TD&gt;&lt;/TR&gt;
 &lt;TR&gt;&lt;TD&gt;year=2016&lt;/TD&gt;&lt;TD&gt;227.2 MB&lt;/TD&gt;&lt;TD&gt;4,363,214&lt;/TD&gt;
 &lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;I ran every query 5 times, discarded the best and worst results, and took the average of the remaining three.&lt;/P&gt;&lt;P&gt;The results are below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="8736-hw-partitions.png" style="width: 719px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/22129iE729D9991ACE8D0C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="8736-hw-partitions.png" alt="8736-hw-partitions.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Daily partitioning is clearly the worst case, but the picture is less clear for the other options: the results depend on the query. In the end I decided that yearly partitioning would be optimal in our case. &lt;A rel="user" href="https://community.cloudera.com/users/3902/rbiswas.html" nodeid="3902" target="_blank"&gt;@rbiswas&lt;/A&gt;, thanks for the idea!&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/61877/the-best-approach-to-the-thousands-of-small-partit.html#" rel="nofollow noopener noreferrer" target="_blank"&gt;@rbiswas&lt;/A&gt;, I have a couple of questions:&lt;/P&gt;&lt;P&gt;1. Given that I have fewer than 10,000 records per day, would it be better to set &lt;STRONG&gt;orc.row.index.stride&lt;/STRONG&gt; lower than 12000?&lt;/P&gt;&lt;P&gt;2. In my table I have these columns:&lt;/P&gt;&lt;PRE&gt;Order_date string (looks like '2016-10-20'),
Order_time timestamp (looks like '2016-10-20 12:45:55')&lt;/PRE&gt;&lt;P&gt;The table is sorted by &lt;STRONG&gt;order_time&lt;/STRONG&gt;, as you recommended, and has a bloom filter index. But the filter&lt;/P&gt;&lt;PRE&gt;WHERE to_date(order_time) BETWEEN ... any period &lt;/PRE&gt;&lt;P&gt;runs 15-20% slower than&lt;/P&gt;&lt;PRE&gt;WHERE order_date BETWEEN ... any period &lt;/PRE&gt;&lt;P&gt;I expected that filtering on the column with the bloom filter would speed up query execution. Why did that not happen?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 09:03:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134262#M96924</guid>
      <dc:creator>aloha</dc:creator>
      <dc:date>2019-08-19T09:03:29Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134263#M96925</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2027/aloha.html" nodeid="2027"&gt;@Alena Melnikova&lt;/A&gt; Good to hear that you are happy with the results :)&lt;/P&gt;&lt;P&gt;Answers:&lt;/P&gt;&lt;P&gt;1. You can go as low as 1k. Choose a balanced value based on the average number of rows you query.&lt;/P&gt;&lt;P&gt;2. I believe that wrapping the column in the to_date function prevents the ORC index from being used (I haven't tested that). Google "why function based index?"&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 21:07:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134263#M96925</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-10-21T21:07:12Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134264#M96926</link>
      <description>&lt;P&gt;got it, thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 22 Oct 2016 16:11:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134264#M96926</guid>
      <dc:creator>aloha</dc:creator>
      <dc:date>2016-10-22T16:11:31Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134265#M96927</link>
      <description>&lt;P&gt;Great job &lt;A rel="user" href="https://community.cloudera.com/users/2027/aloha.html" nodeid="2027"&gt;@Alena Melnikova&lt;/A&gt;! Nice work with the data and visualization.  Really helpful, confirms some longstanding assumptions I've had.  &lt;/P&gt;</description>
      <pubDate>Tue, 25 Oct 2016 04:17:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134265#M96927</guid>
      <dc:creator>bpreachuk</dc:creator>
      <dc:date>2016-10-25T04:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: The best approach to the thousands of small partitions</title>
      <link>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134266#M96928</link>
      <description>&lt;P&gt;Hey everyone,&lt;BR /&gt;I have a somewhat similar question, which I posted here:&lt;BR /&gt;&lt;A href="https://community.hortonworks.com/questions/155681/how-to-defragment-hdfs-data.html" target="_blank"&gt;https://community.hortonworks.com/questions/155681/how-to-defragment-hdfs-data.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I would really appreciate any ideas.&lt;/P&gt;&lt;P&gt;cc &lt;A rel="user" href="https://community.cloudera.com/users/122/lmartin.html" nodeid="122"&gt;@Lester Martin&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/189/jramakrishnan.html" nodeid="189"&gt;@Jagatheesh Ramakrishnan&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/3902/rbiswas.html" nodeid="3902"&gt;@rbiswas&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2018 19:54:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/The-best-approach-to-the-thousands-of-small-partitions/m-p/134266#M96928</guid>
      <dc:creator>zack_riesland</dc:creator>
      <dc:date>2018-01-04T19:54:41Z</dc:date>
    </item>
  </channel>
</rss>

