<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive writing many small csv files to HDFS in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-writing-many-small-csv-files-to-HDFS/m-p/176731#M80186</link>
    <description>&lt;P&gt;That works, thank you.&lt;/P&gt;</description>
    <pubDate>Tue, 03 Jul 2018 20:24:48 GMT</pubDate>
    <dc:creator>siddarth_wardha</dc:creator>
    <dc:date>2018-07-03T20:24:48Z</dc:date>
    <item>
      <title>Hive writing many small csv files to HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-writing-many-small-csv-files-to-HDFS/m-p/176729#M80184</link>
      <description>&lt;P&gt;I am exporting Hive table data to CSV files in HDFS using queries such as:&lt;/P&gt;&lt;PRE&gt;FROM Table T1 INSERT OVERWRITE DIRECTORY '&amp;lt;HDFS Directory&amp;gt;' SELECT *;&lt;/PRE&gt;&lt;P&gt;Hive is writing many small CSV files (1-2 MB each) to the destination directory.&lt;/P&gt;&lt;P&gt;Is there a way to control the number of files or the size of the CSV files?&lt;/P&gt;&lt;P&gt;Notes:&lt;/P&gt;&lt;P&gt;1) These CSV files are not used to create tables, so I cannot replace the query with INSERT INTO TABLE...&lt;/P&gt;&lt;P&gt;2) I have already tried the following settings, to no avail:&lt;/P&gt;&lt;PRE&gt;hive.merge.mapfiles=true
hive.merge.mapredfiles
hive.merge.smallfiles.avgsize
hive.merge.size.per.task
mapred.max.split.size
mapred.min.split.size&lt;/PRE&gt;&lt;P&gt;TIA&lt;/P&gt;&lt;P&gt;I have many tables in Hive of varying sizes: some are very large and some are small. For large tables, I am fine with many files being generated, as long as each file is larger than 16 MB. I don't want to set the number of mappers explicitly, because that would hamper query performance for large tables.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jul 2018 15:05:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-writing-many-small-csv-files-to-HDFS/m-p/176729#M80184</guid>
      <dc:creator>siddarth_wardha</dc:creator>
      <dc:date>2018-07-03T15:05:29Z</dc:date>
    </item>
    <item>
      <title>Re: Hive writing many small csv files to HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-writing-many-small-csv-files-to-HDFS/m-p/176730#M80185</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/62176/siddarthwardhan.html" nodeid="62176"&gt;@Siddarth Wardhan&lt;/A&gt;&lt;P&gt;The hive.merge.mapfiles and hive.merge.mapredfiles settings apply only to MapReduce jobs. If you are using Tez as the execution engine, you need to set the following properties instead:&lt;/P&gt;&lt;PRE&gt;set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;&lt;/PRE&gt;</description>
      <pubDate>Tue, 03 Jul 2018 15:42:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-writing-many-small-csv-files-to-HDFS/m-p/176730#M80185</guid>
      <dc:creator>ssubhas</dc:creator>
      <dc:date>2018-07-03T15:42:46Z</dc:date>
    </item>
    <item>
      <title>Re: Hive writing many small csv files to HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-writing-many-small-csv-files-to-HDFS/m-p/176731#M80186</link>
      <description>&lt;P&gt;That works, thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jul 2018 20:24:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-writing-many-small-csv-files-to-HDFS/m-p/176731#M80186</guid>
      <dc:creator>siddarth_wardha</dc:creator>
      <dc:date>2018-07-03T20:24:48Z</dc:date>
    </item>
  </channel>
</rss>

