<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Export command creating files under data folder in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388694#M246742</link>
    <description>&lt;P&gt;Command used to export the table:&amp;nbsp;export table test to '/jayesh/testdump1';&lt;/P&gt;</description>
    <pubDate>Mon, 03 Jun 2024 08:44:42 GMT</pubDate>
    <dc:creator>jayes</dc:creator>
    <dc:date>2024-06-03T08:44:42Z</dc:date>
    <item>
      <title>Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388494#M246693</link>
      <description>&lt;P&gt;I am trying to understand the files that are created under the data folder when a table is exported to an HDFS location.&lt;/P&gt;&lt;P&gt;I have a single-column table with 5000 rows, but the export created only one data file, named 000000_0.&lt;BR /&gt;I have another table with 2 partitions of approximately 95 rows each. For this table, the export produced 2 folders, one per partition, each containing only one file.&lt;/P&gt;&lt;P&gt;Can you please help me understand how the data files for an exported table are created and how the data is distributed across them?&lt;BR /&gt;Is there only one data file for each non-partitioned table, or are multiple files created based on a row count, a size limit, or some other criterion?&lt;/P&gt;</description>
      <pubDate>Wed, 29 May 2024 14:15:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388494#M246693</guid>
      <dc:creator>jayes</dc:creator>
      <dc:date>2024-05-29T14:15:37Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388611#M246718</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/107114"&gt;@jayes&lt;/a&gt;,&amp;nbsp;may I know how you are exporting the table to HDFS? What command are you using?&lt;/P&gt;</description>
      <pubDate>Fri, 31 May 2024 05:24:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388611#M246718</guid>
      <dc:creator>ChethanYM</dc:creator>
      <dc:date>2024-05-31T05:24:21Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388691#M246741</link>
      <description>&lt;P&gt;When exporting tables to HDFS in Hive, the creation and distribution of data files depend on several factors, such as the table structure (partitioned or non-partitioned), the underlying storage format, and the cluster configuration.&lt;/P&gt;&lt;H3&gt;Non-Partitioned Tables&lt;/H3&gt;&lt;P&gt;For non-partitioned tables, Hive typically creates a single data file if the data is small enough to fit within a single HDFS block. In your case, a table with 5000 rows and a single column is likely small enough that Hive writes it into one data file (e.g., 000000_0).&lt;/P&gt;&lt;H3&gt;Partitioned Tables&lt;/H3&gt;&lt;P&gt;For partitioned tables, Hive creates a separate directory for each partition and writes the data files inside it. The number of files within each partition directory can depend on the size of the data and on your Hive and HDFS configuration settings.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2024 06:45:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388691#M246741</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-06-03T06:45:43Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388694#M246742</link>
      <description>&lt;P&gt;Command used to export the table:&amp;nbsp;export table test to '/jayesh/testdump1';&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2024 08:44:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388694#M246742</guid>
      <dc:creator>jayes</dc:creator>
      <dc:date>2024-06-03T08:44:42Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388695#M246743</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Can you please help me with the size per file, or point me to a setting where the size is specified?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2024 08:45:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388695#M246743</guid>
      <dc:creator>jayes</dc:creator>
      <dc:date>2024-06-03T08:45:42Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388752#M246758</link>
      <description>&lt;P&gt;The export table command automatically launches a distributed copy (distcp) job when the table location contains a large number of files. This improves efficiency when handling massive datasets.&lt;BR /&gt;&lt;BR /&gt;The size of the exported files will match the size of the table data. If needed, you can adjust the memory allocated to the distcp job's mappers and reducers to optimize performance for your data size.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2024 07:02:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388752#M246758</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-06-04T07:02:41Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388922#M246818</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Can we export the table data into multiple files using some option like numfile or anything similar?&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jun 2024 09:53:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388922#M246818</guid>
      <dc:creator>jayes</dc:creator>
      <dc:date>2024-06-07T09:53:29Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388924#M246820</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/107114"&gt;@jayes&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Hive itself doesn't offer a built-in command like &lt;STRONG&gt;numfile&lt;/STRONG&gt; to directly export table data into a specific number of files.&lt;BR /&gt;&lt;BR /&gt;However, you can achieve the same result using a couple of approaches:&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;1. Spark:&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;Read the Hive table using Spark SQL:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; df = spark.sql("select * from sample_table")&lt;/LI-CODE&gt;&lt;P&gt;If it is a managed table, use a Hive Warehouse Connector (HWC) session.&lt;BR /&gt;&lt;BR /&gt;In Apache Spark, you can control the number of partitions in a DataFrame using the repartition or coalesce methods. Use the coalesce method to set the number of partitions for the DataFrame (coalesce can only reduce the partition count; use repartition to increase it):&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; coalesced_df = df.coalesce(5)&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;Write the data:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; coalesced_df.write.parquet("/tmp/coalesced_df")&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;Result:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[hive@node4 ~]$ hdfs dfs -ls -h /tmp/coalesced_df
Found 6 items
-rw-r--r--   3 hive supergroup          0 2024-06-07 10:49 /tmp/coalesced_df/_SUCCESS
-rw-r--r--   3 hive supergroup    135.0 M 2024-06-07 10:48 /tmp/coalesced_df/part-00000-2e6bf3b8-53fa-4a6e-957e-83769c72e780-c000.snappy.parquet
-rw-r--r--   3 hive supergroup    200.0 M 2024-06-07 10:48 /tmp/coalesced_df/part-00001-2e6bf3b8-53fa-4a6e-957e-83769c72e780-c000.snappy.parquet
-rw-r--r--   3 hive supergroup     68.9 M 2024-06-07 10:49 /tmp/coalesced_df/part-00002-2e6bf3b8-53fa-4a6e-957e-83769c72e780-c000.snappy.parquet
-rw-r--r--   3 hive supergroup    155.4 M 2024-06-07 10:49 /tmp/coalesced_df/part-00003-2e6bf3b8-53fa-4a6e-957e-83769c72e780-c000.snappy.parquet
-rw-r--r--   3 hive supergroup    132.9 M 2024-06-07 10:49 /tmp/coalesced_df/part-00004-2e6bf3b8-53fa-4a6e-957e-83769c72e780-c000.snappy.parquet
[hive@node4 ~]$&lt;/LI-CODE&gt;</description>
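      <content:encoded><![CDATA[
<P>The thread also asks about the size per file. One possible way to connect that question to the Spark approach above is to derive the partition count from a target file size instead of picking it by hand. The sketch below is only illustrative: the helper name files_for_target_size and the 128 MB target are assumptions that do not come from the thread, and it assumes the written data is roughly as large as the sizes shown in the listing, which compression or a different output format may change.</P>

```python
import math

MB = 1024 * 1024


def files_for_target_size(total_bytes: int, target_file_bytes: int) -> int:
    """Hypothetical helper: number of output files needed so that each file
    is at most roughly target_file_bytes, with a minimum of one file."""
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and target_file_bytes > 0")
    return max(1, math.ceil(total_bytes / target_file_bytes))


# File sizes (in MB) copied from the hdfs dfs -ls -h listing in the post above.
listed_mb = [135.0, 200.0, 68.9, 155.4, 132.9]
total_bytes = round(sum(listed_mb) * MB)

# Assumed target of 128 MB per file; pick whatever suits your cluster.
num_files = files_for_target_size(total_bytes, 128 * MB)
print(num_files)
```

<P>The resulting number could then be passed to df.repartition(num_files) or df.coalesce(num_files) before the write. Actual file sizes will still vary, as the uneven sizes in the listing show.</P>
]]></content:encoded>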
      <pubDate>Fri, 07 Jun 2024 11:12:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388924#M246820</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-06-07T11:12:29Z</dc:date>
    </item>
    <item>
      <title>Re: Export command creating files under data folder</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388925#M246821</link>
      <description>&lt;P&gt;2.&amp;nbsp;An alternative is to write a script (e.g., in Bash) that queries Hive and writes the results in your desired output format.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jun 2024 11:17:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Export-command-creating-files-under-data-folder/m-p/388925#M246821</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-06-07T11:17:24Z</dc:date>
    </item>
  </channel>
</rss>

