<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Nifi: merge files from local fs and put to hdfs in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209825#M71876</link>
    <description>&lt;P&gt;I need to store a large number of small files of various types in HDFS so that the data can be processed with Spark. I chose the Hadoop Sequence File format for storage in HDFS, and NiFi to merge, convert, and put the files into HDFS. I figured out how to load the files and convert them to Sequence Files, but I am stuck at the merge stage. How can I merge several small Sequence Files into one bigger file? The MergeContent processor just merges content without handling the Hadoop Sequence File structure. A screenshot of my NiFi flow is attached.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="43714-screenshot.png" style="width: 806px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16781i350972B20371E70F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="43714-screenshot.png" alt="43714-screenshot.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 04:13:01 GMT</pubDate>
    <dc:creator>rfatkullin</dc:creator>
    <dc:date>2019-08-18T04:13:01Z</dc:date>
    <item>
      <title>Nifi: merge files from local fs and put to hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209825#M71876</link>
      <description>&lt;P&gt;I need to store a large number of small files of various types in HDFS so that the data can be processed with Spark. I chose the Hadoop Sequence File format for storage in HDFS, and NiFi to merge, convert, and put the files into HDFS. I figured out how to load the files and convert them to Sequence Files, but I am stuck at the merge stage. How can I merge several small Sequence Files into one bigger file? The MergeContent processor just merges content without handling the Hadoop Sequence File structure. A screenshot of my NiFi flow is attached.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="43714-screenshot.png" style="width: 806px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16781i350972B20371E70F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="43714-screenshot.png" alt="43714-screenshot.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 04:13:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209825#M71876</guid>
      <dc:creator>rfatkullin</dc:creator>
      <dc:date>2019-08-18T04:13:01Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi: merge files from local fs and put to hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209826#M71877</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/48448/rfatkullin.html" nodeid="48448"&gt;@Rustam Fatkullin&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can use MergeContent before CreateHadoopSequenceFile. If you have several file types and want to store them in separate files, use RouteOnAttribute first.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.4.0/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html&lt;/A&gt;&lt;/P&gt;</description>
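      <!--
      Editor's sketch (not NiFi code): as I understand the CreateHadoopSequenceFile
      documentation linked in this reply, merging before conversion works when
      MergeContent packages the small files into a container (for example TAR)
      rather than concatenating bytes, so that each packaged entry can become one
      key/value record. The plain-Python analogy below only illustrates that idea;
      the function names bundle_files and iter_records and the sample file names
      are hypothetical.

```python
import io
import tarfile


def bundle_files(files):
    """Pack a {filename: bytes} mapping into one in-memory TAR archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def iter_records(bundle):
    """Yield one (filename, content) pair per entry in the bundle."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as archive:
        for member in archive.getmembers():
            yield member.name, archive.extractfile(member).read()


# Hypothetical sample inputs standing in for the small files on the local fs.
small_files = {"a.png": b"png-bytes", "b.pdf": b"pdf-bytes"}
records = list(iter_records(bundle_files(small_files)))
```

      Each original file stays intact as its own record, so a png and a pdf are
      never spliced together; only the container holding them grows.
      -->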
      <pubDate>Tue, 28 Nov 2017 00:38:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209826#M71877</guid>
      <dc:creator>ahadjidj</dc:creator>
      <dc:date>2017-11-28T00:38:28Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi: merge files from local fs and put to hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209827#M71878</link>
      <description>&lt;P&gt;Hello &lt;A rel="user" href="https://community.cloudera.com/users/2056/ahadjidj.html" nodeid="2056"&gt;@Abdelkrim Hadjidj&lt;/A&gt;&lt;/P&gt;&lt;P&gt;There are many file types, for example png, bmp, pdf, and so on. I think it is a bad idea to merge, say, two pdf files. As I understand it, Sequence Files were designed to store small files efficiently. It is strange that the CreateHadoopSequenceFile processor cannot accumulate small files into a bigger file.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Nov 2017 01:02:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209827#M71878</guid>
      <dc:creator>rfatkullin</dc:creator>
      <dc:date>2017-11-28T01:02:53Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi: merge files from local fs and put to hdfs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209828#M71879</link>
      <description>&lt;P&gt;Sorry. I have read the MergeContent documentation and realized my mistake. Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 28 Nov 2017 01:26:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-merge-files-from-local-fs-and-put-to-hdfs/m-p/209828#M71879</guid>
      <dc:creator>rfatkullin</dc:creator>
      <dc:date>2017-11-28T01:26:01Z</dc:date>
    </item>
  </channel>
</rss>

