<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Can Kafka process multiple files? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223726#M185591</link>
    <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/1198/koenigbodensee.html" nodeid="1198"&gt;@Gerd Koenig&lt;/A&gt;!&lt;/P&gt;&lt;P&gt;For processing multiple files in real time, what application or technology would you recommend?&lt;/P&gt;</description>
    <pubDate>Tue, 27 Jun 2017 13:35:26 GMT</pubDate>
    <dc:creator>melvinmendoza</dc:creator>
    <dc:date>2017-06-27T13:35:26Z</dc:date>
    <item>
      <title>Can Kafka process multiple files?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223722#M185587</link>
      <description>&lt;P&gt;Can Kafka process multiple files and then send them to Spark Streaming?&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2017 07:23:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223722#M185587</guid>
      <dc:creator>melvinmendoza</dc:creator>
      <dc:date>2017-06-22T07:23:31Z</dc:date>
    </item>
    <item>
      <title>Re: Can Kafka process multiple files?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223723#M185588</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14536/melvin-camendoza46.html" nodeid="14536"&gt;@mel mendoza&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Kafka is a message broker, so it only receives files/events from publishers and makes them available for consumption by consumers. It does not do any processing.&lt;/P&gt;&lt;P&gt;Spark Streaming dictates how files/events are read. Since Spark Streaming does micro-batching, it will read several files/events from Kafka and process them together in a micro-batch.&lt;/P&gt;&lt;P&gt;I believe this will achieve what you are asking for, though the work happens on the Spark side, not in Kafka.&lt;/P&gt;&lt;P&gt;As always, if you find this post helpful, don't forget to "accept" the answer.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2017 07:47:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223723#M185588</guid>
      <dc:creator>egarelnabi</dc:creator>
      <dc:date>2017-06-22T07:47:21Z</dc:date>
    </item>
    <item>
      <title>Re: Can Kafka process multiple files?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223724#M185589</link>
      <description>&lt;P&gt;Hello &lt;A rel="user" href="https://community.cloudera.com/users/14536/melvin-camendoza46.html" nodeid="14536"&gt;@mel mendoza&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Kafka is not a file-based system but an event-based one. If you want to process files with Spark Streaming via Kafka, you need a two-step approach: first ingest the data into Kafka, then consume the events from Kafka with Spark Streaming.&lt;/P&gt;&lt;P&gt;To ingest into Kafka you can, for example, use Kafka Connect with the file source (check /usr/hdp/current/kafka-broker/conf/connect-file-source.properties). It works like "tail -f" on that file and streams any incoming data from that file to the Kafka topic.&lt;/P&gt;&lt;P&gt;Afterwards you have to consume the events from that Kafka topic with your Spark Streaming job.&lt;/P&gt;&lt;P&gt;HTH, Gerd&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2017 13:47:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223724#M185589</guid>
      <dc:creator>geko</dc:creator>
      <dc:date>2017-06-23T13:47:36Z</dc:date>
    </item>
    <item>
      <title>Re: Can Kafka process multiple files?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223725#M185590</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/445/egarelnabi.html" nodeid="445"&gt;@Eyad Garelnabi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;So you mean I should go straight to Spark to process multiple files?&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jun 2017 10:25:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223725#M185590</guid>
      <dc:creator>melvinmendoza</dc:creator>
      <dc:date>2017-06-27T10:25:13Z</dc:date>
    </item>
    <item>
      <title>Re: Can Kafka process multiple files?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223726#M185591</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/1198/koenigbodensee.html" nodeid="1198"&gt;@Gerd Koenig&lt;/A&gt;!&lt;/P&gt;&lt;P&gt;For processing multiple files in real time, what application or technology would you recommend?&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jun 2017 13:35:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223726#M185591</guid>
      <dc:creator>melvinmendoza</dc:creator>
      <dc:date>2017-06-27T13:35:26Z</dc:date>
    </item>
    <item>
      <title>Re: Can Kafka process multiple files?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223727#M185592</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/14536/melvin-camendoza46.html" nodeid="14536"&gt;@mel mendoza&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;It may be worth looking at Flume to ingest multiple files into Kafka. Alternatively, you can use HDF (particularly NiFi) to do so.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jun 2017 14:12:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223727#M185592</guid>
      <dc:creator>geko</dc:creator>
      <dc:date>2017-06-27T14:12:25Z</dc:date>
    </item>
    <item>
      <title>Re: Can Kafka process multiple files?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223728#M185593</link>
      <description>&lt;P&gt;Thanks again! I'm currently using NiFi for data collection. I will try NiFi to Kafka.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jun 2017 15:17:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-Kafka-process-multiple-files/m-p/223728#M185593</guid>
      <dc:creator>melvinmendoza</dc:creator>
      <dc:date>2017-06-27T15:17:47Z</dc:date>
    </item>
  </channel>
</rss>

