<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: NiFi converting json from Kafka to columnar ORC files - jsonToAvro very slow in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205256#M68880</link>
    <description>&lt;P&gt;By using ConsumeKafkaRecord_0_10 with a JsonTreeReader and an AvroRecordSetWriter, as Bryan suggested, I now get a throughput of 9600 msg/sec on the cluster (4800 msg/sec on each machine).&lt;/P&gt;&lt;P&gt;I could not remove the MergeContent processor. If I do, I get very small files of about 0.5 MB.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Wed, 04 Oct 2017 18:31:27 GMT</pubDate>
    <dc:creator>matej_puntar</dc:creator>
    <dc:date>2017-10-04T18:31:27Z</dc:date>
    <item>
      <title>NiFi converting json from Kafka to columnar ORC files - jsonToAvro very slow</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205252#M68876</link>
      <description>&lt;P&gt;I am using a NiFi cluster of 2 x c4.2xlarge machines (8 cores and 15 GB of memory each).&lt;BR /&gt;&lt;BR /&gt;NiFi is set up to use 12 GB of memory:&lt;BR /&gt;# JVM memory settings&lt;BR /&gt;java.arg.2=-Xms12g&lt;BR /&gt;java.arg.3=-Xmx12g&lt;BR /&gt;&lt;BR /&gt;The jsonToAvro processor is running with 7 Concurrent Tasks and I get a throughput of 450 messages per second. Message size is about 3 KB. The only slow part is the jsonToAvro processor. When the workflow is running, all cores are above 90%.&lt;BR /&gt;&lt;BR /&gt;If I save the data from Kafka to a file and use orc-tools to convert it to an ORC file, I get a throughput of 5000 msg/sec on one machine.&lt;BR /&gt;&lt;BR /&gt;I configured NiFi as instructed in the best practices article: &lt;A href="https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html" rel="nofollow noopener noreferrer" target="_blank"&gt;https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;What am I doing wrong?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="40623-jsontoavro-slow.png" style="width: 1880px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/17117iC995883AF69B37A3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="40623-jsontoavro-slow.png" alt="40623-jsontoavro-slow.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:20:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205252#M68876</guid>
      <dc:creator>matej_puntar</dc:creator>
      <dc:date>2022-09-16T12:20:23Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi converting json from Kafka to columnar ORC files - jsonToAvro very slow</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205253#M68877</link>
      <description>&lt;P&gt;You could improve the performance significantly by using the record-oriented capabilities introduced in Apache NiFi 1.2.0...&lt;/P&gt;&lt;P&gt;You would use ConsumeKafkaRecord_0_10 with a JsonTreeReader and an AvroRecordSetWriter and set the batch size to something like 1000 (or more). This would produce 1 flow file coming out of ConsumeKafkaRecord_0_10 that already has the Avro records in it, then you could eliminate the need for ConvertJSONToAvro, and possibly eliminate MergeContent since you will already have a bunch of records in a flow file.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Oct 2017 19:50:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205253#M68877</guid>
      <dc:creator>bbende</dc:creator>
      <dc:date>2017-10-03T19:50:41Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi converting json from Kafka to columnar ORC files - jsonToAvro very slow</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205254#M68878</link>
      <description>&lt;P&gt;Thank you for the suggestion. This looks very promising. I just need to figure out how the suggested components work. I will let you know how it goes. &lt;BR /&gt;Thank you. &lt;/P&gt;</description>
      <pubDate>Tue, 03 Oct 2017 20:43:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205254#M68878</guid>
      <dc:creator>matej_puntar</dc:creator>
      <dc:date>2017-10-03T20:43:37Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi converting json from Kafka to columnar ORC files - jsonToAvro very slow</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205255#M68879</link>
      <description>&lt;P&gt;You just need a schema to do this, and it is very easy. See:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html" target="_blank"&gt;https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/106450/record-oriented-data-with-nifi.html" target="_blank"&gt;https://community.hortonworks.com/articles/106450/record-oriented-data-with-nifi.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi" target="_blank"&gt;https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I used QueryRecord for the conversion so that I could limit the output to what I wanted to see:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html" target="_blank"&gt;https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html" target="_blank"&gt;https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Oct 2017 20:55:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205255#M68879</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2017-10-03T20:55:36Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi converting json from Kafka to columnar ORC files - jsonToAvro very slow</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205256#M68880</link>
      <description>&lt;P&gt;By using ConsumeKafkaRecord_0_10 with a JsonTreeReader and an AvroRecordSetWriter, as Bryan suggested, I now get a throughput of 9600 msg/sec on the cluster (4800 msg/sec on each machine).&lt;/P&gt;&lt;P&gt;I could not remove the MergeContent processor. If I do, I get very small files of about 0.5 MB.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Wed, 04 Oct 2017 18:31:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205256#M68880</guid>
      <dc:creator>matej_puntar</dc:creator>
      <dc:date>2017-10-04T18:31:27Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi converting json from Kafka to columnar ORC files - jsonToAvro very slow</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205257#M68881</link>
      <description>&lt;P&gt;The ConvertAvroToORC processor was using only 2 Concurrent Tasks although it was configured to use 4. After restarting the cluster, ConvertAvroToORC started using 4 Concurrent Tasks and the throughput is now 14600 msg/sec on the cluster (7300 msg/sec on each machine).&lt;/P&gt;</description>
      <pubDate>Wed, 04 Oct 2017 20:57:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-converting-json-from-Kafka-to-columnar-ORC-files/m-p/205257#M68881</guid>
      <dc:creator>matej_puntar</dc:creator>
      <dc:date>2017-10-04T20:57:07Z</dc:date>
    </item>
  </channel>
</rss>

