<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question using nifi for iterative json parsing from a given http stream in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/using-nifi-for-iterative-json-parsing-from-a-given-http/m-p/386389#M246028</link>
    <description>&lt;P&gt;I have a URL endpoint that returns a JSON file a couple of GB in size. Unfortunately, the API does not support pagination, which would be the normal approach to my problem.&lt;/P&gt;&lt;P&gt;In Python, I can use the ijson library to split the JSON from the endpoint while it is still being received and store the results on my hard drive. This is very memory efficient, and it lets me run the download asynchronously and start transforming results while data is still loading.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import ijson
import json
from urllib.request import urlopen

# url: the endpoint that returns one large top-level JSON array
with urlopen(url) as response:
    # 'item' selects each element of the top-level array as it is parsed
    objects = ijson.items(response, 'item', use_float=True)
    for i, record in enumerate(objects):
        print(i)
        # write each element to its own file
        with open(f'/tmp/streamwriter/{i}.json', 'w') as out:
            out.write(json.dumps(record))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I want to do the same in NiFi. Is there a processor that can do this? As I currently understand it, the InvokeHTTP processor has to receive the full payload before it sends the FlowFile downstream.&lt;/P&gt;&lt;P&gt;----&lt;/P&gt;&lt;P&gt;Reference:&lt;/P&gt;&lt;P&gt;I asked the same question on &lt;A href="https://stackoverflow.com/questions/78107817/using-nifi-for-iterative-json-parsing-from-a-given-http-stream" target="_self"&gt;Stack Overflow&lt;/A&gt;, but since I did not receive an answer there, I am trying this forum.&lt;/P&gt;</description>
    <pubDate>Thu, 11 Apr 2024 07:00:31 GMT</pubDate>
    <dc:creator>jnk32</dc:creator>
    <dc:date>2024-04-11T07:00:31Z</dc:date>
    <item>
      <title>using nifi for iterative json parsing from a given http stream</title>
      <link>https://community.cloudera.com/t5/Support-Questions/using-nifi-for-iterative-json-parsing-from-a-given-http/m-p/386389#M246028</link>
      <description>&lt;P&gt;I have a URL endpoint that returns a JSON file a couple of GB in size. Unfortunately, the API does not support pagination, which would be the normal approach to my problem.&lt;/P&gt;&lt;P&gt;In Python, I can use the ijson library to split the JSON from the endpoint while it is still being received and store the results on my hard drive. This is very memory efficient, and it lets me run the download asynchronously and start transforming results while data is still loading.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import ijson
import json
from urllib.request import urlopen

# url: the endpoint that returns one large top-level JSON array
with urlopen(url) as response:
    # 'item' selects each element of the top-level array as it is parsed
    objects = ijson.items(response, 'item', use_float=True)
    for i, record in enumerate(objects):
        print(i)
        # write each element to its own file
        with open(f'/tmp/streamwriter/{i}.json', 'w') as out:
            out.write(json.dumps(record))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I want to do the same in NiFi. Is there a processor that can do this? As I currently understand it, the InvokeHTTP processor has to receive the full payload before it sends the FlowFile downstream.&lt;/P&gt;&lt;P&gt;----&lt;/P&gt;&lt;P&gt;Reference:&lt;/P&gt;&lt;P&gt;I asked the same question on &lt;A href="https://stackoverflow.com/questions/78107817/using-nifi-for-iterative-json-parsing-from-a-given-http-stream" target="_self"&gt;Stack Overflow&lt;/A&gt;, but since I did not receive an answer there, I am trying this forum.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Apr 2024 07:00:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/using-nifi-for-iterative-json-parsing-from-a-given-http/m-p/386389#M246028</guid>
      <dc:creator>jnk32</dc:creator>
      <dc:date>2024-04-11T07:00:31Z</dc:date>
    </item>
    <item>
      <title>Re: using nifi for iterative json parsing from a given http stream</title>
      <link>https://community.cloudera.com/t5/Support-Questions/using-nifi-for-iterative-json-parsing-from-a-given-http/m-p/386415#M246033</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/109580"&gt;@jnk32&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our NiFi experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/105558"&gt;@joseomjr&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/38301"&gt;@mburgess&lt;/a&gt;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Apr 2024 13:32:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/using-nifi-for-iterative-json-parsing-from-a-given-http/m-p/386415#M246033</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2024-04-11T13:32:57Z</dc:date>
    </item>
  </channel>
</rss>

