using nifi for iterative json parsing from a given http stream

jnk32 — Thu, 11 Apr 2024 07:00:31 GMT

I have a URL endpoint that serves a JSON file a couple of `GB` in size. Unfortunately, the API does not support pagination, which would be the normal approach to my problem.

What I can do in Python is use the ijson library to split the JSON from the endpoint while receiving it and store the results on my hard drive. This is very memory efficient, and it lets me run this asynchronously and start transforming the results while the data is still loading.

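    import ijson
    import json
    from urllib.request import urlopen

    # url is the endpoint that serves the large JSON array
    f = urlopen(url)
    # 'item' selects each element of the top-level array as it arrives
    objects = ijson.items(f, 'item', use_float=True)
    for i, r in enumerate(objects):
        print(i)
        # write each record to its own file; 'out' avoids shadowing the response handle f
        with open(f'/tmp/streamwriter/{i}.json', 'w') as out:
            out.write(json.dumps(r))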
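For readers without ijson installed, the same idea (emit each array element as soon as its bytes have arrived, rather than loading the whole document) can be sketched with the standard library only. This is not ijson and not a NiFi processor. It is a minimal illustration that assumes the payload is a single top-level JSON array encoded as UTF-8, which is what the `'item'` prefix above implies. The name `iter_array_items` and the `chunk_size` parameter are made up for this example.

```python
import codecs
import io
import json


def iter_array_items(stream, chunk_size=64 * 1024):
    """Yield the elements of a top-level JSON array read from a binary stream.

    Hypothetical helper: only as much text as the current element needs is
    kept in memory, not the whole document.
    """
    decoder = json.JSONDecoder()
    to_text = codecs.getincrementaldecoder("utf-8")()
    buf = ""

    def fill():
        # Read one more chunk into the buffer; return False once the stream ends.
        nonlocal buf
        chunk = stream.read(chunk_size)
        if not chunk:
            buf += to_text.decode(b"", final=True)
            return False
        buf += to_text.decode(chunk)
        return True

    # Skip leading whitespace and consume the opening bracket.
    while True:
        buf = buf.lstrip()
        if buf:
            break
        if not fill():
            raise ValueError("empty input")
    if buf[0] != "[":
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]

    while True:
        # Lenient separator handling: drop whitespace and commas between items.
        buf = buf.lstrip(" \t\r\n,")
        if buf.startswith("]"):
            return
        try:
            value, end = decoder.raw_decode(buf)
            rest = buf[end:].lstrip()
            # Accept the value only once a delimiter follows it; otherwise a
            # number such as 12|34 split across chunks would be cut short.
            complete = rest[:1] in (",", "]")
        except json.JSONDecodeError:
            complete = False  # element not fully received yet
        if complete:
            yield value
            buf = rest
        elif not fill():
            raise ValueError("truncated or malformed JSON array")
```

Because the response object returned by `urlopen(url)` supports `read(n)`, it should be usable as `stream` here in place of `ijson.items(f, 'item')`. Note that a partially received element is re-parsed each time more data arrives, so ijson is likely the better choice for very large individual records.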

Now I want to do this in NiFi. Is there a processor that can do the same? As I understand the InvokeHTTP processor so far, it has to receive the full payload before it sends the FlowFile downstream.

----

Reference:

I asked the same question on Stack Overflow, but since I did not receive an answer there, I am trying this forum.

Re: using nifi for iterative json parsing from a given http stream

DianaTorres — Thu, 11 Apr 2024 13:32:57 GMT

@jnk32 Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our NiFi experts @joseomjr @SAMSAL @mburgess who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
