I have a URL endpoint that returns a JSON file a couple of `GB` in size. Unfortunately, the API does not support pagination, which would be the normal approach to this problem.
In Python, I can use the ijson library to split the JSON array into individual records while it is still being received, and write each record to disk. This is very memory-efficient, and it lets me run the processing asynchronously and start transforming records while the rest of the data is still downloading.
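```python
import json
import os
from urllib.request import urlopen

import ijson

# `url` is the endpoint that serves the multi-GB JSON array
os.makedirs('/tmp/streamwriter', exist_ok=True)
with urlopen(url) as response:
    # parse incrementally: 'item' selects each element of the top-level array,
    # use_float=True returns floats instead of Decimal
    for i, record in enumerate(ijson.items(response, 'item', use_float=True)):
        print(i)
        # `out` instead of `f` so the HTTP response is not shadowed
        with open(f'/tmp/streamwriter/{i}.json', 'w') as out:
            json.dump(record, out)
```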
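For illustration only, here is a dependency-free sketch of the same split-while-receiving idea. It swaps ijson's event-based parser for the standard library's `json.JSONDecoder.raw_decode`, decoding one array element at a time from a growing text buffer; it is not ijson's API. `iter_array_items` and `split_to_files` are hypothetical helper names, and the sketch assumes (like the `'item'` prefix) that the payload is a single top-level JSON array encoded as UTF-8. The `json` module returns `float` for non-integer numbers, which matches `use_float=True`.

```python
import codecs
import io
import json
import os
import tempfile
from typing import Any, BinaryIO, Iterator

_WS = " \t\n\r"  # JSON whitespace characters


def iter_array_items(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array, reading `stream` chunk by chunk."""
    decoder = json.JSONDecoder()
    utf8 = codecs.getincrementaldecoder("utf-8")()  # handles multi-byte chars split across chunks
    buf, pos, eof = "", 0, False
    state = "start"  # start: expect '['; first/value: expect element; sep: expect ',' or ']'

    def read_more() -> None:
        nonlocal buf, pos, eof
        chunk = stream.read(chunk_size)
        eof = not chunk
        # Drop consumed text so memory stays roughly at one element plus one chunk.
        buf = buf[pos:] + utf8.decode(chunk, final=eof)
        pos = 0

    while True:
        while pos < len(buf) and buf[pos] in _WS:
            pos += 1
        if pos == len(buf):
            if eof:
                raise ValueError("unexpected end of JSON input")
            read_more()
            continue
        ch = buf[pos]
        if state == "start":
            if ch != "[":
                raise ValueError("expected a top-level JSON array")
            pos, state = pos + 1, "first"
        elif state in ("first", "sep") and ch == "]":
            return  # end of the array
        elif state == "sep":
            if ch != ",":
                raise ValueError(f"expected ',' or ']' but found {ch!r}")
            pos, state = pos + 1, "value"
        else:
            try:
                item, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if eof:
                    raise
                read_more()  # element is incomplete; fetch more data and retry
                continue
            # Accept the element only once the following ',' or ']' is buffered, so a
            # number split across chunks (e.g. "1." + "5") is not cut short.
            nxt = end
            while nxt < len(buf) and buf[nxt] in _WS:
                nxt += 1
            if not eof and (nxt == len(buf) or buf[nxt] not in ",]"):
                read_more()
                continue
            pos, state = end, "sep"
            yield item


def split_to_files(stream: BinaryIO, out_dir: str, chunk_size: int = 64 * 1024) -> int:
    """Write element i of the array to <out_dir>/<i>.json and return how many were written."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for i, item in enumerate(iter_array_items(stream, chunk_size)):
        with open(os.path.join(out_dir, f"{i}.json"), "w", encoding="utf-8") as out:
            json.dump(item, out)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for the HTTP response; urlopen(url) also returns a binary stream with read(n).
    sample = io.BytesIO(b'[{"id": 1}, {"id": 2, "note": "a, b"}, 3.5]')
    with tempfile.TemporaryDirectory() as tmp:
        print(split_to_files(sample, tmp, chunk_size=4), sorted(os.listdir(tmp)))
```

One trade-off of this sketch: an element that spans many chunks is re-parsed on every read, so it is only reasonable when individual records are small compared to `chunk_size`. ijson's incremental parser does not have that cost.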
Now I want to do the same in NiFi. Is there a processor that can do this? As far as I understand, the InvokeHTTP processor has to receive the full payload before it sends the FlowFile downstream.
----
Reference:
I asked the same question on Stack Overflow, but since I did not receive an answer there, I am trying this forum.