Nifi SplitJson - how to access Original flow from Split

Contributor

In the SplitJson processor, is there any way to pass the original flow file to the split flow files as an attribute, or to reference the original in an expression in the split flow? In my case, the JSON node to split is not at the root, but I need the root-level attributes carried over to the split flow. Thanks.

1 ACCEPTED SOLUTION

Master Guru

For sufficiently small JSON files, you can use EvaluateJsonPath or ExtractText to put the full body of the document into an attribute before the SplitJson. Keep in mind that this loads the document into memory (content normally stays in the content repository and is only referenced), and if you modify the flow file, both the original and the new flow file will hold a copy in memory. This can get unwieldy pretty quickly. If you can instead identify a smaller portion of the document that is needed, EvaluateJsonPath (with the appropriate JSON Path expression) can store just that portion as an attribute. Alternatively, you might be able to store the original document with PutDistributedMapCache and fetch it into an attribute only when it is needed (in which case using UpdateAttribute to delete the attribute when finished is recommended).
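As a rough illustration of the "smaller portion" idea, here is a Python sketch that models flow files as plain dictionaries. It does not use the NiFi API, and the sample document, the field names (orderId, customer, items), and the helper split_with_root_attributes are all hypothetical. It first copies selected root-level values into attributes (the role EvaluateJsonPath would play with its destination set to attributes), then splits the nested array so that every split carries those attributes, much as split flow files inherit the attributes of their parent.

```python
import json


def split_with_root_attributes(document, root_keys, array_key):
    """Model EvaluateJsonPath followed by SplitJson on a JSON string.

    Returns one dict per array element, holding the element as content
    and the selected root-level values as string attributes.
    """
    parsed = json.loads(document)
    # Step 1 (EvaluateJsonPath role): copy only the root values that are needed.
    attributes = {key: str(parsed[key]) for key in root_keys}
    # Step 2 (SplitJson role): one flow file per array element, each carrying
    # its own copy of the attributes extracted above.
    return [
        {"attributes": dict(attributes), "content": json.dumps(element)}
        for element in parsed[array_key]
    ]


# Hypothetical sample document: the array to split is not at the root.
order_doc = '{"orderId": 42, "customer": "acme", "items": [{"sku": "a"}, {"sku": "b"}]}'
order_splits = split_with_root_attributes(order_doc, ["orderId", "customer"], "items")
for flow_file in order_splits:
    print(flow_file["attributes"], flow_file["content"])
```

Only the two small root values travel with each split, so the full document never has to sit in an attribute.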

A different approach, if you are comfortable with a scripting language such as JavaScript or Groovy, is to use ExecuteScript to invert the behavior of SplitJson: keep the flow file content identical to the original content, and store each split value as an attribute in its own flow file. This maintains the original content in each flow file, and as I mentioned, the content itself will not be "moved" or copied; instead, each flow file maintains a reference to the content (which is unchanged from the original in this case). If you'd like to see this "inverse" behavior supported in SplitJson (so you can choose whether the splits go in attributes or content), please feel free to file a Jira for this capability.
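The "inverse" behavior can be sketched in plain Python, again modelling flow files as dictionaries rather than calling the ExecuteScript API. The helper name inverse_split, the attribute name split.value, and the sample document are assumptions made for illustration only.

```python
import json


def inverse_split(document, array_key, attribute_name="split.value"):
    """Model the inverted SplitJson: content stays whole, splits go to attributes.

    Returns one dict per array element. Every dict refers to the same
    original document string; only the attribute differs.
    """
    parsed = json.loads(document)
    return [
        {
            # Each split element is serialized into an attribute...
            "attributes": {attribute_name: json.dumps(element)},
            # ...while the content is the untouched original document.
            "content": document,
        }
        for element in parsed[array_key]
    ]


# Hypothetical sample document with the array nested below the root.
inverse_doc = '{"orderId": 42, "items": [{"sku": "a"}, {"sku": "b"}]}'
inverse_splits = inverse_split(inverse_doc, "items")
for flow_file in inverse_splits:
    print(flow_file["attributes"]["split.value"])
```

Downstream steps can then read the root-level fields from the content and the individual element from the attribute. This only stays cheap while each split element is small enough to live comfortably in an attribute.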


4 REPLIES

Super Collaborator

Between two processors, you define a connection, and each connection is configured for one or more relationships. So you have probably already connected SplitJson to another processor, with the connection configured for the ‘split’ relationship. You can set up another connection and configure it for the ‘original’ relationship.

So right-click the connection between the two processors and you should see three check boxes: one each for original, split, and failure.

Contributor

How do I join the split flow stream with the original flow then? As I mentioned, I need to be able to get the upper-level attributes of the split node (contained in the original JSON) from the split flow.

Contributor

Ended up modifying SplitJson.java to include the original content, as below: {"RESULT":[{"SPLIT":{ }, "ORIGINAL":{ }}]}
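The modified SplitJson.java itself is not shown, so the following is only a guess at the output shape. This Python sketch assumes the outline means one object per split, with a RESULT array holding a single SPLIT/ORIGINAL pair. The helper wrap_split_with_original and the sample document are hypothetical.

```python
import json


def wrap_split_with_original(document, array_key):
    """Return one JSON string per array element, each embedding the original.

    Assumed layout: {"RESULT": [{"SPLIT": <element>, "ORIGINAL": <document>}]}
    """
    original = json.loads(document)
    return [
        json.dumps({"RESULT": [{"SPLIT": element, "ORIGINAL": original}]})
        for element in original[array_key]
    ]


# Hypothetical sample document.
wrapped_doc = '{"orderId": 42, "items": [{"sku": "a"}, {"sku": "b"}]}'
wrapped_splits = wrap_split_with_original(wrapped_doc, "items")
for content in wrapped_splits:
    print(content)
```

Note that this copies the whole original into every split's content, so it trades extra storage for self-contained splits.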