Created on 07-13-2016 03:53 AM - edited 08-17-2019 11:23 AM
One of the great things about Apache NiFi is its ability to consume and transform many data formats. One area that has been complex, however, is transforming incoming JSON into differently shaped JSON (i.e. JSON-to-JSON transformation). Take the GetTwitter processor, which provides access to Twitter data streams (from the filtered search, sample, or firehose endpoints). Anyone who has worked with Twitter’s JSON schema knows that it carries rich, detailed information for each individual tweet. In many cases much of this data isn’t required for analytics, and data flow managers or analysts want to pare it down to the necessities. In other cases incoming JSON simply needs to be reformatted or relabeled for use in another system or repository, such as Hive, HBase, or MongoDB.
Outside of NiFi, JSON-to-JSON transformation has been simplified by the Jolt Java library, which offers a declarative approach to defining JSON output. Jolt provides a set of transformation types, each with its own DSL (called a specification), that define the new structure for outgoing JSON data. Prior to NiFi 0.6.1, @Matt Burgess wrote a great article on how to use Jolt in NiFi via ExecuteScript, which is itself a cool processor to use when you need to extend the capabilities of NiFi. The Apache community saw an opportunity to pair the ease of use of Jolt with the power of NiFi by introducing JoltTransformJSON as a standard processor in the upcoming version 0.7.
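To give a feel for what "declarative" means here, the sketch below mimics the simplest case of a Jolt shift specification: the spec mirrors the structure of the input, and each leaf names the path where that value should land in the output. This is a toy Python illustration and not the Jolt library itself; the `shift` function, the sample tweet, and the output field names are all assumptions made for the example, and real Jolt specs support wildcards and other features not shown here.

```python
# Toy illustration only: mimics the simplest case of a Jolt "shift" spec
# (literal input keys mapped to dotted output paths). Not the Jolt library.

def shift(data, spec, out=None):
    """Copy values from `data` to the output locations named by `spec`."""
    if out is None:
        out = {}
    for key, target in spec.items():
        if key not in data:
            continue  # spec entries with no matching input are skipped
        if isinstance(target, dict):
            # Nested spec: descend into the matching nested input object.
            shift(data[key], target, out)
        else:
            # Leaf: `target` is a dotted output path such as "tweet.id".
            node = out
            *parents, leaf = target.split(".")
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = data[key]
    return out


# Hypothetical, heavily trimmed tweet; real tweets carry many more fields.
tweet = {
    "id_str": "1",
    "text": "Hello NiFi",
    "lang": "en",
    "user": {"screen_name": "nifi_fan", "followers_count": 42},
}

# Input structure on the left, output path on the right.
# Fields that the spec does not mention are dropped from the output.
spec = {
    "id_str": "tweet.id",
    "text": "tweet.text",
    "user": {"screen_name": "tweet.author"},
}

result = shift(tweet, spec)
print(result)
```

The point of the sketch is that the spec is data rather than code: paring a tweet down to the necessities, or relabeling its fields, means editing the spec and leaving the transformation logic alone.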
JoltTransformJSON will be included as part of the standard set of processors, allowing NiFi users to easily add, validate, and test Jolt specifications for JSON data flow content. A simple configuration option is found under the properties tab on the processor, which provides Jolt’s existing transformation types to choose from and a field to enter the JSON specification for the selected transformation.
For those looking for more options to validate and test specifications, the Advanced button provides access to a rich configuration UI that allows users to do JSON and Jolt validation (against the selected transformation) as well as transformation testing with example input. This UI gives users a bit more assurance about the resulting JSON before the specification is actually applied to the flow. With either the simple or the advanced configuration, if an invalid specification is saved then NiFi will do its usual work of notifying users of any errors associated with the processor's configuration.
Keep a look out for the JoltTransformJSON processor in the upcoming NiFi 0.7 release. Or, if you’re looking to get your hands dirty and try it out now, you can download and compile the NiFi source via the GitHub mirror. To test out the above data flow you can get a template on GitHub Gist here and import it into NiFi. Also here is a Gist with example specifications to try. Remember that you'll need to configure the GetTwitter processor with your own keys/access tokens first and make sure that the PutFile processors are set with a destination. For more insight on using this processor (or working with the example flow) check out the video below:
This processor also has a community-driven roadmap for growth, with work in progress on custom transformation support and potential extensions for Expression Language support that would add even more flexibility.
Have any questions about transforming JSON in NiFi with Jolt? Please feel free to comment below or reach out to the community on the Apache NiFi mailing list.