12-14-2016
02:47 PM
Yes, you can add a dynamic property whose value is a regular expression (see the documentation for more details).
12-14-2016
02:45 PM
The UI uses the REST API, so you can do the same thing programmatically: POST to /flowfile-queues/{id}/drop-requests
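For illustration, here is a minimal Java sketch of issuing that call with java.net.http.HttpClient; the base URL (an unsecured NiFi at localhost:8080) and the queue ID are assumptions, not values from your flow:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DropQueueContents {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection/queue ID and base URL; substitute your own values.
        String queueId = "01234567-89ab-cdef-0123-456789abcdef";
        String url = "http://localhost:8080/nifi-api/flowfile-queues/" + queueId + "/drop-requests";

        HttpClient client = HttpClient.newHttpClient();
        // POSTing with an empty body creates an asynchronous drop request;
        // the response contains a request ID you can poll (or DELETE when finished).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```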
12-13-2016
02:44 PM
Yes, you can use something like the regex from step 2 above in a RouteOnContent processor, or, after the ExtractText (step 2 above), you can use RouteOnAttribute to check the value of column.2.
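As an illustration of the kind of pattern involved, here's a Java sketch that captures the second column of a simple comma-separated line; the regex itself is a hypothetical stand-in for the one from step 2 above, and it assumes no embedded commas or quoting:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondColumnRegex {
    public static void main(String[] args) {
        // Hypothetical pattern: skip the first column, capture the second.
        Pattern secondColumn = Pattern.compile("^[^,]*,([^,]*)");
        Matcher m = secondColumn.matcher("AAAAA,BBBBB,CCCCC,DDDDD");
        if (m.find()) {
            // In NiFi, ExtractText would expose this capture group as column.2.
            System.out.println("column.2 = " + m.group(1)); // prints BBBBB
        }
    }
}
```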
12-08-2016
06:50 PM
The documents at the link above are for Apache NiFi 1.1.0, but HDF 2.0.0 was built with NiFi 1.0.0. The ability to append was added in NiFi 1.1.0 under NIFI-1322, so it will likely be available in an upcoming version of HDF. The docs at that site are always for the latest version of Apache NiFi; it is recommended that you use the docs that come with your version of HDF/NiFi, via the Help option in the top-right hamburger menu of your running instance.
12-08-2016
05:49 PM
Yeah, we should probably trim that URL before using it; please feel free to file a Jira for that if you like.
12-08-2016
05:39 PM
1 Kudo
So if the customer_name value for id=CCCDD was "Matt", then you'd like the first output row to read: XXXXX, BBBBB, CCCCC, CCCDD, Matt
Is that correct? If so, you could do the following (a rough sketch of the same logic in code follows the list):
1. SplitText to split the incoming CSV into one flow file per line
2. ExtractText to store the four column values as attributes (see the example template called Working_With_CSV here); let's assume the attribute for the fourth column is called "column.4"
3. ReplaceText to set the content of the flow file to a SQL statement such as "select customer_name from table where id='${column.4}' limit 1" (quoting the value, since the IDs are strings)
4. ExecuteSQL to execute the statement
5. ConvertAvroToJSON to get the record into JSON (for further processing)
6. EvaluateJsonPath to get the value of customer_name into an attribute (named "customer.name", with a JSON Path of $[0].customer_name or something like that)
7. ReplaceText to set the row back to the original columns plus the new one, with something like "${column.1},${column.2},${column.3},${column.4},${customer.name}"
8. (optional) MergeContent to join the rows back together (if you need them as one file)
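Outside of NiFi, the same enrichment logic looks roughly like this Java sketch using plain JDBC; the connection URL and the table/column names ("customers", "id", "customer_name") are assumptions for illustration, and the Hive JDBC driver must be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class EnrichCsvWithCustomerName {
    public static void main(String[] args) throws Exception {
        // Assumed URL and schema; substitute your own values.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             PreparedStatement lookup = conn.prepareStatement(
                     "select customer_name from customers where id = ? limit 1")) {

            List<String> inputRows = List.of("XXXXX,BBBBB,CCCCC,CCCDD"); // stand-in for the split lines
            List<String> outputRows = new ArrayList<>();

            for (String row : inputRows) {
                String[] cols = row.split(",");   // SplitText + ExtractText
                lookup.setString(1, cols[3]);     // the "column.4" attribute
                try (ResultSet rs = lookup.executeQuery()) {   // ExecuteSQL
                    String customerName = rs.next() ? rs.getString(1) : "";
                    outputRows.add(row + "," + customerName);  // final ReplaceText
                }
            }
            outputRows.forEach(System.out::println); // MergeContent equivalent
        }
    }
}
```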
12-08-2016
05:01 PM
Looks like your connect URL has a space as the first character? For the second URL, it might be one of two things. The first (less likely) is that you want your client to use hostnames to resolve NameNodes (see my answer here); however, I would have expected an error message like the one in that question, not the one you're seeing.
I think the problem with your second URL is that Apache NiFi (specifically the Hive processors) doesn't necessarily work with HDP 2.5 out of the box, because Apache NiFi ships with Apache components (such as Hive 1.2.1), whereas HDP 2.5 has slightly different versions. I would try Hortonworks Data Flow (HDF) rather than Apache NiFi, as HDF ships with HDP-compatible versions of the Hive components.
12-08-2016
04:11 PM
1 Kudo
Your original URL "jdbc://hive2://localhost:10000/default" has slashes between jdbc: and hive2; it should instead be jdbc:hive2://localhost:10000/default. For the ZooKeeper version of the URL, that is a known issue (NIFI-2575); I would recommend correcting the original URL and using that (port 10000 should already be opened/forwarded on the sandbox).
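To sanity-check the corrected URL outside NiFi, here is a minimal Java sketch, assuming the Hive JDBC driver is on the classpath and the sandbox accepts unauthenticated connections on localhost:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveUrlCheck {
    public static void main(String[] args) throws Exception {
        // Corrected scheme: "jdbc:hive2", not "jdbc://hive2".
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```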
12-07-2016
09:56 PM
There is a table of JDBC type values here; for your data types they are as follows:
Type | Value
---|---
INTEGER | 4
TIMESTAMP | 93
DOUBLE | 8
LONGVARCHAR | -1
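These are the constants defined in java.sql.Types, which you can confirm with a quick snippet:

```java
import java.sql.Types;

public class JdbcTypeValues {
    public static void main(String[] args) {
        // The numeric values correspond to java.sql.Types constants.
        System.out.println("INTEGER     = " + Types.INTEGER);     // 4
        System.out.println("TIMESTAMP   = " + Types.TIMESTAMP);   // 93
        System.out.println("DOUBLE      = " + Types.DOUBLE);      // 8
        System.out.println("LONGVARCHAR = " + Types.LONGVARCHAR); // -1
    }
}
```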
12-06-2016
09:31 PM
Each processor is responsible for reading and writing whichever attributes it wants for the purposes of its processing, and those attributes are listed in each processor's documentation. SplitJson, for example, writes the following attributes to each output flow file:
Name | Description
---|---
fragment.identifier | All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index | A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count | The number of split FlowFiles generated from the parent FlowFile
segment.original.filename | The filename of the parent FlowFile
These were added in NiFi 1.0.0 (HDF 2.0) under NIFI-2632, so if you are using a version of NiFi/HDF before that, that's why you won't see these attributes populated by SplitJson.