Member since: 11-16-2015
Posts: 890
Kudos Received: 647
Solutions: 245
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 847 | 02-22-2024 12:38 PM
 | 796 | 02-02-2023 07:07 AM
 | 2127 | 12-07-2021 09:19 AM
 | 3421 | 03-20-2020 12:34 PM
 | 11548 | 01-27-2020 07:57 AM
02-12-2018
04:42 PM
1 Kudo
NIFI-978 addresses this capability; I have a pull request up with the improvement, and perhaps it will make it into NiFi 1.6.0.
02-12-2018
04:14 PM
3 Kudos
It is possible with ExecuteScript and Groovy, as long as no SecurityManager is set on the JVM that would prevent Groovy from reaching the ProcessorNode, which is a private member variable of the StandardProcessContext. The following script can be used in ExecuteScript to add the parent process group ID as an attribute to an incoming flow file:

```groovy
def flowFile = session.get()
if (!flowFile) return
// Reach into the StandardProcessContext's private ProcessorNode to get the parent group ID
def processGroupId = context.procNode?.processGroupIdentifier ?: 'unknown'
flowFile = session.putAttribute(flowFile, 'processGroupId', processGroupId)
session.transfer(flowFile, REL_SUCCESS)
```
02-08-2018
05:35 PM
3 Kudos
In the article you mention, the ID of the connection is specified as an attribute, and the InvokeHttp processor uses it to build a URL to poll for status. In your case, you just need the list of connection IDs, and you can create a flow file for each connection.

You can use the "/process-groups/root/connections" REST endpoint to get a list of connections at the root process group. If you want connections for child process groups, you'd first need to get the child process groups' IDs and use those in the connections endpoint to get the connections for each group. That can recurse down through each child process group and may be unwieldy, so for this example I'm going to assume you simply want to monitor your queues in the root process group.

With InvokeHttp and the aforementioned REST endpoint, you'll get back a JSON object with a field called "connections", which is an array of connections. You can then use SplitJson with a JSONPath of "$.connections" to create a flow file for each connection, and EvaluateJsonPath with a JSONPath of "$.uri" to extract each connection's URL. From there you can continue with the flow described in the other article, and for each flow file it will retrieve the status for that connection.
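The SplitJson/EvaluateJsonPath steps can be sketched outside NiFi as well. Here is a minimal Python illustration using a made-up payload shaped like the connections response (the "connections" array and per-connection "uri" field are the parts the flow relies on; the ids and URLs below are invented for illustration):

```python
import json

# Sample payload shaped like the /process-groups/root/connections response
# (ids and URLs invented for illustration).
response = json.loads("""
{
  "connections": [
    {"id": "abc-123", "uri": "https://nifi.example.com/nifi-api/connections/abc-123"},
    {"id": "def-456", "uri": "https://nifi.example.com/nifi-api/connections/def-456"}
  ]
}
""")

# SplitJson with "$.connections": one "flow file" per array element.
flow_files = response["connections"]

# EvaluateJsonPath with "$.uri": extract each connection's status URL.
status_urls = [conn["uri"] for conn in flow_files]
```

Each URL would then feed an InvokeHttp to poll that connection's status, as in the article.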
02-07-2018
08:19 PM
1 Kudo
Avro schemas can be confusing the first couple of times you create them 🙂 In your case you could use the following:

```json
{
  "namespace": "nifi",
  "name": "cesarPipeDelimitedRecord",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "sequence", "type": "int"},
    {"name": "category", "type": "int"},
    {"name": "text", "type": "string"}
  ]
}
```

If you can have missing values, then you can replace the type with a union. For example, if "category" can be missing, its field entry becomes {"name": "category","type": ["null","int"]}.
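To see what the union buys you, here is a small Python sketch that mimics the matching rule rather than using a real Avro library (the `matches` helper and the simplified type table are made up for illustration): a plain "int" field rejects a missing value, while a ["null","int"] union accepts it.

```python
# Simplified illustration of Avro type matching -- not a real Avro
# implementation, just the union rule.
PRIMITIVES = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int),
    "null": lambda v: v is None,
}

def matches(avro_type, value):
    # A union is written as a list of types; the value must match one branch.
    if isinstance(avro_type, list):
        return any(matches(t, value) for t in avro_type)
    return PRIMITIVES[avro_type](value)
```

With this, matches("int", None) is False, but matches(["null", "int"], None) is True, which is exactly why the union lets "category" be missing.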
02-07-2018
05:59 PM
2 Kudos
You don't have to extract the fields to attributes if you are converting the contents to a different format. Instead, you can use ConvertRecord with a CSVReader configured for a custom format (a pipe delimiter, for instance) and name your fields in the Avro schema. Then in ConvertRecord you can set a JsonRecordSetWriter to convert to JSON. The same approach works for any supported output format, or you can even write your own ScriptedRecordSetWriter if you need a custom format.

If you do need to extract to attributes, you can use ExtractText with a regular expression that matches each field, adding user-defined properties to extract the group(s) into their associated attributes (the property name is the field name, such as "id" or "sequence", and the value is the grouping expression, perhaps $2, $3, etc.)
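The ExtractText idea can be illustrated with a plain regex. This is a sketch assuming a four-field pipe-delimited record with the field names from the schema above; the sample line and the exact pattern are invented for illustration:

```python
import re

# One capture group per pipe-delimited field, analogous to the regex you'd
# give ExtractText; user-defined properties would then map each group to an
# attribute such as "id" or "sequence".
pattern = re.compile(r'^([^|]*)\|([^|]*)\|([^|]*)\|(.*)$')

line = "rec-001|5|42|hello world"   # invented sample record
m = pattern.match(line)
attributes = {
    "id": m.group(1),
    "sequence": m.group(2),
    "category": m.group(3),
    "text": m.group(4),
}
```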
01-30-2018
12:59 PM
2 Kudos
What version of NiFi are you using? The timezone parameter was added in NiFi 1.2.0 / HDF 3.0 (NIFI-2908).
01-24-2018
04:57 PM
So right now it appears you are trying to do validation and extraction at the same time, since you don't want "case 2" to move down the stream. If your new ReplaceText from this comment is more performant than the one from the original question, you can use RouteOnContent first to exclude the files that do not have the required header and footer. Since there will now be two pattern matching processors, you may find that it is less performant, but it's probably worth a try. Another option is ExecuteScript with a fast scripting language like Groovy or Javascript/Nashorn, but the overhead of the interpreted script might be worse than the improvement of looking only for headers/footers rather than a whole regex.
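The RouteOnContent precheck amounts to a cheap framing test before the expensive full-content pattern. A minimal Python sketch, with invented header/footer markers standing in for your required text:

```python
# Cheap precheck: reject content that lacks the required header and footer
# before running any expensive full-content regex. The marker strings are
# invented examples, not from the original question.
HEADER = "BEGIN_RECORD"
FOOTER = "END_RECORD"

def has_required_framing(content: str) -> bool:
    # Substring position checks are far cheaper than a whole-content regex.
    return content.startswith(HEADER) and content.rstrip().endswith(FOOTER)
```

Only content passing this test would continue to the ReplaceText step; the rest routes away, which is the point of putting RouteOnContent first.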
01-23-2018
07:00 PM
1 Kudo
How is your data coming into NiFi? If it is a single flow file with all the rows (such as ExecuteSQL returning an Avro file with records in it), then you can use SplitAvro, and downstream each flow file can be processed separately with no looping required. If your input is a text file you can use SplitText, if JSON then SplitJson, etc. If instead you have a number (say 10) and you need to fetch rows with ids 1-10, you can use ExecuteSQL with a query that returns all rows with id <= 10.

If I am misunderstanding your use case and you do need to loop, then after you get your loop variable into an attribute (perhaps with EvaluateJsonPath, as you mention), you can use RouteOnAttribute to see if it is time to exit the loop (${loopVariable:gt(0)}, for example). Otherwise you can use UpdateAttribute to increment or decrement the counter and send that output back to the beginning of the loop.
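The loop pattern (RouteOnAttribute to test the attribute, UpdateAttribute to decrement it) can be sketched as plain code. The attribute name and starting value below are invented for illustration:

```python
# Simulate the RouteOnAttribute / UpdateAttribute loop: the flow file's
# "loopVariable" attribute is tested (like ${loopVariable:gt(0)}) and
# decremented each pass until it reaches zero.
flow_file = {"loopVariable": 3}
iterations = 0

while flow_file["loopVariable"] > 0:       # RouteOnAttribute: stay in the loop
    iterations += 1                        # ...the per-iteration work...
    flow_file["loopVariable"] -= 1         # UpdateAttribute: decrement the counter
```

When the counter hits zero, RouteOnAttribute routes the flow file out of the loop instead of back to the start.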
01-23-2018
06:24 PM
You can use either ConvertRecord or ConvertAvroToJSON to convert your incoming Avro data to JSON. If the incoming Avro files do not have a schema embedded in them, then you will have to provide it, either to an AvroReader (for ConvertRecord) or to the "Avro schema" property (for ConvertAvroToJSON).
01-22-2018
06:32 PM
1 Kudo
Perhaps try ReplaceText first, to match your beginning and end text and replace them with an empty string. Then, if you need the content as an attribute, you can use ExtractText with (.*). Do you definitely need the value in an attribute? If you can, keep it in the content after the ReplaceText processor.
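The ReplaceText-then-ExtractText idea in plain code, with invented begin/end markers standing in for your actual text:

```python
import re

# ReplaceText step: match the beginning and end text and replace each with
# an empty string. The <<START>>/<<END>> markers are invented placeholders.
content = "<<START>>the value you want<<END>>"
stripped = content.replace("<<START>>", "").replace("<<END>>", "")

# ExtractText step with (.*): capture the remaining content into an attribute.
value_attr = re.match(r"(.*)", stripped).group(1)
```

If the downstream processors can work on content directly, you can stop after the first step and skip the attribute entirely.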