Member since: 11-16-2015
Posts: 905
Kudos Received: 665
Solutions: 249

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 394 | 09-30-2025 05:23 AM |
| | 722 | 06-26-2025 01:21 PM |
| | 608 | 06-19-2025 02:48 PM |
| | 827 | 05-30-2025 01:53 PM |
| | 11290 | 02-22-2024 12:38 PM |
08-29-2017 02:27 PM

You may be better served by ExecuteStreamCommand than ExecuteProcess for this case. You could schedule a GenerateFlowFile at the same rate your ExecuteProcess was scheduled, and set Ignore STDIN to true in ExecuteStreamCommand. The outgoing flow files will then have the execution.status attribute set, which you can use with RouteOnAttribute to handle failures (e.g., non-zero exit codes).

If you must use ExecuteProcess, perhaps you could run your shell command followed by a double-bar (`||`) and a command that prints something you can check for later, such as `myCommand || echo "!ERROR!"`. I haven't tried this so I don't know whether it would work, but if it does, it would allow you to use RouteOnContent to check for that error string to indicate failure. The same technique works without the `||` if you know what to look for in your failed command's output.
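For illustration, a minimal sketch of the first approach (execution.status is the attribute ExecuteStreamCommand writes; the `failed` property name is just an example):

```
GenerateFlowFile (same schedule as the old ExecuteProcess)
  -> ExecuteStreamCommand (Command Path = your script, Ignore STDIN = true)
  -> RouteOnAttribute with a dynamic property:
       failed = ${execution.status:equals('0'):not()}
```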
08-24-2017 06:25 PM

In QueryDatabaseTable, you'd set Maximum-value Columns to "id" and add a dynamic property named "initial.maxvalue.id" set to 50. Make sure state has been cleared before running; the first time it executes, it will grab all rows with id > 50. The same capability for GenerateTableFetch is not yet available (NIFI-4283) but is coming soon.
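A minimal sketch of the relevant configuration:

```
QueryDatabaseTable:
  Maximum-value Columns : id
  initial.maxvalue.id   : 50    # user-defined (dynamic) property
```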
08-24-2017 02:21 PM

Since you're using a script, you could feasibly replace steps 4-7 above, but my blog post (plus adding your attribute) really only covers steps 5 and 6.
08-22-2017 03:10 PM

Step 3 (SplitJson) is kind of a "barrier" processor: it sends out all the flow files only after the split is complete (in order to add fragment.count to each one). Steps 4 and 5 must then be executing so quickly that every flow file gets the same query_startTime value.

It sounds like you'd like to set query_startTime just as ExecuteSQL is about to execute the statement. Unfortunately, I don't believe this is possible with ExecuteSQL (although please feel free to write a Jira to add this improvement). If you are comfortable with a scripting solution such as Groovy, check out my blog post on how to do SQL from Groovy using the ExecuteScript processor in NiFi. You could modify it to add the query_startTime attribute just before calling sql.rows() in the script, along the lines of the sketch below.
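A minimal (untested) sketch of that modification, assuming an ExecuteScript processor with a user-defined property named myDBCPService whose value is the name of a DBCPConnectionPool controller service:

```groovy
import groovy.sql.Sql
import org.apache.nifi.dbcp.DBCPService

def flowFile = session.get()
if (!flowFile) return

// Find the connection pool named in the (hypothetical) myDBCPService property
def lookup = context.controllerServiceLookup
def dbcpId = lookup.getControllerServiceIdentifiers(DBCPService).find { id ->
    lookup.getControllerServiceName(id) == myDBCPService.value
}
def conn = lookup.getControllerService(dbcpId).getConnection()

try {
    def sql = new Sql(conn)
    // Stamp query_startTime immediately before the statement actually runs
    flowFile = session.putAttribute(flowFile, 'query_startTime',
            String.valueOf(System.currentTimeMillis()))
    def rows = sql.rows('SELECT * FROM myTable')  // your statement here
    // ... write 'rows' out as flow file content, as in the blog post ...
    session.transfer(flowFile, REL_SUCCESS)
} finally {
    conn?.close()
}
```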
08-22-2017 01:23 PM
1 Kudo

Try `*` as the value for the Query property.
08-15-2017 05:49 PM

It's hard to tell from your flow whether the 4 flow files you want to merge have their "fragment.*" attributes set correctly. If you use Defragment as the Merge Strategy, the flow files must share the same value for the fragment.identifier attribute and all carry a correct fragment.count (4 in this case). If those are not set and you just want to take the first 4 you get, set Merge Strategy to Bin-Packing Algorithm. A sketch of both options follows.
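A minimal sketch of the two MergeContent configurations (attribute and property names per MergeContent's documentation):

```
Defragment strategy -- each of the 4 flow files needs:
  fragment.identifier = <same value on all 4>
  fragment.count      = 4
  fragment.index      = 0..3 (unique per flow file)

Bin-packing alternative:
  Merge Strategy            = Bin-Packing Algorithm
  Minimum Number of Entries = 4
```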
08-14-2017 06:12 PM
2 Kudos

Are there any failures in the PutHDFS processor? It seems to me that (unless the flow files have the same filename and the Conflict Resolution Strategy is "append") you should have 49 small flow files in HDFS (not that that's ideal). You won't be able to use MergeContent on ORC files, as there is no merge strategy for that format (the same goes for MergeRecord until an OrcRecordSetWriter is implemented). If your flow files are Avro (i.e., going into ConvertAvroToORC), you could try MergeContent before ConvertAvroToORC and use the Avro merge format, as sketched below.
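A minimal sketch of that reordering (Merge Format is the MergeContent property; bin-sizing properties are up to you):

```
... -> MergeContent (Merge Format = Avro) -> ConvertAvroToORC -> PutHDFS
```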
08-11-2017 03:48 PM
1 Kudo

I think the issue is with the HWX Content-Encoded Schema Reference. This is a special "header" on the Avro content that makes it easy to integrate with the HWX Schema Registry serializers and deserializers, but it likely precludes the content from being understood by Apache Avro readers such as the one in ConvertAvroToORC or avro-tools. If you can, try setting the Schema Write Strategy to Embed Avro Schema; this will result in larger flow files but should work with downstream processors. If/when there is an OrcRecordSetWriter, you should be able to reuse the HWX schema-reference option there.
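The suggested change, as a minimal sketch of the writer configuration:

```
AvroRecordSetWriter:
  Schema Write Strategy : Embed Avro Schema   # instead of HWX Content-Encoded Schema Reference
```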
08-11-2017 03:15 PM
1 Kudo

Can you share the configuration of AvroRecordSetWriter? That file doesn't look like it has a schema embedded in it (you can usually see the schema as JSON near the beginning of the file's contents). You may need to configure the writer to embed the schema for use by ConvertAvroToORC or avro-tools (if you don't separately provide the schema to the latter).
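As a quick check (assuming you have the avro-tools jar handy; adjust the jar version to yours), an embedded schema can be dumped with:

```
java -jar avro-tools-1.8.2.jar getschema your-file.avro
```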
07-28-2017 02:29 AM

It's hard to tell from your screenshot what is going on. What kind of file(s) are being read into NiFi, and what is the content of the flow file(s) going to PutElasticsearch5? PutElasticsearch5 expects a single JSON document as the content of each flow file and, depending on your processor configuration, will perform your specified operation on each document. If your flow file contains multiple documents, you may need SplitJson to get each into its own flow file (see the sketch below). Alternatively, if you are using NiFi 1.3.0 / HDF 3.0 (and don't mind using Elasticsearch's HTTP API instead of the native one), you can use PutElasticsearchHttpRecord, which allows you to handle flow files that contain multiple records in any format (provided you configure a Record Reader that can parse your input).
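If, for example, the flow file content is a top-level JSON array of documents, a SplitJson along these lines (the JsonPath is an assumption about your data's shape) yields one document per flow file:

```
SplitJson:
  JsonPath Expression : $.*    # each array element becomes its own flow file
```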