Member since: 11-16-2015
Posts: 905
Kudos Received: 665
Solutions: 249

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 394 | 09-30-2025 05:23 AM |
| | 722 | 06-26-2025 01:21 PM |
| | 608 | 06-19-2025 02:48 PM |
| | 827 | 05-30-2025 01:53 PM |
| | 11290 | 02-22-2024 12:38 PM |
08-29-2017 02:27 PM

You may be better served by ExecuteStreamCommand than ExecuteProcess for this case. You could schedule a GenerateFlowFile at the same rate your ExecuteProcess was scheduled, and set Ignore STDIN to true in ExecuteStreamCommand. The outgoing flow files will then have the execution.status attribute set, which you can use with RouteOnAttribute to handle failures (e.g., non-zero exit codes).

If you must use ExecuteProcess, perhaps you could run your shell command followed by a double-bar (`||`) and a command that prints something you can check for later, such as `myCommand || echo "!ERROR!"`. I haven't tried this so I don't know whether it would work, but if it does, it would allow you to use RouteOnContent to check for that error string to indicate failure. The same technique works without the `||` if you know what to look for in your failed command's output.
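For illustration, a minimal sketch of the first approach (execution.status is the attribute ExecuteStreamCommand writes; the `failed` property name is just an example):

```
GenerateFlowFile (same schedule as the old ExecuteProcess)
  -> ExecuteStreamCommand (Command Path = your script, Ignore STDIN = true)
  -> RouteOnAttribute with a dynamic property:
       failed = ${execution.status:equals('0'):not()}
```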
08-24-2017 06:25 PM

In QueryDatabaseTable, you'd set Maximum-value Columns to "id" and add a dynamic property named "initial.maxvalue.id" set to 50. Make sure state has been cleared before running; the first time it executes, it will grab all rows with id > 50. The same capability for GenerateTableFetch is not yet available (NIFI-4283) but is coming soon.
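A minimal sketch of the relevant configuration:

```
QueryDatabaseTable:
  Maximum-value Columns : id
  initial.maxvalue.id   : 50    # user-defined (dynamic) property
```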
08-24-2017 02:21 PM

Since you're using a script, you could feasibly replace steps 4-7 above, but my blog post (plus adding your attribute) really only covers steps 5 and 6.
08-22-2017 03:10 PM

Step 3 (SplitJson) is kind of a "barrier" processor: it sends out all the flow files only after the split is complete (in order to add fragment.count to each one). Steps 4 and 5 must then be executing so quickly that every flow file gets the same query_startTime value.

It sounds like you'd like to set query_startTime just as ExecuteSQL is about to execute the statement. Unfortunately, I don't believe this is possible with ExecuteSQL (although please feel free to write a Jira to add this improvement). If you are comfortable with a scripting solution such as Groovy, check out my blog post on how to do SQL from Groovy using the ExecuteScript processor in NiFi. You could modify it to add the query_startTime attribute just before calling sql.rows() in the script, along the lines of the sketch below.
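A minimal (untested) sketch of that modification, assuming an ExecuteScript processor with a user-defined property named myDBCPService whose value is the name of a DBCPConnectionPool controller service:

```groovy
import groovy.sql.Sql
import org.apache.nifi.dbcp.DBCPService

def flowFile = session.get()
if (!flowFile) return

// Find the connection pool named in the (hypothetical) myDBCPService property
def lookup = context.controllerServiceLookup
def dbcpId = lookup.getControllerServiceIdentifiers(DBCPService).find { id ->
    lookup.getControllerServiceName(id) == myDBCPService.value
}
def conn = lookup.getControllerService(dbcpId).getConnection()

try {
    def sql = new Sql(conn)
    // Stamp query_startTime immediately before the statement actually runs
    flowFile = session.putAttribute(flowFile, 'query_startTime',
            String.valueOf(System.currentTimeMillis()))
    def rows = sql.rows('SELECT * FROM myTable')  // your statement here
    // ... write 'rows' out as flow file content, as in the blog post ...
    session.transfer(flowFile, REL_SUCCESS)
} finally {
    conn?.close()
}
```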
08-22-2017 01:23 PM
1 Kudo

Try `*` as the value for the Query property.
08-15-2017 05:49 PM

It's hard to tell from your flow whether the 4 flow files you want to merge have their "fragment.*" attributes set correctly. If you use Defragment as the Merge Strategy, the flow files must share the same value for the fragment.identifier attribute and all carry a correct fragment.count (4 in this case). If those are not set and you just want to take the first 4 you get, set Merge Strategy to Bin-Packing Algorithm. A sketch of both options follows.
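A minimal sketch of the two MergeContent configurations (attribute and property names per MergeContent's documentation):

```
Defragment strategy -- each of the 4 flow files needs:
  fragment.identifier = <same value on all 4>
  fragment.count      = 4
  fragment.index      = 0..3 (unique per flow file)

Bin-packing alternative:
  Merge Strategy            = Bin-Packing Algorithm
  Minimum Number of Entries = 4
```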
08-14-2017 06:12 PM
2 Kudos

Are there any failures in the PutHDFS processor? It seems to me that (unless the flow files have the same filename and the Conflict Resolution Strategy is "append") you should have 49 small flow files in HDFS (not that that's ideal). You won't be able to use MergeContent on ORC files, as there is no merge strategy for that format (the same goes for MergeRecord until an OrcRecordSetWriter is implemented). If your flow files are Avro (i.e., going into ConvertAvroToORC), you could try MergeContent before ConvertAvroToORC and use the Avro merge format, as sketched below.
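A minimal sketch of that reordering (Merge Format is the MergeContent property; bin-sizing properties are up to you):

```
... -> MergeContent (Merge Format = Avro) -> ConvertAvroToORC -> PutHDFS
```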
08-11-2017 03:48 PM
1 Kudo

I think the issue is with the HWX Content-Encoded Schema Reference. This is a special "header" on the Avro content that makes it easy to integrate with the HWX Schema Registry serializers and deserializers, but it likely precludes the content from being understood by Apache Avro readers such as the one in ConvertAvroToORC or avro-tools. If you can, try setting the Schema Write Strategy to Embed Avro Schema; this will result in larger flow files but should work with downstream processors. If/when there is an OrcRecordSetWriter, you should be able to reuse the HWX schema-reference option there.
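The suggested change, as a minimal sketch of the writer configuration:

```
AvroRecordSetWriter:
  Schema Write Strategy : Embed Avro Schema   # instead of HWX Content-Encoded Schema Reference
```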
08-11-2017 03:15 PM
1 Kudo

Can you share the configuration of AvroRecordSetWriter? That file doesn't look like it has a schema embedded in it (you can usually see the schema as JSON near the beginning of the file's contents). You may need to configure the writer to embed the schema for use by ConvertAvroToORC or avro-tools (if you don't separately provide the schema to the latter).
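As a quick check (assuming you have the avro-tools jar handy; adjust the jar version to yours), an embedded schema can be dumped with:

```
java -jar avro-tools-1.8.2.jar getschema your-file.avro
```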
07-28-2017 02:29 AM

It's hard to tell from your screenshot what is going on. What kind of file(s) are being read into NiFi, and what is the content of the flow file(s) going to PutElasticsearch5? PutElasticsearch5 expects a single JSON document as the content of each flow file and, depending on your processor configuration, will perform your specified operation on each document. If your flow file contains multiple documents, you may need SplitJson to get each into its own flow file (see the sketch below). Alternatively, if you are using NiFi 1.3.0 / HDF 3.0 (and don't mind using Elasticsearch's HTTP API instead of the native one), you can use PutElasticsearchHttpRecord, which allows you to handle flow files that contain multiple records in any format (provided you configure a Record Reader that can parse your input).
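If, for example, the flow file content is a top-level JSON array of documents, a SplitJson along these lines (the JsonPath is an assumption about your data's shape) yields one document per flow file:

```
SplitJson:
  JsonPath Expression : $.*    # each array element becomes its own flow file
```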