Member since: 11-16-2015
Posts: 905
Kudos Received: 665
Solutions: 249

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 364 | 09-30-2025 05:23 AM |
|  | 709 | 06-26-2025 01:21 PM |
|  | 585 | 06-19-2025 02:48 PM |
|  | 813 | 05-30-2025 01:53 PM |
|  | 11215 | 02-22-2024 12:38 PM |
07-28-2017
01:03 AM
What are some sample values for those parameters? Could they have spaces in them? Perhaps try putting quotes around each of the arguments like "${to}"?
07-24-2017
08:38 PM
Try three slashes in the Database Driver Jar Url property: file:///post/postgresql-42.1.1.jar
07-11-2017
03:18 PM
1 Kudo
Koji is suggesting the use of a GrokReader in a record-aware processor (such as QueryRecord or PartitionRecord), rather than the ExtractGrok processor. With a GrokReader, you can do your split using SQL (with QueryRecord), perhaps something like SELECT * FROM FLOWFILE WHERE tstamp < ${now():toNumber():minus(1000)} and SELECT * FROM FLOWFILE WHERE tstamp >= ${now():toNumber():minus(1000)}, to route each line depending on whether its timestamp (in a "tstamp" field) falls before or after one second ago. Alternatively, you can use PartitionRecord to group records into individual flow files, with each flow file containing the records that share the same values for the specified fields.
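If it helps to see it concretely, here is a minimal sketch of the QueryRecord side, using the two queries above as user-defined properties. The property names "older" and "recent" are illustrative, and the GrokReader plus a record writer are assumed to be configured on the processor:

```sql
-- User-defined property "older": records whose timestamp is more than a second old
SELECT * FROM FLOWFILE WHERE tstamp < ${now():toNumber():minus(1000)}

-- User-defined property "recent": records from the last second
SELECT * FROM FLOWFILE WHERE tstamp >= ${now():toNumber():minus(1000)}
```

Each user-defined property on QueryRecord becomes a relationship of the same name, so the matching records are routed to the corresponding connection.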
06-30-2017
08:04 PM
SplitText, for some reason, starts the index at 1, while the other Split processors start at 0. Sorry, I had forgotten that difference; good catch!
06-30-2017
05:11 PM
@Alvin Jin To answer your question about which processors to use: it depends on what you want to do with the whole CSV file. Your question only mentions splitting and ignoring the header, and the CSVReader takes care of that. The record-aware processors in NiFi 1.3.0 include:

- ConsumeKafkaRecord_0_10: Gets messages from a Kafka topic, bundling them into a single flow file instead of one flow file per message
- ConvertRecord: Converts records from one data format to another (e.g., Avro to JSON)
- LookupRecord: Uses fields from a record to look up a value, which can be added back to the record
- PartitionRecord: Groups "like" records (based on user-provided criteria) into individual flow files
- PublishKafkaRecord_0_10: Posts messages to a Kafka topic
- PutDatabaseRecord: Executes a specified operation (e.g., INSERT, UPDATE, DELETE) on a database for each record in a flow file
- PutElasticsearchHttpRecord: Executes a specified operation (e.g., "index") on an Elasticsearch cluster for each record in a flow file
- QueryRecord: Executes SQL queries against fields from the records; this can be used to filter, aggregate, etc.
- SplitRecord: Splits records into smaller flow files; usually only used when downstream processors are not record-aware
- UpdateRecord: Updates field(s) in each record of a flow file

Also, I wanted to mention: if for some reason all your CSV columns are strings, you can set "Schema Access Strategy" to "Use String Fields From Header", and then you don't need a schema or schema registry. Otherwise, if you want to provide a schema, you're not required to use a schema registry; you can just paste your schema into the Schema Text property and set "Schema Access Strategy" to "Use Schema Text Property".
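For reference, here is a minimal sketch of what you might paste into the Schema Text property. It is a hypothetical Avro schema for a three-column CSV; the record name, field names, and types are illustrative, not taken from your file:

```json
{
  "type": "record",
  "name": "csvRow",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" },
    { "name": "createdOn", "type": "string" }
  ]
}
```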
06-29-2017
06:54 PM
In addition to @Wynner's answer, if you'd like to keep using ExecuteScript, you can pass in arguments as user-defined properties (aka dynamic properties) or flow file attributes and use them in ExecuteScript. For examples of leveraging user-defined properties in ExecuteScript, check out Part 3 of my ExecuteScript Cookbook article series on HCC; it has examples in Jython.
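As a rough sketch (not taken from the cookbook itself), a Jython body in ExecuteScript might read a dynamic property like this. The property name "greeting" is hypothetical; NiFi binds each dynamic property to a script variable of the same name:

```python
# ExecuteScript (Jython) sketch: use a hypothetical dynamic property named "greeting".
# NiFi binds each user-defined (dynamic) property to a PropertyValue variable of the same name.
flowFile = session.get()
if flowFile is not None:
    # Evaluate any Expression Language in the property against this flow file
    message = greeting.evaluateAttributeExpressions(flowFile).getValue()
    # Flow file attributes are available as well
    filename = flowFile.getAttribute('filename')
    # Write the result back as a new attribute and route to success
    flowFile = session.putAttribute(flowFile, 'greeting.message', message + ' ' + filename)
    session.transfer(flowFile, REL_SUCCESS)
```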
06-28-2017
08:32 PM
1 Kudo
You could set the Header Line Count to 0, then send the flow files to a RouteOnAttribute processor where you can "skip" the first line by routing on the following Expression Language statement: ${fragment.index:gt(0)}. The first line will be routed to "unmatched" and the rest to "matched" or to the user-defined property name (depending on the value of the Routing Strategy property). Note that this requires that the Line Split Count property be set to 1 in SplitText. Alternatively, if you are using (or can upgrade to) NiFi 1.3.0, you can use a record-aware processor with a CSVReader. This reader can be configured to (among other things) skip the header line. The record-aware processors also offer better performance when working with flow files that contain many "records" (such as a CSV file where each row is a record).
06-28-2017
08:16 PM
4 Kudos
As of NiFi 1.3.0, you can use UpdateRecord for this. If your incoming field name is "createdOn", you can add a user-defined property named "/createdOn" whose value is the following: ${field.value:toDate('yyyy-MM-dd HH:mm:ss.SSS'):toNumber()}. Note that you may need to change the type of createdOn from String (in the Reader's schema) to Long (in the Writer's schema).
06-23-2017
05:23 PM
You can try a thread dump (with jstack or nifi.sh dump) while it is waiting to shut down; you may be able to spot the culprit in the output.
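For example (the process id and output file names here are placeholders):

```sh
# jstack against the NiFi JVM's process id
jstack <nifi_pid> > nifi-threaddump.txt

# or NiFi's built-in dump command, which writes the thread dump to the given file
bin/nifi.sh dump nifi-threaddump.txt
```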
06-21-2017
06:24 PM
I tested this with Arabic characters in my text field, and it worked fine. You're saying you still get the error when using my suggested lines?