Member since: 07-04-2018
Posts: 17
Kudos Received: 1
Solutions: 0
04-02-2019 01:27 PM
1 Kudo
You can use MergeContent or MergeRecord for this. Either can take FlowFiles that each contain a single record and combine them into one FlowFile containing many Avro records; you can then use ConvertAvroToParquet or PutParquet.
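For intuition, this is what the merge-then-convert amounts to outside NiFi. A minimal Python sketch, assuming the fastavro and pyarrow libraries are installed; the input directory and file names are hypothetical:

```python
# Combine many single-record Avro files into one Parquet file, mirroring
# what MergeRecord followed by ConvertAvroToParquet/PutParquet does in NiFi.
import glob

import pyarrow as pa
import pyarrow.parquet as pq
from fastavro import reader

records = []
for path in glob.glob("input/*.avro"):  # hypothetical input directory
    with open(path, "rb") as fo:
        records.extend(reader(fo))  # each file holds a single Avro record

# Build one table from all records and write a single Parquet file.
table = pa.Table.from_pylist(records)
pq.write_table(table, "merged.parquet")
```

Inside NiFi, MergeRecord does the equivalent binning for you; you mainly configure the Record Reader/Writer services and the minimum/maximum number of records per bin.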
03-22-2019 08:39 PM
@Derek Calderon Unfortunately, a rolling restart will not work here. When you shut down only one node, another node in the cluster gets elected as cluster coordinator and still retains the list of nodes known to be part of the cluster. By shutting down all nodes you essentially wipe the slate clean of which nodes are in the cluster. On startup the nodes will come up, components (processors, controller services, reporting tasks, etc.) will return to their last known state (running, stopped, etc.), FlowFiles will be loaded back into their last reported connection queues, and processing will continue.

One of the nodes will be elected cluster coordinator by ZooKeeper; the other nodes will learn from ZK who was elected and start sending heartbeats directly to that elected coordinator to join the cluster. As nodes join, they are added to the list of connected nodes. The dead node that never checks in will no longer be in the list (you can verify this list via the REST API; see the sketch below).

There is no need to stop components before stopping NiFi. They will return to their last known state on start.

Thank you, Matt
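To confirm which nodes the elected coordinator currently considers connected, you can query NiFi's cluster REST endpoint. A minimal Python sketch, assuming an unsecured node reachable at localhost:8080 (adjust the host, port, and authentication for your environment; secured clusters need TLS and credentials, which are omitted here):

```python
# Print each cluster node's address and connection status
# (CONNECTED, DISCONNECTED, etc.) as reported by the coordinator.
import json
import urllib.request

CLUSTER_URL = "http://localhost:8080/nifi-api/controller/cluster"  # hypothetical host/port

with urllib.request.urlopen(CLUSTER_URL) as resp:
    cluster = json.load(resp)["cluster"]

for node in cluster["nodes"]:
    print(f'{node["address"]}:{node["apiPort"]}  {node["status"]}')
```

After the full-cluster restart described above, the dead node should simply be absent from this list.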
07-06-2018 05:09 PM
@Derek Calderon Sorry to hear that. I have shared this HCC link with a few devs I know, in case they have time to assist. Thanks, Matt
07-09-2018 01:22 PM
1 Kudo
@Derek Calderon Short answer: no. The ExecuteSQL processor is written to write its output to the FlowFile's content.

There is an alternative solution. Some processor currently feeds FlowFiles to your ExecuteSQL processor via a connection. My suggestion is to feed that same connection down two different paths:

1. The first connection feeds a MergeContent processor via a funnel.
2. The second feeds your ExecuteSQL processor, which performs the query and writes the data you are looking for to the content of the FlowFile.
3. A processor like ExtractText then extracts that FlowFile's new content into FlowFile attributes.
4. Finally, a processor like ModifyBytes removes all content from the FlowFile, which is then fed to the same funnel as the other path.

The MergeContent processor can then merge these two FlowFiles using the Correlation Attribute Name property (assuming filename is unique, it could be used), min/max entries set to 2, and Attribute Strategy set to "Keep All Unique Attributes". The result should be what you are looking for; a sketch of the merge semantics follows below.

Having multiple identical connections does not cause NiFi to write the 200 MB of content to the content repository twice: a new FlowFile is created, but it points to the same content claim. New content is only generated when ExecuteSQL runs against one of the FlowFiles, so this flow adds no extra write load on the content repository beyond ExecuteSQL's output, which I am assuming is relatively small.

Thank you, Matt
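To make the merge step concrete, here is a small Python sketch (not NiFi code, just the semantics) of what MergeContent does with min/max entries of 2, filename as the correlation attribute, and "Keep All Unique Attributes": the two FlowFiles pair up by filename, the surviving content comes from the branch that still has content, and the attribute sets are combined. All names and values here are illustrative:

```python
# Illustrative only: simulates binning two FlowFiles by a correlation
# attribute and merging their attributes. flowfile_a kept the original
# 200 MB content; flowfile_b had its content cleared by ModifyBytes but
# carries the attributes ExtractText pulled from the SQL result.
flowfile_a = {
    "attributes": {"filename": "data-001", "path": "/input"},
    "content": b"<original 200 MB payload>",
}
flowfile_b = {
    "attributes": {"filename": "data-001", "sql.row.count": "42"},
    "content": b"",  # cleared by ModifyBytes
}

def merge(a, b, correlation="filename"):
    # MergeContent only bins FlowFiles whose correlation attribute matches.
    assert a["attributes"][correlation] == b["attributes"][correlation]
    # Simplification: NiFi's "Keep All Unique Attributes" drops attributes
    # whose values conflict between FlowFiles; here no keys conflict.
    merged_attrs = {**a["attributes"], **b["attributes"]}
    return {"attributes": merged_attrs, "content": a["content"] + b["content"]}

merged = merge(flowfile_a, flowfile_b)
print(merged["attributes"])  # original attributes plus the SQL-derived ones
```

Since flowfile_b's content is empty, the merged FlowFile keeps the original payload while gaining the SQL-derived attributes, which is the desired end state.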