Member since: 09-29-2015
Posts: 58
Kudos Received: 76
Solutions: 17
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2035 | 01-25-2017 09:19 PM
 | 2894 | 11-02-2016 02:54 PM
 | 2972 | 09-08-2016 01:36 AM
 | 5053 | 08-09-2016 07:52 PM
 | 1336 | 06-30-2016 06:09 PM
01-13-2017
08:05 PM
@Ranjit S I would recommend using ./nifi.sh start rather than ./nifi.sh run, and see if that works out for you. You can then stop NiFi by running ./nifi.sh stop.
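For reference, the typical lifecycle commands from the NiFi installation directory look like this (start launches NiFi as a background process, while run keeps it attached to your terminal):

./bin/nifi.sh start    # launch NiFi in the background
./bin/nifi.sh status   # check whether the instance is running
./bin/nifi.sh stop     # shut the instance down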
01-04-2017
05:12 PM
1 Kudo
@Michael Silas For clarification, which cluster a node joins is determined by two properties in nifi.properties: nifi.zookeeper.connect.string and nifi.zookeeper.root.node. All of the nodes need to have the same values for these two properties.

Also, please ensure that you do not copy the 'state' directory from one node to another. One of the state elements is the node ID, and version 1.0.0 did not do a great job of handling the case where two nodes used the same ID; that was fixed in 1.1.0. (In general I'd recommend using 1.1.0 over 1.0.0 if possible, because several cluster-related issues were addressed in 1.1.0.)

Additionally, because you are using an embedded ZooKeeper, I would ensure that conf/zookeeper.properties has the same values on all nodes for the server.1, server.2, ... server.N properties, as @Timothy Spann mentioned above, and that every node that has the nifi.state.management.embedded.zookeeper.start property of nifi.properties set to true is also listed as a server.N entry (i.e., if all 5 NiFi nodes have nifi.state.management.embedded.zookeeper.start set to true, then you should have server.1 through server.5 in your zookeeper.properties file and all five nodes in your nifi.properties connect string).

It's also important to ensure that each node is able to reach all other nodes, as ZooKeeper can become pretty unhappy when one node is unable to communicate with the others. Does this help?
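To make that concrete, here is a minimal sketch of the relevant entries for a hypothetical three-node cluster with embedded ZooKeeper (the node1/node2/node3 hostnames are placeholders, not values from this thread):

# nifi.properties -- identical on every node
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=node1:2181,node2:2181,node3:2181
nifi.zookeeper.root.node=/nifi

# conf/zookeeper.properties -- identical server entries on every node
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888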
11-02-2016
02:54 PM
2 Kudos
@mayki wogno You should be able to simply use the value vartest1 (without quotes), if this is the only attribute you care about.
09-29-2016
05:54 PM
2 Kudos
@Riccardo Iacomini, Using the 'pure-split-merge.xml' template that @jwitt provided, I am seeing numbers that I think we can improve. It's easy to see when running that template for a while that the bottleneck in the flow is MergeContent. Looking at what it is doing, and poking around a bit with a profiler, it looks like it is pretty inefficient in its pulling of FlowFiles from the queue. It was done this way for good reason, but I think that with a minor (backward-compatible) enhancement to the API we can actually improve it quite a bit. I will look into it some more and let you know what I come up with. Thanks for bringing this to our attention! This is a pretty common use case and one that we certainly want to be able to handle extremely well.
09-08-2016
01:36 AM
2 Kudos
@David DN What you are describing here is the notion of "Exactly Once Delivery." I would refer you to http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/ for an understanding of why this is actually not possible in any distributed system. Often what we hear people discussing instead is the notion of "Exactly Once semantics" as a way to overcome this. However, Exactly Once semantics can be achieved between two systems only if the sending system can guarantee At Least Once delivery and the receiving side provides a mechanism for data de-duplication.

When NiFi receives data from an external source, it does provide the capability for data de-duplication via the DetectDuplicate processor. So you can construct your flow so that if you receive data multiple times, you will process it only once. However, this is only achieved if you are receiving data over a reliable channel (for instance, ListenUDP may drop data, as the UDP protocol is inherently lossy).

When sending to an external system, NiFi generally will guarantee At Least Once delivery of your data (I say generally because it depends on the processor; for instance, the PutKafka processor will provide At Least Once delivery if configured to do so, but if configured as Best Effort delivery, it may not). However, to ensure that data is not duplicated on the receiving system, the receiving system would also need some way to de-duplicate data.
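As a rough illustration of the receiving side, a de-duplication flow might be wired like the sketch below (the processor names are real, but the exact wiring and the hash.value attribute are an assumption for illustration, not a prescribed design):

# Hypothetical de-duplication sketch on the receiving NiFi
#   ListenHTTP  (a reliable channel)
#     -> HashContent        writes a content hash to the 'hash.value' attribute
#     -> DetectDuplicate    Cache Entry Identifier: ${hash.value}
#                           Distributed Cache Service: a DistributedMapCacheClientService
#          'duplicate'      -> auto-terminate (drop repeated deliveries)
#          'non-duplicate'  -> continue normal processing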
09-07-2016
11:00 PM
10 Kudos
One of the most highly anticipated features of Apache NiFi 1.0.0 is the introduction of Zero-Master Clustering. Previous versions of NiFi relied upon a single "Master Node" (more formally known as the NiFi Cluster Manager) to show the User Interface. If this node was lost, data continued to flow, but the application was unable to show the topology of the flow or show any stats. Additionally, Site-to-Site communications continued to send data but could not obtain up-to-date information about cluster topology, which resulted in less efficient load balancing. Version 1.0.0 of NiFi addresses these issues by switching to a Zero-Master Clustering paradigm.

This post will explore the approaches taken to ensure that NiFi provides high availability of the control plane without sacrificing the User Experience. After all, the User Experience is what has allowed NiFi to become the go-to solution for providing dataflow management to small organizations as well as the world's largest enterprises.

The benefit that the master/worker paradigm offered us was a design that was easy to reason over and understand. All web requests were sent directly to the master. This meant that coordination of the flow was controlled by the master (e.g., it would prevent one user from modifying a Processor while another user was modifying the same Processor at the same time). The entire cluster topology was stored only at the master. The "golden copy" of the flow configuration was held by the master. To the extent possible, we wanted to keep this benefit of being easy to reason about how the system works, while still overcoming all of these hurdles.

I am happy to say that the NiFi community has accomplished this goal, keeping a simple, easy-to-understand design with all of the benefits of High Availability. To do this, we leveraged the power of Apache ZooKeeper to provide automatic election of different clustering-related roles. In NiFi 1.0.0, we have two different roles that are automatically elected. The first role is the Primary Node (Yes! Gone are the days of having to manually switch which node is the Primary Node). The second role is the Cluster Coordinator.

This new Cluster Coordinator role is responsible for monitoring the nodes in a cluster and marking any nodes that fail to heartbeat as "Disconnected." Additionally, the Cluster Coordinator provides a mechanism to ensure that the flow is consistent across all nodes. This is accomplished by forwarding all web-based requests to the Coordinator. The Coordinator can then replicate each request to all nodes in the cluster and merge their responses into a single, unified view, in much the same way that the old Cluster Manager did. However, with the shift to the Cluster Coordinator, if the node that is elected Cluster Coordinator drops from the cluster, a new node will automatically pick up these responsibilities. This approach means that users are able to navigate to the URL of any node in a NiFi cluster, so users need not concern themselves with which node is currently elected the Cluster Coordinator. All of the necessary coordination, such as component locking, is handled at a single point, so there is no need to introduce expensive and difficult-to-understand distributed locking mechanisms.
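For readers standing up a 1.0.0 cluster, a minimal sketch of the cluster-related entries in nifi.properties looks something like the following (the hostnames and port numbers are placeholders for illustration, not prescriptive values):

# Hypothetical nifi.properties excerpt for one node of a 1.0.0 cluster
nifi.cluster.is.node=true
nifi.cluster.node.address=node1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181
nifi.zookeeper.root.node=/nifi

Every node points at the same ZooKeeper quorum, which is what allows the Primary Node and Cluster Coordinator elections described above to happen automatically.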
Additionally, these changes provide a great footing to build upon for the upcoming changes that are planned for Data Replication across nodes in a NiFi cluster. A NiFi Feature Proposal outlines this feature at a fairly high level at https://cwiki.apache.org/confluence/display/NIFI/Data+Replication. This notion of an automatically elected, highly available Cluster Coordinator means that we can develop an easy-to-understand approach for this Data Replication as well, since we are able to elect a single node to coordinate the failover of the data processing.

Also new to NiFi 1.0.0 is an overhaul of the security model and component-level versioning. We refer to these updates jointly as providing multi-tenancy. NiFi now supports any number of users viewing and modifying the flow at the same time without the need to continually refresh the flow. In addition to this, permissions can now be granted to users to read or modify any component individually. Prior to version 1.0.0, NiFi required that users be given read-only access or write access to the entire flow. However, as NiFi continues to gain more and more adoption, enterprise users have been seeking the ability to restrict access to specific components for different users. This is now possible, with a simple, intuitive user interface for providing and configuring access policies. Bryan Bende, an Apache NiFi PMC member, has provided an excellent overview of this feature at http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy.

Version 1.0.0 of Apache NiFi has been a long time in the making and is available now. In this post, we've given a very high-level overview of how the Zero-Master Clustering feature works. A completely redesigned UI and several minor features and improvements have been added as well. NiFi can be downloaded at http://nifi.apache.org/download.html, with Release Notes available at https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.0.0. Please let us know how we can continue to improve the application and what you would love to see added in a future version!
08-09-2016
07:52 PM
4 Kudos
MergeContent by itself doesn't allow you to merge parts of the FlowFiles together. However, an option that you can use is to put RouteText before MergeContent. In RouteText, set the Matching Strategy to "Satisfies Expression." Then, add a property named "header" with a value of "${lineNo:equals(1)}". Leave all other properties at their default values. This will route the first line of each FlowFile to the "header" relationship. You can then auto-terminate the "header" relationship and route "unmatched" to MergeContent. That is, filter out the header line and route everything else to MergeContent. In MergeContent, you can then set "Header" to the text of your header and set "Delimiter Strategy" to "Text." This will cause MergeContent to add that header line back to the merged FlowFile for you. Sorry that this is so non-trivial, but I think this approach will at least give you what you're looking for.
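Pulling the settings described above into one place (the header text itself is whatever your files use; everything else comes straight from the answer):

RouteText:
  Matching Strategy:  Satisfies Expression
  header (user-added property):  ${lineNo:equals(1)}
  'header' relationship    -> auto-terminate
  'unmatched' relationship -> MergeContent

MergeContent:
  Delimiter Strategy:  Text
  Header:  <your header line>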
08-09-2016
07:45 PM
1 Kudo
@BigDataRocks - I believe that Bryan's answer above is very accurate, so this is not really intended to directly answer your question, but I wanted to mention that your directory structure above can be simplified to just: eventsink/${service_type}/${event_name}/${now():format('yyyy/MM/dd/HHmmssSSS')}.${filename}.json  As you have it above, you are asking for now() multiple times, which could cause some weirdness if the hour rolls over between invocations, etc. Doing it all with a single call to now() will address this and simplify the configuration as well.
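To illustrate the pitfall with a made-up expression (not the asker's actual configuration): with separate calls such as

  ${now():format('yyyy/MM/dd/HH')}/${now():format('mmss')}

the two now() invocations are evaluated a few instants apart, so near an hour boundary the first could yield hour 09 while the second yields minutes and seconds from 10:00:00, producing a path that corresponds to no real timestamp. A single ${now():format(...)} call formats one snapshot of the clock, so the pieces always agree.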
08-09-2016
07:13 PM
Alvin, You should be able to get more details by adding the following line to your conf/logback.xml file:

<logger name="org.apache.nifi.processors.standard.FetchSFTP" level="DEBUG" />

That will cause it to log the full stack trace so that you can see what's going on. FetchSFTP does not interact with ZooKeeper or Site-to-Site, so you should be okay there. The Distributed Cache Service is also not necessary to use FetchSFTP.
07-06-2016
04:02 PM
2 Kudos
@Alexander Aolaritei Is the port that you created in NiFi an Output Port on the root group? If not, you will need to make sure that you have it configured this way. Also, if you are running NiFi in secure mode, you will need to ensure that your client authenticates properly and that you have granted access to the client in NiFi by configuring the port and going to its 'Access Controls' tab.
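For context, Site-to-Site itself is enabled by a couple of nifi.properties entries on the instance exposing the Output Port; the values below are placeholders for illustration, not from this thread:

# Hypothetical nifi.properties excerpt on the source NiFi
nifi.remote.input.socket.port=10000    # port that Site-to-Site clients connect to
nifi.remote.input.secure=true          # secure mode: clients must present a certificate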