Member since: 09-29-2015
Posts: 58
Kudos Received: 76
Solutions: 17
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2035 | 01-25-2017 09:19 PM
 | 2894 | 11-02-2016 02:54 PM
 | 2972 | 09-08-2016 01:36 AM
 | 5053 | 08-09-2016 07:52 PM
 | 1336 | 06-30-2016 06:09 PM
01-13-2017
08:05 PM
@Ranjit S I would recommend using ./nifi.sh start rather than ./nifi.sh run, and see if that works out for you. You can then stop NiFi by running ./nifi.sh stop.
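For reference, the typical lifecycle commands from the NiFi installation directory look like this (start launches NiFi as a background process, while run keeps it attached to your terminal):

./bin/nifi.sh start    # launch NiFi in the background
./bin/nifi.sh status   # check whether the instance is running
./bin/nifi.sh stop     # shut the instance down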
01-04-2017
05:12 PM
1 Kudo
@Michael Silas For clarification, which cluster a node joins is determined by two properties in nifi.properties: nifi.zookeeper.connect.string and nifi.zookeeper.root.node. All of the nodes need to have the same values for these two properties.

Also, please ensure that you do not copy the 'state' directory from one node to another. One of the state elements is the node ID, and version 1.0.0 did not do a great job of handling the case where two nodes used the same ID; that was fixed in 1.1.0. (In general I'd recommend using 1.1.0 over 1.0.0 if possible, because several cluster-related issues were addressed in 1.1.0.)

Additionally, because you are using an embedded ZooKeeper, I would ensure that conf/zookeeper.properties has the same values on all nodes for the server.1, server.2, ... server.N properties, as @Timothy Spann mentioned above, and that every node that has the nifi.state.management.embedded.zookeeper.start property of nifi.properties set to true is also listed as a server.N entry (i.e., if all 5 NiFi nodes have nifi.state.management.embedded.zookeeper.start set to true, then you should have server.1 through server.5 in your zookeeper.properties file and all five nodes in your nifi.properties connect string).

It's also important to ensure that each node is able to reach all other nodes, as ZooKeeper can become pretty unhappy when one node is unable to communicate with the others. Does this help?
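To make that concrete, here is a minimal sketch of the relevant entries for a hypothetical three-node cluster with embedded ZooKeeper (the node1/node2/node3 hostnames are placeholders, not values from this thread):

# nifi.properties -- identical on every node
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=node1:2181,node2:2181,node3:2181
nifi.zookeeper.root.node=/nifi

# conf/zookeeper.properties -- identical server entries on every node
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888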
11-02-2016
02:54 PM
2 Kudos
@mayki wogno You should be able to simply use the value vartest1 (without quotes), if this is the only attribute you care about.
09-29-2016
05:54 PM
2 Kudos
@Riccardo Iacomini, Using the 'pure-split-merge.xml' template that @jwitt provided, I am seeing numbers that I think we can improve. It's easy to see when running that template for a while that the bottleneck in the flow is MergeContent. Looking at what it is doing, and poking around a bit with a profiler, it looks like it is pretty inefficient in its pulling of FlowFiles from the queue. It was done this way for good reason, but I think that with a minor (backward-compatible) enhancement to the API we can actually improve it quite a bit. I will look into it some more and let you know what I come up with. Thanks for bringing this to our attention! This is a pretty common use case and one that we certainly want to be able to handle extremely well.
09-08-2016
01:36 AM
2 Kudos
@David DN What you are describing here is the notion of "Exactly Once Delivery." I would refer you to http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/ for an understanding of why this is actually not possible in any distributed system. Often what we hear people discussing instead is the notion of "Exactly Once semantics" as a way to overcome this. However, Exactly Once semantics can be achieved between two systems only if the sending system can guarantee At Least Once delivery and the receiving side provides a mechanism for data de-duplication.

When NiFi receives data from an external source, it does provide the capability for data de-duplication via the DetectDuplicate processor. So you can construct your flow so that if you receive data multiple times, you will process it only once. However, this is only achieved if you are receiving data over a reliable channel (for instance, ListenUDP may drop data, as the UDP protocol is inherently lossy).

When sending to an external system, NiFi generally will guarantee At Least Once delivery of your data (I say generally because it depends on the processor; for instance, the PutKafka processor will provide At Least Once delivery if configured to do so, but if configured as Best Effort delivery, it may not). However, to ensure that data is not duplicated on the receiving system, the receiving system would also need some way to de-duplicate data.
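As a rough illustration of the receiving side, a de-duplication flow might be wired like the sketch below (the processor names are real, but the exact wiring and the hash.value attribute are an assumption for illustration, not a prescribed design):

# Hypothetical de-duplication sketch on the receiving NiFi
#   ListenHTTP  (a reliable channel)
#     -> HashContent        writes a content hash to the 'hash.value' attribute
#     -> DetectDuplicate    Cache Entry Identifier: ${hash.value}
#                           Distributed Cache Service: a DistributedMapCacheClientService
#          'duplicate'      -> auto-terminate (drop repeated deliveries)
#          'non-duplicate'  -> continue normal processing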
09-07-2016
11:00 PM
10 Kudos
One of the most highly anticipated features of Apache NiFi 1.0.0 is the introduction of Zero-Master Clustering. Previous versions of NiFi relied upon a single "Master Node" (more formally known as the NiFi Cluster Manager) to show the User Interface. If this node was lost, data continued to flow, but the application was unable to show the topology of the flow or show any stats. Additionally, Site-to-Site communications continued to send data but could not obtain up-to-date information about cluster topology, which resulted in less efficient load balancing. Version 1.0.0 of NiFi addresses these issues by switching to a Zero-Master Clustering paradigm.

This post will explore the approaches taken to ensure that NiFi provides high availability of the control plane without sacrificing the User Experience. After all, the User Experience is what has allowed NiFi to become the go-to solution for providing dataflow management to small organizations as well as the world's largest enterprises.

The benefit that the master/worker paradigm offered us was a design that was easy to reason over and understand. All web requests were sent directly to the master. This meant that coordination of the flow was controlled by the master (e.g., it would prevent one user from modifying a Processor while another user was modifying the same Processor at the same time). The entire cluster topology was stored only at the master. The "golden copy" of the flow configuration was held by the master. To the extent possible, we wanted to keep this benefit of being easy to reason about how the system works, while still overcoming all of these hurdles.

I am happy to say that the NiFi community has accomplished this goal, keeping a simple, easy-to-understand design with all of the benefits of High Availability. To do this, we leveraged the power of Apache ZooKeeper to provide automatic election of different clustering-related roles. In NiFi 1.0.0, we have two different roles that are automatically elected. The first role is the Primary Node (Yes! Gone are the days of having to manually switch which node is the Primary Node). The second role is the Cluster Coordinator.

This new Cluster Coordinator role is responsible for monitoring the nodes in a cluster and marking any nodes that fail to heartbeat as "Disconnected." Additionally, the Cluster Coordinator provides a mechanism to ensure that the flow is consistent across all nodes. This is accomplished by forwarding all web-based requests to the Coordinator. The Coordinator can then replicate each request to all nodes in the cluster and merge their responses into a single, unified view, in much the same way that the old Cluster Manager did. However, with the shift to the Cluster Coordinator, if the node that is elected Cluster Coordinator drops from the cluster, a new node will automatically pick up these responsibilities. This approach means that users are able to navigate to the URL of any node in a NiFi cluster, so users need not concern themselves with which node is currently elected the Cluster Coordinator. All of the necessary coordination, such as component locking, is handled at a single point, so there is no need to introduce expensive and difficult-to-understand distributed locking mechanisms.
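For readers standing up a 1.0.0 cluster, a minimal sketch of the cluster-related entries in nifi.properties looks something like the following (the hostnames and port numbers are placeholders for illustration, not prescriptive values):

# Hypothetical nifi.properties excerpt for one node of a 1.0.0 cluster
nifi.cluster.is.node=true
nifi.cluster.node.address=node1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk1:2181,zk2:2181,zk3:2181
nifi.zookeeper.root.node=/nifi

Every node points at the same ZooKeeper quorum, which is what allows the Primary Node and Cluster Coordinator elections described above to happen automatically.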
Additionally, these changes provide a great footing to build upon for the upcoming changes that are planned for Data Replication across nodes in a NiFi cluster. A NiFi Feature Proposal outlines this feature at a fairly high level at https://cwiki.apache.org/confluence/display/NIFI/Data+Replication. This notion of an automatically elected, highly available Cluster Coordinator means that we can develop an easy-to-understand approach for this Data Replication as well, since we are able to elect a single node to coordinate the failover of the data processing.

Also new to NiFi 1.0.0 is an overhaul of the security model and component-level versioning. We refer to these updates jointly as providing multi-tenancy. NiFi now supports any number of users viewing and modifying the flow at the same time without the need to continually refresh the flow. In addition to this, permissions can now be granted to users to read or modify any component individually. Prior to version 1.0.0, NiFi required that users be given read-only access or write access to the entire flow. However, as NiFi continues to gain more and more adoption, enterprise users have been seeking the ability to restrict access to specific components for different users. This is now possible, with a simple, intuitive user interface for providing and configuring access policies. Bryan Bende, an Apache NiFi PMC member, has provided an excellent overview of this feature at http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy.

Version 1.0.0 of Apache NiFi has been a long time in the making and is available now. In this post, we've given a very high-level overview of how the Zero-Master Clustering feature works. A completely redesigned UI and several minor features and improvements have been added as well. NiFi can be downloaded at http://nifi.apache.org/download.html, with Release Notes available at https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.0.0. Please let us know how we can continue to improve the application and what you would love to see added in a future version!
08-09-2016
07:52 PM
4 Kudos
MergeContent by itself doesn't allow you to merge parts of the FlowFiles together. However, an option that you can use is to put RouteText before MergeContent. In RouteText, set the Matching Strategy to "Satisfies Expression." Then, add a property named "header" with a value of "${lineNo:equals(1)}". Leave all other properties at their default values. This will route the first line of each FlowFile to the "header" relationship. You can then auto-terminate the "header" relationship and route "unmatched" to MergeContent. That is, filter out the header line and route everything else to MergeContent. In MergeContent, you can then set "Header" to the text of your header and set "Delimiter Strategy" to "Text." This will cause MergeContent to add that header line back to the merged FlowFile for you. Sorry that this is so non-trivial, but I think this approach will at least give you what you're looking for.
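Pulling the settings described above into one place (the header text itself is whatever your files use; everything else comes straight from the answer):

RouteText:
  Matching Strategy:  Satisfies Expression
  header (user-added property):  ${lineNo:equals(1)}
  'header' relationship    -> auto-terminate
  'unmatched' relationship -> MergeContent

MergeContent:
  Delimiter Strategy:  Text
  Header:  <your header line>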
08-09-2016
07:45 PM
1 Kudo
@BigDataRocks - I believe that Bryan's answer above is very accurate, so this is not really intended to directly answer your question, but I wanted to mention that your directory structure above can be simplified to just: eventsink/${service_type}/${event_name}/${now():format('yyyy/MM/dd/HHmmssSSS')}.${filename}.json  As you have it above, you are asking for now() multiple times, which could cause some weirdness if the hour rolls over between invocations, etc. Doing it all with a single call to now() will address this and simplify the configuration as well.
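To illustrate the pitfall with a made-up expression (not the asker's actual configuration): with separate calls such as

  ${now():format('yyyy/MM/dd/HH')}/${now():format('mmss')}

the two now() invocations are evaluated a few instants apart, so near an hour boundary the first could yield hour 09 while the second yields minutes and seconds from 10:00:00, producing a path that corresponds to no real timestamp. A single ${now():format(...)} call formats one snapshot of the clock, so the pieces always agree.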
08-09-2016
07:13 PM
Alvin, You should be able to get more details by adding the following line to your conf/logback.xml file:

<logger name="org.apache.nifi.processors.standard.FetchSFTP" level="DEBUG" />

That will cause it to log the full stack trace so that you can see what's going on. FetchSFTP does not interact with ZooKeeper or Site-to-Site, so you should be okay there. The Distributed Cache Service is also not necessary to use FetchSFTP.
07-06-2016
04:02 PM
2 Kudos
@Alexander Aolaritei Is the port that you created in NiFi an Output Port on the root group? If not, you will need to make sure that you have it configured this way. Also, if you are running NiFi in secure mode, you will need to ensure that your client authenticates properly and that you have granted access to the client in NiFi by configuring the port and going to its 'Access Controls' tab.
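For context, Site-to-Site itself is enabled by a couple of nifi.properties entries on the instance exposing the Output Port; the values below are placeholders for illustration, not from this thread:

# Hypothetical nifi.properties excerpt on the source NiFi
nifi.remote.input.socket.port=10000    # port that Site-to-Site clients connect to
nifi.remote.input.secure=true          # secure mode: clients must present a certificate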