Member since
09-29-2015
58
Posts
76
Kudos Received
17
Solutions
My Accepted Solutions
Views | Posted |
---|---|
2038 | 01-25-2017 09:19 PM |
2899 | 11-02-2016 02:54 PM |
2987 | 09-08-2016 01:36 AM |
5062 | 08-09-2016 07:52 PM |
1339 | 06-30-2016 06:09 PM |
12-15-2022
05:24 AM
This is working fine. Can we provide the Search Value and Replacement Value as a variable or FlowFile attribute? I want to use the same ReplaceText processor to convert different input files with different numbers of columns. Basically, I want to parameterize the Search Value and Replacement Value in the ReplaceText processor. @mpayne @ltsimps1 @kpulagam @jpercivall @other
08-10-2020
08:32 PM
1 Kudo
I had a similar issue today, where the NiFi server was up but the UI kept spinning. I cleared the browser cache and restarted the server and the browser, and the UI came back fine. Just mentioning it for others who may have a similar issue.
04-27-2020
12:57 PM
The answer from @mpayne is correct. Note, however, that setting the header in MergeContent doesn't add a line break between the header and the records. Since Expression Language is supported, please include a line break at the end of the header value.
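To illustrate (with hypothetical column names, and assuming a newline Demarcator), if the MergeContent "Header" property is set to col_a,col_b,col_c with no trailing line break, the merged output comes out with the first record glued to the header:

col_a,col_b,col_c1,foo,bar
2,baz,qux

Adding a newline at the end of the Header value (you can insert one with Shift + Enter in the property editor) gives the intended result:

col_a,col_b,col_c
1,foo,bar
2,baz,qux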
02-15-2018
06:22 AM
Is there any way to increase this window from 5 minutes to, let's say, an hour or more? (A common question would be how many records were processed during a 24-hour window, etc.)
09-14-2017
02:18 PM
1 Kudo
@Mitthu Wagh There is no way to change the 5-minute running average on the processors. What are you looking for? You can still see the stats for a processor by hovering the cursor over it and right-clicking; a menu pops up from which you can open the processor's Status History. A new window opens with a choice of stats to view.
06-29-2017
07:27 PM
4 Kudos
Apache NiFi has grown tremendously over the past two and a half years since it was open sourced. The community is continuously thinking of, implementing, and contributing amazing new features. The newly released version 1.2.0 of NiFi is no exception. One of the most exciting features of this new release is the QueryRecord Processor and the Record Reader and Record Writer components that go along with it. If you aren't familiar with those components, there's a blog post that explains how they work. This new Processor, powered by Apache Calcite, allows users to write SQL SELECT statements to run over their data as it streams through the system. Each FlowFile in NiFi can be treated as if it were a database table named FLOWFILE. These SQL queries can be used to filter specific columns or fields from your data, rename those columns/fields, filter rows, perform calculations and aggregations on the data, route the data, or whatever else you may want to use SQL for. All from the comfy confines of the most widely known and used Domain Specific Language. This is a big deal! Of course, there are already other platforms that allow you to run SQL over arbitrary data. So let's touch on how NiFi differs from those platforms that run SQL over arbitrary data outside of an RDBMS:
Queries are run locally. There is no need to push the data to some external service, such as S3, in order to run queries over your data. There's also no need to pay for that cloud storage or the bandwidth, or to create temporary "staging tables" in a database.

Queries are run inline. Since your data is already streaming through NiFi, it is very convenient to add a new QueryRecord Processor to your canvas. You are already streaming your data through NiFi, aren't you? If not, you may find Our Docs Page helpful to get started learning more about NiFi.

Query data in any format. Write results in any data format. One of the goals of this effort was to allow data to be queried regardless of the format. In order to accomplish this, the Processor was designed to be configured with a "Record Reader" Controller Service and a "Record Writer" Controller Service. Out of the box, there are readers for CSV, JSON, Avro, and even log data. The results of the query can be written out in CSV, JSON, Avro, or free-form text (for example, a log format) using the NiFi Expression Language. If your data is in another format, you are free to write your own implementation of the Record Reader and/or Record Writer Controller Service. A simple implementation can wrap an existing library to return Record objects from an InputStream. That's all that is needed to run SQL over your own custom data format! Writing the results in your own custom data format is similarly easy.

It's fast - very fast! Just how fast largely depends on the performance of your disks and the number of disks that you have available. The data must be read from disk, queried, and then the results written to disk. In most scenarios, though, reading of the data from disk is actually avoided thanks to operating system disk caching. We've tested the performance on a server with 32 cores and 64 GB of RAM. We ran a continuous stream of JSON-formatted Provenance data through the Processor. To do this, we used the NiFi Provenance Reporting Task to send the Provenance data back to the NiFi instance. Because we wanted to stress the Processor, we then used a DuplicateFlowFile processor to create 200 copies of the data (this really just creates 200 pointers to the same piece of data; it doesn't copy the data itself). We used a SQL query to pull out any Provenance "SEND" event (a small percentage of the data, in this case). Using 12 concurrent tasks, we were able to see the query running at a consistent rate of about 1.2 GB per second on a single node - using less than half of the available CPU!

Data Provenance keeps a detailed trail of what happened. One of the biggest differentiators between NiFi and other dataflow platforms is the detailed Data Provenance that NiFi records. If the data that you end up with is not what you expect, the Data Provenance feature makes it easy to see exactly what the data looked like at each point in the flow, pinpoint exactly what went wrong, and understand where the data came from and where it went. When the flow has been updated, the data can be replayed with the click of a button and the new results can be verified. If the results are still not right, update your flow and replay again.

HOW IT WORKS

In order to get started, we need to add a QueryRecord Processor to the graph. Once we've added the Processor to the graph, we need to configure three things: the Controller Service to use for reading data, the service to use for writing the results, and one or more SQL queries.
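To give a feel for the syntax before diving into the setup, a query against the FLOWFILE table might look something like the following minimal sketch (the field names here are hypothetical and would need to match whatever schema your Record Reader is configured with):

SELECT first_name, last_name, age
FROM FLOWFILE
WHERE age > 21

Every FlowFile routed through the Processor has its records evaluated against the query, and the matching records are written out by the configured Record Writer.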
The first time that you set this all up, it may seem like there's a lot going on. But once you've done it a couple of times, it becomes pretty trivial. The rest of this post will be dedicated to setting everything up.

First, we will configure the Record Reader Controller Service. We'll choose to create a new service. We are given a handful of different types of services to choose from. For this example, we will use CSV data, so we will use a CSVReader. We will come back and configure our new CSVReader service in a moment. For now, we will click the "Create" button and then choose to create a new Controller Service to write the results. We will write the results in JSON format. We can again click the "Create" button to create the service. Now that we have created our Controller Services, we will click the "Go To" button on our reader service to jump to the Controller Service configuration page.

CONFIGURING THE READER

We can click the pencil in the right-hand column to configure our CSV Reader. The CSV Reader gives us plenty of options to customize the reader to our format. For this example, we will leave most of the defaults, but we will change the "Skip Header Line" Property from the default value of "false" to "true" because our data will contain a header line that we don't want to process as an actual record.

In order to run SQL over our data and make sense of the columns in our data, we also need to configure the reader with a schema for the data. We have several options for doing this. We can use an attribute on the FlowFile that includes an Avro-formatted schema. Or we can use a Schema Registry to store our schema and access it by name or by identifier and version. Let's go ahead and use a local Schema Registry and add our schema to that Registry. To do so, we will change the "Schema Access Strategy" to "Use 'Schema Name' Property." This means that we want to look up the Schema to use by name. The name of the schema is specified by the "Schema Name" Property. The default value for that property is "${schema.name}", which means that we will just use the "schema.name" attribute to identify which schema we want to use. Instead of using an attribute, we could just type the name of the schema here. Doing so would mean that we would need a different CSV Reader for each schema that we want to read, though. By using an attribute, we can have a single CSV Reader that works for all schemas.

Next, we need to specify the Schema Registry to use. We click the value of the "Schema Registry" property and choose to "Create new service..." We will use the AvroSchemaRegistry. (Note that our incoming data is in CSV format and the output will be in JSON. That's okay; Avro in this sense only refers to the format of the schema provided. So we will provide a schema in the same way that we would if we were using Avro.) We will click the "Create" button and then click the "Go To" arrow that appears in the right-hand column in order to jump to the Schema Registry service (and click 'Yes' to save changes to our CSV Reader service). This will take us back to our Controller Services configuration screen.

It is important to note that from this screen, each Controller Service has a "Usage" icon on the left-hand side (it looks like a little book). That icon will take you to the documentation on how to use that specific Controller Service. The documentation is fairly extensive. Under the "Description" heading, each of the Record Readers and Writers has an "Additional Details..." link that provides much more detailed information about how to use the service, along with examples.

We will click the Edit ("pencil") icon next to the newly created AvroSchemaRegistry and go to the Properties tab. Notice that the service has no properties, so we click the New Property ("+") icon in the top-right corner. The name of the property is the name that we will use to refer to the schema. Let's call it "hello-world". For the value, we can type or paste in the schema that we want to use, written in Avro Schema syntax. For this example, we will use the following schema:
{
"name": "helloWorld",
"namespace": "org.apache.nifi.blogs",
"type": "record",
"fields": [
{ "name": "purchase_no", "type": "long" },
{ "name": "customer_id", "type": "long" },
{ "name": "item_id", "type": ["null", "long"] },
{ "name": "item_name", "type": ["null", "string"] },
{ "name": "price", "type": ["null", "double"] },
{ "name": "quantity", "type": ["null", "int"] },
{ "name": "total_price", "type": ["null", "double"] }
]
}
Now we can click "OK" and apply our changes. Clicking Enable (the lightning bolt icon) enables the service. We can now also enable our CSV Reader.

CONFIGURING THE WRITER
Similarly, we need to configure our writer with a schema so that NiFi knows how we expect our data to look. If we click the Pencil icon next to our JSONRecordSetWriter, in the Properties tab, we can configure whether we want our JSON data to be pretty-printed or not and how we want to write out date and time fields. We also need to specify how to access the schema and how to convey the schema to downstream processing. For the "Schema Write Strategy," since we are writing JSON, we will just set the "schema.name" attribute, so we will leave the default value. If we were writing in Avro, for example, we would probably want to include the schema in the data itself. For the "Schema Access Strategy," we will use the "Schema Name" property, and set the "Schema Name" property to "${schema.name}" just as we did with the CSV Reader. We then select the same AvroSchemaRegistry service for the "Schema Registry" property. Again, we click "Apply," then click the Lightning icon and the Enable button to enable our Controller Service. We can then click the "X" to close out this dialog.

WRITE THE SQL

Now comes the fun part! We can go back to configure our QueryRecord Processor. In the Properties tab, we can start writing our queries. For this example, let's take the following CSV data:
purchase_no, customer_id, item_id, item_name, price, quantity
10280, 40070, 1028, Box of pencils, 6.99, 2
10280, 40070, 4402, Stapler, 12.99, 1
12440, 28302, 1029, Box of ink pens, 8.99, 1
28340, 41028, 1028, Box of pencils, 6.99, 18
28340, 41028, 1029, Box of ink pens, 8.99, 18
28340, 41028, 2038, Printer paper, 14.99, 10
28340, 41028, 4018, Clear tape, 2.99, 10
28340, 41028, 3329, Tape dispenser, 14.99, 10
28340, 41028, 5192, Envelopes, 4.99, 45
28340, 41028, 3203, Laptop Computer, 978.88, 2
28340, 41028, 2937, 24" Monitor, 329.98, 2
49102, 47208, 3204, Powerful Laptop Computer, 1680.99, 1
In our Properties tab, we can click the "Add Property" button to add a new property. Because we can add multiple SQL queries in a single Processor, we need a way to distinguish the results of each query and route the data appropriately. As such, the name of the property is the name of the Relationship that data matching the query should be routed to. We will create two queries. The first will be named "over.1000" and will include the purchase_no and customer_id fields of any purchase that cost more than $1,000.00, as well as a new field named total_price that is the dollar amount for the entire purchase. Note that when entering a value for a property in NiFi, you can use Shift + Enter to insert a newline in your value:
SELECT purchase_no, customer_id, SUM(price * quantity) AS total_price
FROM FLOWFILE
GROUP BY purchase_no, customer_id
HAVING SUM(price * quantity) > 1000
The second property will be named "largest.order" and will contain the purchase_no, customer_id, and total price of the most expensive single purchase (as defined by price times quantity) in the data:
SELECT purchase_no, customer_id, SUM(price * quantity) AS total_price
FROM FLOWFILE
GROUP BY purchase_no, customer_id
ORDER BY total_price DESC
LIMIT 1
Now we will wire our QueryRecord processor up in our graph so that we can use it. For this demo, we will simply use a GenerateFlowFile to feed it data. We will set the "Custom Text" property to the CSV that we have shown above. In the "Scheduling" tab, I'll configure the processor to run once every 10 seconds so that I don't flood the system with data. We need to add a "schema.name" attribute, so we will route the "success" relationship of GenerateFlowFile to an UpdateAttribute processor. To this processor, we will add a new Property named "schema.name" with a value of "hello-world" (to match the name of the schema that we added to the AvroSchemaRegistry service). We will route the "success" relationship to QueryRecord.
Next, we will create two UpdateAttribute Processors and connect the "over.1000" relationship to the first and the "largest.order" relationship to the other. This just gives us a simple place to hold the data so that we can view it. I will loop the "failure" relationship back to the QueryRecord processor so that if there is a problem evaluating the SQL, the data will remain in my flow. I'll also auto-terminate the "original" relationship because once I have evaluated the SQL, I don't care about the original data anymore. I'll also auto-terminate the "success" relationship of each terminal UpdateAttribute processor.

When we start the QueryRecord, GenerateFlowFile, and the first UpdateAttribute processors, we end up with a FlowFile queued up before each UpdateAttribute processor. If we right-click on the "over.1000" connection and click "List queue," we are able to see the FlowFiles in the queue. Clicking the "information" icon on the left-hand side of the FlowFile gives us the ability to view the FlowFile's attributes and download or view its contents. We can click the "View" button to view the content. Changing the "View as" drop-down at the top from "original" to "formatted" gives us a pretty-printed version of the JSON, and we quickly see the results that we want. Note that we have a null value for the columns that are defined in our schema but were not part of our results. If we wanted to, we could certainly update our schema in order to avoid having these fields show up at all. Viewing the FlowFiles of the "largest.order" connection shows us a single FlowFile as well, with the content that we expect.

Of course, if we have already run the data through the flow and want to go back and inspect that data and what happened to it, we can easily do that through NiFi's powerful Data Provenance feature.
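To make that concrete, here is a sketch of what the "over.1000" output might look like for the sample CSV above, assuming the pretty-printed JSON writer configured earlier (the totals are computed from the rows above; row order and exact floating-point formatting may differ):

[ {
  "purchase_no" : 28340,
  "customer_id" : 41028,
  "item_id" : null,
  "item_name" : null,
  "price" : null,
  "quantity" : null,
  "total_price" : 3459.61
}, {
  "purchase_no" : 49102,
  "customer_id" : 47208,
  "item_id" : null,
  "item_name" : null,
  "price" : null,
  "quantity" : null,
  "total_price" : 1680.99
} ]

The "largest.order" relationship would carry only the first of those two records, since purchase 28340 has the highest total.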
06-06-2017
03:28 PM
5 Kudos
Note that this is a re-post of the original article that was posted at https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi.

INTRO - THE WHAT

Apache NiFi is being used by many companies and organizations to power their data distribution needs. One of NiFi's strengths is that the framework is data agnostic. It doesn't care what type of data you are processing. There are processors for handling JSON, XML, CSV, Avro, images and video, and several other formats. There are also several general-purpose processors, such as RouteText and CompressContent. Data can be hundreds of bytes or many gigabytes. This makes NiFi a powerful tool for pulling data from external sources; routing, transforming, and aggregating it; and finally delivering it to its final destinations.

While this ability to handle any arbitrary data is incredibly powerful, we often see users working with record-oriented data. That is, large volumes of small "records," or "messages," or "events." Everyone has their own way to talk about this data, but we all mean the same thing. Lots of really small, often structured, pieces of information. This data comes in many formats. Most commonly, we see CSV, JSON, Avro, and log data. While there are many tasks that NiFi makes easy, there are some common tasks that we can do better with. So in version 1.2.0 of NiFi, we released a new set of Processors and Controller Services for working with record-oriented data. The new Processors are configured with a Record Reader and a Record Writer Controller Service. There are readers for JSON, CSV, Avro, and log data. There are writers for JSON, CSV, and Avro, as well as a writer that allows users to enter free-form text. This can be used to write data out in a log format, like it was read in, or any other custom textual format.

THE WHY

These new processors make building flows to handle this data simpler. They also mean that we can build processors that accept any data format without having to worry about the parsing and serialization logic. Another big advantage of this approach is that we are able to keep FlowFiles larger, each consisting of multiple records, which results in far better performance.

THE HOW - EXPLANATION

In order to make sense of the data, Record Readers and Writers need to know the schema that is associated with the data. Some Readers (for example, the Avro Reader) allow the schema to be read from the data itself. The schema can also be included as a FlowFile attribute. Most of the time, though, it will be looked up by name from a Schema Registry. In this version of NiFi, two Schema Registry implementations exist: an Avro-based Schema Registry service and a client for an external Hortonworks Schema Registry. Configuring all of this can feel a bit daunting at first. But once you've done it once or twice, it becomes rather quick and easy to configure. Here, we will walk through how to set up a local Schema Registry service, configure a Record Reader and a Record Writer, and then start putting some very powerful processors to use. For this post, we will keep the flow simple. We'll ingest some CSV data from a file and then use the Record Readers and Writers to transform the data into JSON, which we will then write to a new directory.

THE HOW - TUTORIAL

In order to start reading and writing the data, as mentioned above, we are going to need a Record Reader service. We want to parse CSV data and turn that into JSON data. So to do this, we will need a CSV Reader and a JSON Writer.
So we will start by clicking the "Settings" icon on our Operate palette. This will allow us to start configuring our Controller Services. When we click the 'Add' button in the top-right corner, we see a lot of options for Controller Services. Since we know that we want to read CSV data, we can type "CSV" into our Filter box in the top-right corner of the dialog, and this will narrow down our choices pretty well. We will choose to add the CSVReader service and then configure the Controller Service.

In the Properties tab, we have a lot of different properties that we can set. Fortunately, most of these properties have default values that are good for most cases, but you can choose which delimiter character you want to use, if it's not a comma. You can choose whether or not to skip the first line, treating it as a header, etc. For my case, I will set the "Skip Header Line" property to "true" because my data contains a header line that I don't want to process as a record. The first few properties, though, are very important. The "Schema Access Strategy" is used to instruct the reader on how to obtain the schema. By default, it is set to "Use String Fields From Header." Since we are also going to be writing the data, though, we will have to configure a schema anyway. So for this demo, we will change this strategy to "Use 'Schema Name' Property." This means that we are going to look up the schema from a Schema Registry. As a result, we are now going to have to create our Schema Registry. If we click on the "Schema Registry" property, we can choose to "Create new service..."

We will choose to create an AvroSchemaRegistry. It is important to note here that we are reading CSV data and writing JSON data - so why are we using an Avro Schema Registry? Because this Schema Registry allows us to convey the schema using the Apache Avro Schema format, but it does not imply anything about the format of the data being read. The Avro format is used because it is already a well-known way of storing data schemas. Once we've added our Avro Schema Registry, we can configure it and see in the Properties tab that it has no properties at all. We can add a schema by adding a new user-defined property (by clicking the 'Add' / 'Plus' button in the top-right corner). We will give our schema the name "demo-schema" by using this as the name of the property. We can then type or paste in our schema. For those unfamiliar with Avro schemas, it is a JSON-formatted representation that has a syntax like the following:
{
"name": "recordFormatName",
"namespace": "nifi.examples",
"type": "record",
"fields": [
{ "name": "id", "type": "int" },
{ "name": "firstName", "type": "string" },
{ "name": "lastName", "type": "string" },
{ "name": "email", "type": "string" },
{ "name": "gender", "type": "string" }
]
}
Here, we have a simple schema that is of type "record." This is typically the case, as we want multiple fields. We then specify all of the fields that we have. There is a field named "id" of type "int", and all other fields are of type "string." See the Avro Schema documentation for more information. We've now configured our schema and can enable our Controller Services.

We can now add our JsonRecordSetWriter controller service as well. When we configure this service, we see some familiar options for indicating how to determine the schema. For the "Schema Access Strategy," we will again use the "Use 'Schema Name' Property," which is the default. Also note that the default value for the "Schema Name" property uses the Expression Language to reference an attribute named "schema.name". This provides very nice flexibility, because now we can re-use our Record Readers and Writers and simply convey the schema by using an UpdateAttribute processor to specify the schema name. No need to keep creating Record Readers and Writers. We will set the "Schema Registry" property to the AvroSchemaRegistry that we just created and configured.

Because this is a Record Writer instead of a Record Reader, we also have another interesting property: "Schema Write Strategy." Now that we have configured how to determine the data's schema, we need to tell the writer how to convey that schema to the next consumer of the data. The default option is to add the name of the schema as an attribute, and we will accept the default. But we could also write the entire schema as a FlowFile attribute or use some strategies that are useful for interacting with the Hortonworks Schema Registry.
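To sketch what the finished flow will do (the sample values below are made up, simply matching the "demo-schema" above), a CSV record such as:

id, firstName, lastName, email, gender
1, John, Doe, jdoe@example.com, male

should come out of ConvertRecord as JSON along the lines of:

[ {
  "id" : 1,
  "firstName" : "John",
  "lastName" : "Doe",
  "email" : "jdoe@example.com",
  "gender" : "male"
} ]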
Now that we've configured everything, we can apply the settings and start our JsonRecordSetWriter as well. We've now got all of our Controller Services set up and enabled.

Now for the fun part of building our flow! The above will probably take about 5 minutes, but it makes laying out the flow super easy! For our demo, we will have a GetFile processor bring data into our flow. We will use UpdateAttribute to add a "schema.name" attribute of "demo-schema", because that is the name of the schema that we configured in our Schema Registry. We will then use the ConvertRecord processor to convert the data into JSON. Finally, we want to write the data out using PutFile. We still need to configure our ConvertRecord processor, though. To do so, all that we need to configure in the Properties tab is the Record Reader and Writer that we have already configured. Now, starting the processors, we can see the data flowing through our system!

Also, now that we have defined these readers and writers and the schema, we can easily create a JSON Reader and an Avro Writer, for example. And adding additional processors to split, query, and route the data becomes very simple because we've already done the "hard" part.

CONCLUSION

In version 1.2.0 of Apache NiFi, we introduced a handful of new Controller Services and Processors that will make managing dataflows that process record-oriented data much easier. I fully expect that the next release of Apache NiFi will have several additional processors that build on this. Here, we have only scratched the surface of the power that this provides to us. We can delve more into how we are able to transform and route the data, split the data up, and migrate schemas. In the meantime, each of the Controller Services for reading and writing records, and most of the new processors that take advantage of these services, have fairly extensive documentation and examples. To see that information, you can right-click on a processor and click "Usage" or click the "Usage" icon on the left-hand side of your controller service in the configuration dialog. From there, clicking the "Additional Details..." link in the component usage will provide pretty significant documentation. For example, the JsonTreeReader provides a wealth of knowledge in its Additional Details.
06-13-2017
03:25 PM
@Oleksandr Solomko have you changed the default value of the "nifi.queue.swap.threshold" property in nifi.properties? If so, you may be running into NIFI-3897.
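For reference, the entry in nifi.properties looks like this (20000 is the shipped default, as far as I recall):

nifi.queue.swap.threshold=20000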
08-02-2019
02:25 PM
@Riccardo Iacomini Thank you for the great post! This is very helpful. I am wondering how you batch things together, i.e., have many CSV rows in a single FlowFile instead of one row per FlowFile. If we want to batch CSV rows together, we use the MergeContent processor, but you also mention that MergeContent is costly. So how does batch processing work in NiFi?
09-07-2016
11:00 PM
10 Kudos
One of the most highly anticipated features of Apache NiFi 1.0.0 is the introduction of Zero-Master Clustering. Previous versions of NiFi relied upon a single "Master Node" (more formally known as the NiFi Cluster Manager) to show the User Interface. If this node was lost, data continued to flow, but the application was unable to show the topology of the flow, or show any stats. Additionally, Site-to-Site communications continued to send data but could not obtain up-to-date information about cluster topology, which resulted in less efficient load balancing. Version 1.0.0 of NiFi addresses these issues by switching to a Zero-Master Clustering paradigm.

This post will explore the approaches taken to ensure that NiFi provides high availability of the control plane without sacrificing the User Experience. After all, the User Experience is what has allowed NiFi to become the go-to solution for providing dataflow management to small organizations as well as the world's largest enterprises.

The benefit that the master/worker paradigm offered us was a design that was easy to reason over and understand. All web requests were sent directly to the master. This means that coordination of the flow was controlled by the master (e.g., it would prevent one user from modifying a Processor while another user was modifying the Processor at the same time). The entire cluster topology was stored only at the master. The "golden copy" of the flow configuration was held by the master. To the extent possible, we wanted to keep this benefit of being easy to reason about how the system works, while still overcoming all of these hurdles.

I am happy to say that the NiFi community has accomplished this goal, keeping a simple, easy-to-understand design with all of the benefits of High Availability. To do this, we leveraged the power of Apache ZooKeeper in order to provide automatic election of different clustering-related roles. In NiFi 1.0.0, we have two different roles that are automatically elected. The first role is the Primary Node (Yes! Gone are the days of having to manually switch which node is Primary Node). The second role is the Cluster Coordinator.

This new Cluster Coordinator role is responsible for monitoring the nodes in a cluster and marking any nodes that fail to heartbeat as being "Disconnected." Additionally, the Cluster Coordinator provides a mechanism to ensure that the flow is consistent across all nodes. This is accomplished by forwarding all web-based requests to the Coordinator. The Coordinator can then replicate this request to all nodes in the cluster and merge their responses into a single, unified view, in much the same way that the old Cluster Manager did. However, with the shift to the Cluster Coordinator, if the node that is elected Cluster Coordinator drops from the cluster, a new node will automatically pick up these responsibilities.
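As a rough sketch of what participating in a zero-master cluster involves on each node, the relevant nifi.properties entries look something like the following (hostnames and ports are placeholders; see the Admin Guide for the complete list of clustering and ZooKeeper properties):

# mark this instance as a cluster node
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1.example.com
nifi.cluster.node.protocol.port=11443
# ZooKeeper ensemble used for electing the Primary Node and Cluster Coordinator
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
nifi.zookeeper.root.node=/nifi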
This approach means that users are able to navigate to the URL of any node in a NiFi cluster, so users need not concern themselves with which node is currently elected the Cluster Coordinator. All of the necessary coordination, such as component locking, is handled at a single point, so there is no need to introduce expensive and difficult-to-understand distributed locking mechanisms.

Additionally, these changes provide a great footing to build upon for the upcoming changes that are planned for Data Replication across nodes in a NiFi cluster. A NiFi Feature Proposal outlines this feature at a fairly high level at https://cwiki.apache.org/confluence/display/NIFI/Data+Replication. This notion of an automatically elected, highly available Cluster Coordinator means that we can also develop an easy-to-understand approach for this Data Replication, as well, since we are able to elect a single node to coordinate the failover of the data processing.

Also new to NiFi 1.0.0 is an overhaul of the security model and component-level versioning. We refer to these updates jointly as providing multi-tenancy. NiFi now supports any number of users viewing and modifying the flow at the same time without the need to continually refresh the flow. In addition to this, permissions can now be given to users to read or modify any component, individually. Prior to version 1.0.0, NiFi required that users be given read-only access or write access to the entire flow. However, as NiFi continues to gain more and more adoption, enterprise users have been seeking the ability to restrict access to specific components to different users. This is now possible, with a simple, intuitive user interface to provide and configure access policies. Bryan Bende, an Apache NiFi PMC member, has provided an excellent overview of this feature at http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy.

Version 1.0.0 of Apache NiFi has been a long time in the making and is available now. In this post, we've given a very high-level overview of how the Zero-Master Clustering feature works. A completely redesigned UI and several minor features and improvements have been added, as well. NiFi can be downloaded at http://nifi.apache.org/download.html with Release Notes available at https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.0.0. Please let us know how we can continue to improve the application and what you would love to see added into a future version!