Member since: 01-11-2016
Posts: 355
Kudos Received: 228
Solutions: 74
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4576 | 06-19-2018 08:52 AM |
| | 1499 | 06-13-2018 07:54 AM |
| | 1652 | 06-02-2018 06:27 PM |
| | 1473 | 05-01-2018 12:28 PM |
| | 2587 | 04-24-2018 11:38 AM |
04-16-2018
11:29 AM
Something you can do is trigger the first Notify manually and let the gate do the rest. For instance, you can use a GenerateFlowFile connected to a Notify for the initial trigger, then stop it afterward. This works, but I don't know if there's a better way to do it. If the idea is to control flows, take a look at the ControlRate processor, which can be helpful.
04-16-2018
10:18 AM
Hi @Laurie McIntosh This is expected. The data flow in your example is blocked at the Wait processor, so no FlowFile goes through PutFile and then Notify to release the FlowFile blocked in the Wait. Notify is never triggered here. You need to have your Notify in an independent flow with the triggering logic. Thanks
03-31-2018
10:14 AM
1 Kudo
Hi @Elisabeta Nenciulescu Are you using NiFi? If yes, each FlowFile has an attribute lineageStartDate, defined as follows: Any time that a FlowFile is cloned, merged, or split, a "child" FlowFile is created. As those children are then cloned, merged, or split, a chain of ancestors is built. This value represents the date and time at which the oldest ancestor entered the system. Another way to think about this is that this attribute represents the latency of the FlowFile through the system. The value is a number representing the number of milliseconds since midnight, Jan. 1, 1970 (UTC).
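Since lineageStartDate is epoch milliseconds, the FlowFile's latency is just the difference with the current time. A minimal sketch (the helper name and sample value are illustrative, not part of NiFi):

```python
from datetime import datetime, timezone

def lineage_latency_ms(lineage_start_ms, now_ms=None):
    """Latency of a FlowFile: milliseconds elapsed since its oldest
    ancestor entered the system (lineageStartDate is epoch millis, UTC)."""
    if now_ms is None:
        now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    return now_ms - lineage_start_ms

# Hypothetical lineageStartDate value, 1500 ms before "now":
start = 1_522_480_000_000
print(lineage_latency_ms(start, start + 1500))  # → 1500
# The raw value converts to a readable UTC timestamp:
print(datetime.fromtimestamp(start / 1000, tz=timezone.utc).isoformat())
```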
03-29-2018
06:42 PM
1 Kudo
As you can see in my second screenshot, a template is attached to a process group; that is the template's scope. In this case, the template is a resource attached to your process group, and a process group cannot be deleted until all its attached resources are deleted.
03-29-2018
06:25 PM
2 Kudos
@Vincent van Oudenhoven You need to go to the hamburger menu at the top right of the UI, click on Templates, and delete the template that you added at this process group level.
03-29-2018
09:12 AM
OK, so maybe you don't have enough FlowFiles to create a new merged FlowFile. The decision to merge is based on two things: the age of the bin and the number of records. Do you have 1000 records going through the merge? If not, try setting a short Max Bin Age to force the processor to do the merge.
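The bin-flush decision described above can be sketched like this (a simplification of MergeRecord's logic; the function name and defaults are illustrative):

```python
def should_flush(record_count, bin_age_secs, min_records=1000, max_bin_age_secs=None):
    """A bin is merged once it holds enough records, or once it has aged
    past Max Bin Age (if one is set) -- whichever happens first."""
    if record_count >= min_records:
        return True
    if max_bin_age_secs is not None and bin_age_secs >= max_bin_age_secs:
        return True
    return False

# With no Max Bin Age, 400 records never trigger a merge, however old the bin:
print(should_flush(400, 3600))                        # → False
# Setting a short Max Bin Age forces the merge:
print(should_flush(400, 3600, max_bin_age_secs=30))   # → True
```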
03-29-2018
09:02 AM
@Vivek Singh When you say "multiple csv are generated", do you mean that no original csv is merged? You have X input FlowFiles to MergeRecord and you get X output? Are they going through the success relationship? I can see that you have FlowFiles in "original, failure"; do you get errors?
03-29-2018
08:22 AM
1 Kudo
Hi @Vivek Singh Have you tried setting a blank "Correlation Attribute Name"? As you can see from the doc, this property is used to gather files having the same value for this attribute, so files with the same filename end up binned together, which leads to the behavior you are seeing: If specified, two FlowFiles will be binned together only if they have the same value for this Attribute. If not specified, FlowFiles are bundled by the order in which they are pulled from the queue.
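The two binning behaviors can be sketched as follows (a simplified model of the processor's logic; function name and bin size are illustrative):

```python
from collections import defaultdict

def bin_flowfiles(flowfiles, correlation_attr=None, bin_size=3):
    """Sketch of MergeRecord binning: with a Correlation Attribute set,
    only FlowFiles sharing that attribute's value share a bin; without
    it, FlowFiles are binned in the order pulled from the queue."""
    if correlation_attr is None:
        return [flowfiles[i:i + bin_size] for i in range(0, len(flowfiles), bin_size)]
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff[correlation_attr]].append(ff)
    return list(bins.values())

files = [{"filename": "a.csv"}, {"filename": "b.csv"}, {"filename": "a.csv"}]
# Correlating on filename splits differently named files into separate bins:
print(len(bin_flowfiles(files, "filename")))  # → 2
# With a blank correlation attribute, all three can land in one bin:
print(len(bin_flowfiles(files)))              # → 1
```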
03-26-2018
07:44 PM
Thanks @Veerendra Nath Jasthi for the confirmation. I'll delete my initial answer. Regarding Ambari 2.5.2, it doesn't support HDF 3.0.1.1. Only Ambari 2.5.1 supports HDF 3.0.1.1, as you can see in the screenshot you shared. Thanks
03-26-2018
07:29 PM
@Veerendra Nath Jasthi I am a little confused. My first answer was for HDF 3.1.1 because you provided this link in your question: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_installing-hdf-on-hdp/content/ch_install-mpack.html Are you trying to install HDF 3.1.1 or HDF 3.0.1.1?
03-25-2018
05:48 PM
1 Kudo
Hi @Chris Dan Please keep in mind that Flow Registry is the beginning of FDLC in NiFi and an important building block on which several services will be built. If you want to export a flow from the registry to back it up somewhere, you can use either: NiFi Toolkit, part of the NiFi project: https://issues.apache.org/jira/browse/NIFI-4839 NiPyAPI, a community project: as Tim suggested, you can use this tool as well and follow his excellent article. Today, the Flow Registry has a file storage provider. The Flow Registry supports plugins, and new storage providers such as Git will follow, in my opinion. Thanks
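For a scripted backup, the Registry also exposes its versioned flow snapshots over REST. A minimal sketch, assuming the standard NiFi Registry REST path layout (the host, IDs, and helper names here are hypothetical):

```python
import json
from urllib.request import urlopen

def flow_version_url(base, bucket_id, flow_id, version):
    """Build the NiFi Registry REST endpoint that returns a versioned
    flow snapshot as JSON, suitable for backing up outside the registry."""
    return f"{base}/nifi-registry-api/buckets/{bucket_id}/flows/{flow_id}/versions/{version}"

def export_flow(base, bucket_id, flow_id, version, out_path):
    # Fetch the snapshot and write it to a local JSON file.
    with urlopen(flow_version_url(base, bucket_id, flow_id, version)) as resp:
        snapshot = json.load(resp)
    with open(out_path, "w") as f:
        json.dump(snapshot, f, indent=2)

# Hypothetical host and UUIDs -- substitute your own:
# export_flow("http://registry-host:18080", "bucket-uuid", "flow-uuid", 1, "flow-v1.json")
print(flow_version_url("http://registry-host:18080", "bucket-uuid", "flow-uuid", 1))
```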
03-24-2018
10:05 PM
1 Kudo
Hello @Joe Harvy The Atlas reporting task requires building NiFi with a specific profile. If you downloaded NiFi from the Apache site, it's normal that the reporting task is not available. You have 2 options:
Use NiFi from HDF, which already has this reporting task. Or download the NiFi code from Apache and rebuild it with the Atlas profile using the following command: mvn clean install -Pinclude-atlas -DskipTests
03-23-2018
10:53 PM
@Jake Simmonds You can use the LookupRecord processor to do the enrichment. Check this article to see how to use the different options: https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html
03-23-2018
10:48 PM
@Kok Ching Hoo If your file is a CSV, the best thing is to use PutElasticsearchHttpRecord: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-elasticsearch-nar/1.5.0/org.apache.nifi.processors.elasticsearch.PutElasticsearchHttpRecord/index.html This is a bulk operation, so there's no need to split the file.
03-23-2018
10:41 PM
@Haitam Dadsi You can use CDC tools such as Attunity Replicate to push events to Kafka, then consume from Kafka with NiFi and update a realtime dashboard (Solr, for instance).
03-23-2018
10:25 PM
1 Kudo
Hi @Ankur Gupta Using the same Ambari for HDF and HDP has some caveats with the latest versions. You cannot install SAM and Schema Registry for HDF 3.1 on an HDP 2.6.4 cluster, and you cannot upgrade your Storm and Kafka versions if they exist on an HDP cluster. This is a temporary limitation with HDF 3.1 and will be resolved with HDP 3.0 and HDF 3.2. This is described in the docs: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_planning-your-deployment/content/ch_deployment-scenarios.html https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_installing-hdf-on-hdp/content/ch_add-hdf-to-hdp.html To have Kafka 1.x in HDP, you need to wait for HDP 3.0.
03-23-2018
06:00 PM
Hi @Jayendra Patil Any feedback on your issue?
03-23-2018
04:25 PM
Hi @Raja Chowdary You can add an UpdateAttribute processor to split the attribute into two attributes, like this. Then you can use myatt1 or myatt2 separately.
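The split itself can be sketched as below. This mirrors what the UpdateAttribute rules do; the comma delimiter is an assumption here (in NiFi Expression Language you would use something like `${myatt:substringBefore(',')}` and `${myatt:substringAfter(',')}`):

```python
def split_attribute(value, delimiter=","):
    """Derive myatt1/myatt2 from a single combined attribute value.
    The delimiter is an assumption; adjust to your actual format."""
    head, _, tail = value.partition(delimiter)
    return {"myatt1": head, "myatt2": tail}

print(split_attribute("2018-03-23,raja"))
# → {'myatt1': '2018-03-23', 'myatt2': 'raja'}
```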
03-23-2018
01:36 PM
Hi @Mark Lin Another way to manage duplicates is to use a ControlRate processor, with an expiration duration set on the connection before the ControlRate. This way, you let only one FlowFile go through every X amount of time, and the other FlowFiles get stuck in the connection and are deleted automatically when they expire. However, for this to work, you should separate your messages beforehand and not route all events to the same ControlRate, otherwise you will get one notification whatever the issues are. I hope this helps. Thanks
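The effect of this ControlRate-plus-expiry pattern can be sketched as a per-key gate (class and parameter names are illustrative, not NiFi APIs):

```python
import time

class NotificationGate:
    """Sketch of the ControlRate + connection-expiry pattern: one event
    per key passes every `interval_secs`; duplicates arriving inside the
    window are dropped, as expired FlowFiles would be. Separate keys
    (message types) each get their own gate, matching the advice to
    split messages before the ControlRate."""
    def __init__(self, interval_secs):
        self.interval = interval_secs
        self.last_sent = {}

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_sent.get(key, float("-inf")) >= self.interval:
            self.last_sent[key] = now
            return True
        return False

gate = NotificationGate(interval_secs=60)
print(gate.allow("disk-full", now=0))    # → True  (first alert passes)
print(gate.allow("disk-full", now=30))   # → False (duplicate dropped)
print(gate.allow("net-down", now=30))    # → True  (separate key, own window)
print(gate.allow("disk-full", now=90))   # → True  (window elapsed)
```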
03-23-2018
07:52 AM
2 Kudos
Hi @Pramod N Several NiFi examples are available: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates One of them is what you are looking for: https://cwiki.apache.org/confluence/download/attachments/57904847/Hello_NiFi_Web_Service.xml?version=1&modificationDate=1449369797000&api=v2
03-22-2018
09:04 PM
Hi @Raja Chowdary Have you tried UpdateRecord? https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.UpdateRecord/index.html Look at the additional details for examples: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.UpdateRecord/additionalDetails.html
03-22-2018
08:47 PM
Hi @Mark Lin When I develop, I use a funnel to see what's happening in my flow. You can also use UpdateAttribute, adding any attribute. I think neither has much impact on resource usage. Funnel: a funnel is a NiFi component used to combine the data from several Connections into a single Connection.
03-22-2018
01:59 PM
Hi @Krishna Srinivas Atlas 0.8 has a data model for HBase which defines several components (hbase_namespace, hbase_table, etc.). However, there's no automated crawler for HBase that scans over all tables and saves the data in Atlas. You can write a script that does this and updates Atlas through its event or REST API. You have some examples here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-governance/content/atlas_messaging_publishing_entity_changes.html#atlas_messaging_publishing_entity_create You can also manually create HBase entities through the Atlas UI if you don't have a lot of tables.
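Such a script would mostly be building entity payloads and POSTing them to Atlas. A minimal sketch, assuming the Atlas v2 REST entity endpoint; the attribute names and qualifiedName convention are illustrative, so check them against your Atlas typedefs:

```python
import json

def hbase_table_entity(table_name, namespace, cluster="cluster1"):
    """Minimal Atlas v2 entity payload for an HBase table. The typeName
    comes from the Atlas HBase model; the qualifiedName format shown is
    an assumption -- verify it against your installed typedefs."""
    return {
        "entity": {
            "typeName": "hbase_table",
            "attributes": {
                "name": table_name,
                "qualifiedName": f"{namespace}:{table_name}@{cluster}",
            },
        }
    }

payload = hbase_table_entity("events", "default")
# POST this JSON to /api/atlas/v2/entity on your Atlas server:
print(json.dumps(payload, indent=2))
```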
03-22-2018
01:47 PM
Hi @rajdip chaudhuri Have you considered NiFi? You have out-of-the-box processors to list/fetch files and to write to HDFS. You can also use a NiFi cluster if you want to distribute the load across several nodes.
03-22-2018
07:23 AM
1 Kudo
Hi @rutuja jagtap Can you please give us more details on the context and the error? HDF doesn't include Spark. Are you trying to install HDP and HDF with the same Ambari?
03-20-2018
11:38 AM
Can you show your configuration for the different controllers and readers/writers?
03-20-2018
10:01 AM
1 Kudo
Hi @Jayendra Patil Setting the optimal value of the max thread count depends on your use cases and on which processors you are using (CPU-intensive like the convert processors, or IO-intensive like the put/get processors). I've seen better usage of my hardware with a thread count around 2x the number of cores. I've seen some clusters with 3x the number of cores. I think you can go beyond 50 in your case and monitor the behavior. The best thing to do is to proceed incrementally. I hope this helps. Abdelkrim
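The rule of thumb above can be expressed as a simple starting-point calculation (the function name and multiplier default are illustrative, not a NiFi setting):

```python
import os

def suggested_thread_count(multiplier=2):
    """Starting point for NiFi's Max Timer Driven Thread Count:
    roughly 2x the core count (some clusters run 3x), to be tuned
    upward incrementally while monitoring CPU and queue behavior."""
    cores = os.cpu_count() or 1
    return cores * multiplier

print(suggested_thread_count())   # e.g. 16 on an 8-core node
print(suggested_thread_count(3))  # the more aggressive 3x variant
```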
03-20-2018
09:33 AM
1 Kudo
Hi @dhieru singh AmbariReportingTask can be used to send metrics to AMS. You can see the GC metrics that it can send to AMS here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-ambari-nar/1.5.0/org.apache.nifi.reporting.ambari.AmbariReportingTask/additionalDetails.html In the default Grafana dashboard, this information is not used, but you can create a dashboard to show jvm.gc.runs.G1 Young Generation, for example. Below is a simple dashboard that shows this information:
03-17-2018
09:11 PM
Hi @roy p Below is a sample Blueprint for all HDF services, including the registries:
{
"Blueprints": {
"stack_name": "HDF",
"stack_version": "3.1"
},
"host_groups": [
{
"name": "host-group-3",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
}
]
},
{
"name": "host-group-2",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
},
{
"name": "host-group-4",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "DRPC_SERVER"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "KAFKA_BROKER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIMBUS"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "REGISTRY_SERVER"
},
{
"name": "STORM_UI_SERVER"
}
]
},
{
"name": "host-group-1",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIFI_CA"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
}
],
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"ams-hbase-env": {
"hbase_master_heapsize": "512",
"hbase_regionserver_heapsize": "768",
"hbase_master_xmn_size": "192"
}
},
{
"nifi-logsearch-conf": {}
},
{
"storm-site": {
"topology.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsSink\", \"parallelism.hint\": 1, \"whitelist\": [\"kafkaOffset\\\\..+/\", \"__complete-latency\", \"__process-latency\", \"__execute-latency\", \"__receive\\\\.population$\", \"__sendqueue\\\\.population$\", \"__execute-count\", \"__emit-count\", \"__ack-count\", \"__fail-count\", \"memory/heap\\\\.usedBytes$\", \"memory/nonHeap\\\\.usedBytes$\", \"GC/.+\\\\.count$\", \"GC/.+\\\\.timeMs$\"]}]",
"metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter",
"storm.cluster.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter\"}]"
}
},
{
"registry-common": {
"registry.storage.connector.connectURI": "jdbc:mysql://myhost.hdf.com:3306/registry",
"registry.storage.type": "mysql",
"jar.storage.type": "local",
"registry.storage.connector.password": "StrongPassword"
}
},
{
"registry-logsearch-conf": {}
},
{
"streamline-common": {
"streamline.storage.type": "mysql",
"jar.storage.type": "local",
"streamline.storage.connector.connectURI": "jdbc:mysql://myhost.hdf.com:3306/streamline",
"streamline.dashboard.url": "http://localhost:9089",
"registry.url": "http://localhost:7788/api/v1",
"streamline.storage.connector.password": "StrongPassword"
}
},
{
"ams-hbase-site": {
"hbase.regionserver.global.memstore.upperLimit": "0.35",
"hbase.regionserver.global.memstore.lowerLimit": "0.3",
"hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
"hbase.hregion.memstore.flush.size": "134217728",
"hfile.block.cache.size": "0.3",
"hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
"hbase.cluster.distributed": "false",
"phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
"hbase.zookeeper.property.clientPort": "61181"
}
},
{
"ams-env": {
"metrics_collector_heapsize": "512"
}
},
{
"kafka-log4j": {}
},
{
"ams-site": {
"timeline.metrics.service.webapp.address": "localhost:6188",
"timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregator.ttl": "86400",
"timeline.metrics.service.handler.thread.count": "20",
"timeline.metrics.service.watcher.disabled": "false"
}
},
{
"kafka-broker": {
"kafka.metrics.reporters": "org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter"
}
},
{
"ams-grafana-env": {
"metrics_grafana_password": "StrongPassword"
}
},
{
"streamline-logsearch-conf": {}
}
]
}
03-17-2018
06:45 PM
Hi @Karl Fredrickson If you have Knox, you can use it to encapsulate Kerberos authentication and use username/password instead. Thanks