Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5466 | 06-19-2018 08:52 AM |
| | 1748 | 06-13-2018 07:54 AM |
| | 2022 | 06-02-2018 06:27 PM |
| | 1809 | 05-01-2018 12:28 PM |
| | 3139 | 04-24-2018 11:38 AM |
08-27-2018
03:13 PM
1 Kudo
@Steven Matison are you referring to the component versions in each platform? If yes, they are still available in the release notes, as in previous versions: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/release-notes/content/comp_versions.html and https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.2.0/release-notes/content/component_support.html The support matrix is an addition that makes checking platform compatibility easier.
08-27-2018
03:06 PM
1 Kudo
Hi @phil gib SMM is available in DPS 1.2 and can be used with Kafka in HDF 3.2 or HDP 3.0. You can check the new support matrix here: https://supportmatrix.hortonworks.com/
08-10-2018
04:20 PM
Hi @yazeed salem NiFi has a processor called QueryRecord that can be used to run SQL queries on data coming from Kafka: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.7.1/org.apache.nifi.processors.standard.QueryRecord/index.html This is not KSQL. Thanks Abdelkrim
06-19-2018
08:57 AM
Yes, sorry, I submitted before finishing my answer.
06-19-2018
08:52 AM
Hi @Vivek Singh This has been answered by @Matt Burgess recently: https://community.hortonworks.com/questions/193888/nifi-is-it-possible-to-access-processor-group-vari.html That covers mainly how to access the variable. To update it, you should use the REST API; I don't think there's a direct way from the script. Maybe take the output value as a flow file attribute after the ExecuteScript and use another processor to call the API and update the value, along the lines of the sketch below.
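A minimal sketch of that API call, assuming an unsecured NiFi 1.x instance; the endpoint paths, payload shape, process group id and variable name below are assumptions to verify against the REST API documentation for your NiFi version:

```python
# Sketch: update a process-group variable through the NiFi REST API.
# Paths and payload shape are assumptions based on NiFi 1.x -- verify them
# against the REST API documentation of your NiFi version.
import requests

NIFI_API = "http://localhost:8080/nifi-api"   # assumed unsecured NiFi instance
PG_ID = "your-process-group-id"               # hypothetical process group id

# 1. Read the current variable registry (it also carries the revision that
#    must be echoed back on update).
registry = requests.get(f"{NIFI_API}/process-groups/{PG_ID}/variable-registry").json()

# 2. Change the variable of interest in place.
for entry in registry["variableRegistry"]["variables"]:
    if entry["variable"]["name"] == "my.variable":    # hypothetical variable name
        entry["variable"]["value"] = "new-value"

# 3. Submit an asynchronous update request; NiFi applies the new value to the
#    affected components and exposes the request status for polling.
update = requests.post(
    f"{NIFI_API}/process-groups/{PG_ID}/variable-registry/update-requests",
    json=registry,
)
print(update.status_code)
```

On a secured NiFi you would additionally need to authenticate (for example with an access token or client certificate) on every call.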
06-18-2018
09:56 AM
2 Kudos
Hi @rajat puchnanda You can select your process group and click on Save as template in the left-hand menu. After that, go to the hamburger menu, then Templates, and save it. This will download an XML file that describes the process group, and you can import it into another NiFi.
06-16-2018
06:34 PM
Hi @Abhinav Yepuri There are several ways to automate this. One of them is the NiFi CLI, available from NiFi 1.6: https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli You have the nifi pg-get-vars and nifi pg-set-var commands that you can use to get the variables from dev, replace their values with a dictionary, and set them in prod, as in the sketch below.
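A rough sketch of that promotion step. The command names come from the answer above, but the toolkit path, option names and process group id are assumptions; confirm them with the CLI's built-in help on your toolkit version:

```python
# Sketch: push dev variables to prod with the NiFi Toolkit CLI.
# Command names (nifi pg-get-vars / nifi pg-set-var) are from the answer above;
# the option names, paths and ids below are assumptions -- check
# "./bin/cli.sh nifi pg-set-var help" on your toolkit version.
import subprocess

CLI = "/opt/nifi-toolkit/bin/cli.sh"       # hypothetical toolkit location
PROD_URL = "http://prod-nifi:8080"         # hypothetical prod NiFi URL
PG_ID = "your-process-group-id"            # hypothetical prod process group id

# Dictionary mapping dev values to their prod replacements.
replacements = {
    "hdfs://dev-cluster/landing": "hdfs://prod-cluster/landing",
    "dev-kafka:9092": "prod-kafka:9092",
}

# Variables exported from dev, e.g. by parsing the output of
# "cli.sh nifi pg-get-vars -u http://dev-nifi:8080 -pgid <dev-pg-id>".
dev_variables = {
    "input.dir": "hdfs://dev-cluster/landing",
    "kafka.brokers": "dev-kafka:9092",
}

for name, value in dev_variables.items():
    subprocess.run(
        [CLI, "nifi", "pg-set-var",
         "-u", PROD_URL, "-pgid", PG_ID,          # option names assumed
         "-var", name, "-val", replacements.get(value, value)],
        check=True,
    )
```

Keeping the replacement dictionary in version control makes the promotion repeatable across environments.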
06-13-2018
07:54 AM
1 Kudo
Hi @John T When you use GetSFTP in a cluster, you are duplicating your data: each node will ingest the same data. You need to use the List/Fetch pattern. A great description of this feature is available here: https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/ Now, if you used the List/Fetch pattern correctly and still don't have even data distribution, you need to understand that the Site-to-Site protocol does batching for better network performance. This means that if you have 3 flow files of a few KB or MB to send, NiFi may decide to send them all to one node rather than using 3 connections. The decision is taken based on data size, number of flow files and transmission duration. Because of this, you don't see the data distributed when you are doing tests: usually you test with a few small files. The batching thresholds have default values, but you can change them for each input port. Go to the RPG, then Input Ports, click on the edit pen for your input port, and you get these settings. I hope this helps you understand the behavior. Thanks
06-07-2018
07:08 PM
1 Kudo
@Bhushan Kandalkar Here is a step-by-step doc: https://community.hortonworks.com/articles/886/securing-nifi-step-by-step.html And this is the official doc: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_security/content/enabling-ssl-without-ca.html
06-07-2018
02:02 PM
What about proxy? As you can see in the provided link, to allow users to view the NiFi UI, create the following policies for each host:
- /flow – read
- /proxy – read/write
06-07-2018
08:15 AM
Hi @Bhushan Kandalkar Have you added the Ranger policies that let users see the UI: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_security/content/policies-to-view-nifi.html ? Thanks
06-04-2018
05:56 PM
Hi @tthomas You can use EvaluateJsonPath to extract a JSON field and add it as a flow file attribute: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html For example, if your JSON is the following and you want to add a flow file attribute called timestamp:
{
  "created_at" : "Thu Sep 28 08:08:09 CEST 2017",
  "id_store" : 4,
  "event_type" : "store capacity",
  "id_transaction" : "1009331737896598289",
  "id_product" : 889,
  "value_product" : 45
}
you can add an EvaluateJsonPath processor (with Destination set to flowfile-attribute) and add a dynamic property named timestamp with the value $.created_at
06-02-2018
06:27 PM
1 Kudo
Hi @Rahul Kumar Beyond the fact that they are both called "pub/sub brokers", Kafka and MQTT have different design goals. Without going deep into details, it's better to see MQTT as a communication protocol between several applications. It was designed to be extremely lightweight to fit IoT and resource-constrained environments. Here, the objective is to distribute messages between different systems, not to store large volumes of data for a long time. On the other hand, Kafka is a broker that can store large volumes of data for a long time (or forever). It was designed to be scalable and to provide the best performance; hence, a Kafka cluster usually uses beefy machines. It's well suited for Big Data applications and has integrations with the big data ecosystem (Spark, Storm, Flink, NiFi, etc.). Depending on your application requirements, the choice is usually easy to make. In lots of scenarios it's Kafka and MQTT together. For IoT, for instance, it's not rare to see MQTT at the local level (a gateway, for example) for sensor/actuator communications, and Kafka at the regional/central level for data ingestion, processing and storage. Technically, there are lots of differences too in terms of quality of service, streaming semantics, internal architecture, etc. I hope this helps clarify things.
06-02-2018
06:05 PM
Hi @Pankaj Singh You can use NiFi directly to pull the files and store them in your data lake (I assume you mean HDFS). You have the List/Fetch file processors and the PutHDFS processor to do so. Site-to-Site (S2S) can be used to distribute the load across the NiFi cluster. Thanks
06-02-2018
05:55 PM
1 Kudo
Hi @Kiran M K HDF, like HDP, is a 100% open source platform based on Apache technologies. You can use HDF and Schema Registry for free; you get the exact same product, with full features, as a paying user. If you need support, expertise and a partner to help you on your Big Data journey using HDP and HDF, you can subscribe to support. Depending on your context, you can start using HDF with these scenarios: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_planning-your-deployment/content/ch_deployment-scenarios.html I hope this helps
05-31-2018
10:36 PM
Hi @Jason Bolden This is not possible currently. Access will be done as the user running NiFi. This is possible for other processors, like GetHDFS, where you can do user impersonation. Thanks
05-04-2018
04:11 PM
Hi @ranya triki If you need Kafka 1.0 in HDP, you will need to wait for HDP 2.6.5, coming in a few weeks.
05-04-2018
04:07 PM
1 Kudo
Hi @Josh Nicholson For this you can use a ControlRate processor and set a flow file expiration duration on its input connection. ControlRate will let only one message through, let's say, every 10 minutes. The other flow files will stay in the queue and get expired and deleted after 10 minutes. Thanks Abdelkrim
05-01-2018
08:35 PM
4 Kudos
DataWorks Summit (DWS) is the industry's premier Big Data community event in Europe and the US. The last DWS was in Berlin, Germany, on April 18th and 19th. This was the event's 6th year in Europe, and this year there were over 1200 attendees from 51 different countries, 77 breakouts in 8 tracks, 8 Birds-of-a-Feather sessions and 7 Meetups. I had the opportunity to attend as a speaker this year, where I gave a talk on "Best practices and lessons learnt from running Apache NiFi". It was a joint talk with the Big Data squad team from Renault, a French car manufacturer. The presentation recording will be available on the DWS website. In the meantime, I'll share with you the 3 key takeaways from our talk.

NiFi is an accelerator for your Big Data projects

If you have worked on any data project, you already know how hard it is to get data into your platform so you can start "the real work". This is particularly important in Big Data projects, where companies aim to ingest a variety of data sources ranging from databases, to files, to IoT data. Having NiFi as a single ingestion platform that gives you out-of-the-box tools to ingest several data sources in a secure and governed manner is a real differentiator. NiFi accelerates data availability in the data lake, and hence accelerates your Big Data projects and business value extraction. The following numbers from Renault projects are worth a thousand words.

NiFi enables new use cases

NiFi is not only an ingestion tool. It's a data logistics platform. This means that NiFi enables easy collection, curation, analysis and action on any data anywhere (edge, cloud, data center) with built-in end-to-end security and provenance. This unique set of features makes NiFi the best choice for implementing new data-centric use cases that require geographically distributed architectures and high levels of SLA (availability, security and performance). In our talk, two exciting use cases were shared: connected plants and packaging traceability.

NiFi flow design is like software development

When I pitch NiFi to my customers, I can see them get excited quickly. They start brainstorming instantly and ask if NiFi can do this or that. In this situation, I usually fire up a NiFi instance on my Mac and start dragging and dropping a few processors to simulate their use case. This is a powerful feature that fosters interactions between team members in the room and leads us to very interesting business and technical discussions. When people see the power of NiFi and all that we can easily achieve in a short timeframe, a new set of questions arises (especially from the very few skeptics in the room :)). Can I automate this task? Can I monitor my data flows? Can I integrate NiFi flow design with my development process? Can I "industrialize" my use case? All these questions are legitimate when we see how powerful and easy to use NiFi is. The good news is that "yes" is the answer to all of them. However, it's important to put the right process in place to avoid having a POC that becomes a production system (who has never lived through this situation?).
The way I like to answer these questions is to show how much NiFi flow design is like software development. When a developer wants to tackle a problem, he starts designing a solution by asking: "what's the best way to implement this?" The word "best" here covers aspects like complexity, scalability, maintainability, etc. The same logic applies to NiFi flow design: you have several ways to implement your use case, and they are not equivalent. Once a solution is found, you use the NiFi UI as your IDE to implement it. Your flow is a set of processors, just like your code or your algorithm is a set of instructions. You have "if then else" statements with routing processors, you have "for" or "while" loops with UpdateAttribute and self-relations, you have mathematical and logical operators with processors and the Expression Language, etc. When you build your flow, you divide it into process groups, similar to the functions you use to organize your code. This makes your applications easier to understand, maintain and debug. You use templates for repetitive things, just as you build and use libraries across your projects. From this main consideration, you can derive several best practices. Some of them are generic software development practices, and some of them are specific to NiFi as "a programming language". I share some good principles to follow in the following slide:

Final thoughts

NiFi is a powerful tool that gives you business and technical agility. To master its power, it is important to define and enforce best practices. Lots of these best practices can be borrowed directly from software engineering; others are specific to NiFi. We have shared some of these ideas in the deck available on the DWS webpage. Some of the ideas explained in the presentation have been discussed by other NiFi enthusiasts, such as the excellent "Monitoring NiFi" series by Pierre [1]. Various Flow Development Lifecycle (FDLC) [2] topics have also been covered by folks like Dan and Tim for NiPyAPI [3][4], Bryan for the flow registry [5] and Pierre for the NiFi CLI [6]. Other topics, like NiFi design patterns, require a dedicated post that I'll address in the future.

Article initially shared on https://medium.com/@abdelkrim.hadjidj/best-practices-for-using-apache-nifi-in-real-world-projects-3-takeaways-1fe6912101db
- Find more articles tagged with:
- best-practices
- Data Ingestion & Streaming
- How-ToTutorial
- NiFi
- production
05-01-2018
03:52 PM
Can you give an example, please? I cannot see where the problem is.
05-01-2018
12:28 PM
Hi @Chad Shaw How often is your whitelist updated? If not often, you can use NiFi to ingest it from HDFS, store it locally on the NiFi server, and use it with ScanAttribute.
04-24-2018
11:38 AM
Hi @aman mittal NiFi has its own scheduler and provides processor-level scheduling. NiFi is a flow management tool that was designed to ingest data as fast and as efficiently as possible. The scheduling is done in NiFi as a complete platform and at a granular level. I don't think that using an external scheduler as a general solution for scheduling NiFi flows is a good approach. However, this can make sense in some scenarios. Here are a few tips. To implement these edge scenarios, you can add the scheduling logic to your flow and trigger it from an external system, through an HTTP call for instance (adding a ListenHTTP or any other processor; see the sketch below), a Kafka event or any other mechanism. In some cases this won't be possible; I'm thinking of flows where your first processor doesn't accept incoming relationships, and I am sure there are other scenarios where this won't work either. Also, something to look at is Wait/Notify. You can use them as a gate that you open upon receiving an external event (e.g. an HTTP call from a scheduler). Here you would have two flows: one for data ingestion with a blocking Wait step, and one for unblocking the flow with Notify. If you have a varying number of flow files to block/release, this solution can get complicated.
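As a minimal sketch of the HTTP trigger idea, assuming a ListenHTTP processor configured on a hypothetical host, port and base path (anything your scheduler can POST to works the same way):

```python
# Sketch: trigger a NiFi flow from an external scheduler via ListenHTTP.
# Host, port and base path are assumptions -- they must match what you set
# on the ListenHTTP processor ("contentListener" is its default base path).
import requests

NIFI_HOST = "nifi-node-1"        # hypothetical NiFi host
LISTEN_PORT = 9999               # "Listening Port" configured on ListenHTTP
BASE_PATH = "contentListener"    # "Base Path" configured on ListenHTTP

# The request body becomes the content of the flow file that starts the run;
# a scheduler (cron, Oozie, Airflow, ...) can run this script on a schedule.
resp = requests.post(f"http://{NIFI_HOST}:{LISTEN_PORT}/{BASE_PATH}", data=b"run")
print(resp.status_code)          # 200 means ListenHTTP accepted the request
```

The flow file created by ListenHTTP then acts as the trigger that the rest of the flow, or the Notify side of a Wait/Notify gate, reacts to.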
04-23-2018
01:21 PM
2 Kudos
Hi @vivek jain There's no undo feature currently in NiFi. Even if it looks like a simple feature, having an undo button in a real-time data flow platform is not an easy thing to design and implement.
04-22-2018
03:44 PM
Hi @Dan Alan You can use the Site-to-Site reporting tasks to export NiFi metrics from one NiFi to another. You can set a SiteToSiteBulletinReportingTask and a SiteToSiteStatusReportingTask on each edge NiFi and make them send data to a central NiFi. On the central NiFi, you receive the metrics as JSON and you can use any NiFi processor to analyse these events and extract the information that you need for your monitoring. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-site-to-site-reporting-nar/1.6.0/org.apache.nifi.reporting.SiteToSiteBulletinReportingTask/index.html https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-site-to-site-reporting-nar/1.6.0/org.apache.nifi.reporting.SiteToSiteStatusReportingTask/index.html You can also use the S2S Provenance reporting task if this is of interest. Is this what you are looking for?
04-22-2018
03:40 PM
Hi @Suhas Fox I am not aware of any processor to write CEF. There's a ParseCEF processor but it's not what you are looking for. Since your data is JSON, you can try to write your CEF event by referencing the JSON fields, like "$.header_version | $.header_deviceVendor | etc." This assumes that your JSON has fields named header_version, header_deviceVendor, etc. A rough illustration of the mapping is sketched below.
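To make the mapping concrete, here is a plain illustration of how such JSON fields could be assembled into a CEF header; the field names beyond the two mentioned above are hypothetical, and in NiFi you would typically do the same thing with EvaluateJsonPath followed by ReplaceText rather than a standalone script:

```python
# Sketch only: illustrate the JSON -> CEF header mapping suggested above.
# Field names beyond header_version / header_deviceVendor are hypothetical.
import json

event = json.loads("""
{
  "header_version": "0",
  "header_deviceVendor": "Acme",
  "header_deviceProduct": "Firewall",
  "header_deviceVersion": "1.0",
  "header_signatureId": "42",
  "header_name": "port scan",
  "header_severity": "5"
}
""")

# CEF header: CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|
cef = "CEF:{}|{}|{}|{}|{}|{}|{}|".format(
    event["header_version"],
    event["header_deviceVendor"],
    event["header_deviceProduct"],
    event["header_deviceVersion"],
    event["header_signatureId"],
    event["header_name"],
    event["header_severity"],
)
print(cef)   # CEF:0|Acme|Firewall|1.0|42|port scan|5|
```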
04-22-2018
03:22 PM
Hi @Quan Zhang Hortonworks has two principal platforms: HDP and HDF. NiFi is part of HDF, which is designed for data in motion; HDP contains technologies for managing data at rest. NiFi is included in the HDP Sandbox only for demo purposes, to have all services in one VM. Please take a look at the HDP and HDF components for more information: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_release-notes/content/comp_versions.html https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_release-notes/content/ch_hdf_relnotes.html
04-18-2018
09:59 AM
Hi @adrian white, have you considered the PutFile processor?