Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7952 | 06-19-2018 08:52 AM
 | 2974 | 06-13-2018 07:54 AM
 | 3418 | 06-02-2018 06:27 PM
 | 3614 | 05-01-2018 12:28 PM
 | 5106 | 04-24-2018 11:38 AM
06-19-2018
08:57 AM
Yes, sorry, I submitted before finishing my answer.
06-19-2018
08:52 AM
Hi @Vivek Singh This was answered by @Matt Burgess recently: https://community.hortonworks.com/questions/193888/nifi-is-it-possible-to-access-processor-group-vari.html That answer mainly covers how to access a variable. To update it, you should use the REST API; I don't think there's a direct way from the script. One option is to write the new value to a flow file after the ExecuteScript and use another processor to call the API and update the variable, as in the sketch below.
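For illustration, here is a rough Python sketch of that API call. The endpoint and payload shape are from the NiFi 1.x REST API (variable registries appeared in 1.4), and the base URL and process group id are placeholders, so verify everything against your version's REST docs:

```python
# Rough sketch: update a process group variable via the NiFi REST API.
# Assumes an unsecured NiFi 1.4+ instance; URL and id below are placeholders.
import requests

NIFI_URL = "http://localhost:8080/nifi-api"   # assumption: local unsecured NiFi
PG_ID = "your-process-group-id"               # hypothetical placeholder

# Fetch the current variable registry; this also gives us the process group
# revision that must be echoed back with the update.
registry = requests.get(
    f"{NIFI_URL}/process-groups/{PG_ID}/variable-registry").json()

payload = {
    "processGroupRevision": registry["processGroupRevision"],
    "variableRegistry": {
        "processGroupId": PG_ID,
        "variables": [
            {"variable": {"name": "my_var", "value": "new_value"}},
        ],
    },
}

# Variable updates are asynchronous: this creates an update request that NiFi
# processes in the background (it may stop/restart affected components).
resp = requests.post(
    f"{NIFI_URL}/process-groups/{PG_ID}/variable-registry/update-requests",
    json=payload,
)
print(resp.json())
```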
06-18-2018
09:56 AM
2 Kudos
Hi @rajat puchnanda You can select your process group and click "Create template" in the Operate palette on the left. After that, go to the hamburger menu, open Templates, and download your template. This gives you an XML file that describes the process group, which you can then import into another NiFi. If you'd rather script it, see the sketch below.
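As a rough sketch of scripting the same export/import against the NiFi 1.x REST API: the URLs and template id below are placeholders, and the endpoints and multipart field name should be double-checked against your version's REST docs:

```python
# Rough sketch: export a template from one NiFi and import it into another.
# Assumes unsecured NiFi instances; ids and URLs are placeholders.
import requests

NIFI_URL = "http://dev-nifi:8080/nifi-api"    # assumption: source instance
TEMPLATE_ID = "your-template-id"              # hypothetical; list via GET /flow/templates

# Download the template as XML.
xml = requests.get(f"{NIFI_URL}/templates/{TEMPLATE_ID}/download").text
with open("my_process_group.xml", "w") as f:
    f.write(xml)

# Upload it to another NiFi's root process group (multipart field "template"
# per the 1.x API; verify against your version).
OTHER_URL = "http://prod-nifi:8080/nifi-api"
root_id = requests.get(
    f"{OTHER_URL}/flow/process-groups/root").json()["processGroupFlow"]["id"]
with open("my_process_group.xml", "rb") as f:
    requests.post(f"{OTHER_URL}/process-groups/{root_id}/templates/upload",
                  files={"template": f})
```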
06-16-2018
06:34 PM
Hi @Abhinav Yepuri There are several ways to automate this. One of them is the NiFi CLI, available since NiFi 1.6: https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli It provides nifi pg-get-vars and nifi pg-set-var, which you can use to get the variables from dev, replace their values with a dictionary, and set them in prod. A rough sketch follows.
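For illustration, a small Python wrapper around the toolkit CLI could look like this. The toolkit path, URLs, ids, and especially the option names are assumptions from memory, so check `cli.sh nifi pg-set-var help` for the exact flags before using it:

```python
# Rough sketch: promote variables from dev to prod with the NiFi Toolkit CLI
# (nifi pg-get-vars / pg-set-var, available from NiFi 1.6).
import subprocess

CLI = "/path/to/nifi-toolkit/bin/cli.sh"      # assumption: toolkit install path
DEV_URL = "http://dev-nifi:8080/nifi-api"
PROD_URL = "http://prod-nifi:8080/nifi-api"
DEV_PG_ID = "dev-pg-id"                       # hypothetical placeholders
PROD_PG_ID = "prod-pg-id"

# Dictionary of dev -> prod overrides (hypothetical example values).
overrides = {"db.url": "jdbc:postgresql://prod-db:5432/app"}

# Print the variables defined on the dev process group.
subprocess.run([CLI, "nifi", "pg-get-vars",
                "-u", DEV_URL, "--processGroupId", DEV_PG_ID], check=True)

# Set each overridden variable on the prod process group.
# NOTE: --var / --val flag names are an assumption; verify with the CLI help.
for name, value in overrides.items():
    subprocess.run([CLI, "nifi", "pg-set-var",
                    "-u", PROD_URL, "--processGroupId", PROD_PG_ID,
                    "--var", name, "--val", value], check=True)
```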
06-13-2018
07:54 AM
1 Kudo
Hi @John T When you use GetSFTP in a cluster, you are duplicating your data: each node ingests the same files. You need to use the List/Fetch pattern instead. A great description of this pattern is available here: https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/ Now, if you used the List/Fetch pattern correctly and still don't have even data distribution, you need to know that the Site-to-Site protocol batches flow files for better network performance. This means that if you have 3 flow files of a few KB or MB to send, NiFi may decide to send them all to one node rather than opening 3 connections. The decision is taken based on data size, number of flow files, and transmission duration. Because of this, you often won't see data distributed across nodes during tests, since tests usually involve a few small files. The batching thresholds have default values, but you can change them for each input port: go to the RPG, open Input Ports, then click the edit (pen) icon for your input port to reach these settings. I hope this helps explain the behavior. Thanks
06-07-2018
07:08 PM
1 Kudo
@Bhushan Kandalkar Here is a step-by-step doc: https://community.hortonworks.com/articles/886/securing-nifi-step-by-step.html And this is the official doc: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_security/content/enabling-ssl-without-ca.html
06-07-2018
02:02 PM
What about the proxy policy? As you can see in the provided link, to allow users to view the NiFi UI, create the following policies for each host:
/flow – read
/proxy – read/write
06-07-2018
08:15 AM
Hi @Bhushan Kandalkar Have you added the Ranger policies that let users see the UI? See https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_security/content/policies-to-view-nifi.html Thanks
06-04-2018
05:56 PM
Hi @tthomas You can use EvaluateJsonPath to extract a JSON field and add it as a flow file attribute: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html For example, if your JSON is the following and you want to add a flow file attribute called timestamp:
{
  "created_at": "Thu Sep 28 08:08:09 CEST 2017",
  "id_store": 4,
  "event_type": "store capacity",
  "id_transaction": "1009331737896598289",
  "id_product": 889,
  "value_product": 45
}
you can add an EvaluateJsonPath processor with a property named timestamp whose value is $.created_at
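If you want to sanity-check the expression outside NiFi first, here is a small Python sketch using the third-party jsonpath-ng package (my choice for illustration, not something NiFi uses internally):

```python
# Sanity-check the JsonPath expression outside NiFi.
# Requires: pip install jsonpath-ng
import json
from jsonpath_ng import parse

event = json.loads("""
{
  "created_at": "Thu Sep 28 08:08:09 CEST 2017",
  "id_store": 4,
  "event_type": "store capacity",
  "id_transaction": "1009331737896598289",
  "id_product": 889,
  "value_product": 45
}
""")

# Same expression you would put in the EvaluateJsonPath "timestamp" property.
matches = parse("$.created_at").find(event)
print(matches[0].value)   # -> Thu Sep 28 08:08:09 CEST 2017
```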
06-02-2018
06:27 PM
1 Kudo
Hi @Rahul Kumar Beyond the fact that they are both called "pub/sub brokers", Kafka and MQTT have different design goals. Without going deep into details, it's better to see MQTT as a communication protocol between applications. It was designed to be extremely lightweight to fit IoT and resource-constrained environments; the objective is to distribute messages between different systems, not to store large volumes of data for a long time. On the other hand, Kafka is a broker that can store large volumes of data for a long time (or forever). It was designed to be scalable and to provide high performance; hence, a Kafka cluster usually runs on beefy machines. It's well suited for big data applications and integrates with the big data ecosystem (Spark, Storm, Flink, NiFi, etc.). Depending on your application requirements, the choice is usually easy to make. In a lot of scenarios it's actually Kafka and MQTT together: in IoT, for instance, it's not rare to see MQTT at the local level (a gateway, for example) for sensor/actuator communications, and Kafka at the regional/central level for data ingestion, processing, and storage. Technically, there are a lot of differences too, in terms of quality of service, streaming semantics, internal architecture, etc. I hope this helps clarify things.