Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 8283 | 06-19-2018 08:52 AM |
| | 3211 | 06-13-2018 07:54 AM |
| | 3660 | 06-02-2018 06:27 PM |
| | 3959 | 05-01-2018 12:28 PM |
| | 5496 | 04-24-2018 11:38 AM |
10-25-2017
07:23 PM
Hi @Charles Bradbury Spark 2.2 is not available in HDP, so you won't be able to upgrade to this version using Ambari. You can manually install Spark 2.2 on the cluster, but it hasn't been tested and is not yet certified/supported by Hortonworks. Thanks
10-25-2017
07:35 AM
Hi @Saikrishna Tarapareddy Indeed, the community is working on a flow registry that should be available in coming releases of HDF/NiFi. I don't have a date for this yet, but I expect it in one of the next two releases. For SDLC, you can integrate Git with flow.xml or with templates. IMO, using templates is easier since you only need to manage part of the flow. Since there are several flows running in NiFi, it's easier to deploy a template than the complete flow, which reduces the impact on production. You can leverage the NiFi API to implement your automated deployments. Note that there are a few technical challenges, and it's not a lift-and-shift operation for templates. For instance, all passwords are stripped for protection when you download a template, so you need to replace them with the right values before deploying to the next environment. Another example is variables: say NiFi writes data to Kafka; you need to change the Kafka broker address from one environment to another. You can use custom variables with nifi.variable.registry.properties, but this requires a NiFi restart, which is not acceptable in production. There's also ongoing work on this topic: https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry I hope these details help you. Thanks
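To illustrate the password point above, here is a minimal sketch of restoring blanked sensitive values in a downloaded template before redeployment. It assumes the usual `<entry>/<key>/<value>` layout of template property maps; the property names and secret values are placeholders, not part of any real flow:

```python
import xml.etree.ElementTree as ET

def fill_sensitive_properties(template_xml, secrets):
    """Restore sensitive property values that NiFi strips from a downloaded
    template. `secrets` maps property names (e.g. "Password") to the value
    to inject for the target environment."""
    root = ET.fromstring(template_xml)
    # Template property maps are serialized as <entry><key>..</key><value>..</value></entry>
    for entry in root.iter("entry"):
        key = entry.find("key")
        if key is not None and key.text in secrets:
            value = entry.find("value")
            if value is None:
                # Sensitive values are omitted entirely on export
                value = ET.SubElement(entry, "value")
            value.text = secrets[key.text]
    return ET.tostring(root, encoding="unicode")
```

In a real deployment pipeline you would pull the secrets from a vault rather than hard-coding them, then upload the patched template through the NiFi REST API.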
10-24-2017
07:54 PM
Hi @Balakrishna Dhanekula I am glad the answer was helpful. Please take a moment to accept the answer for future reference. Thanks
10-24-2017
04:51 AM
1 Kudo
Hi @Andre Labbe You can configure log rotation and retention in NiFi's logback configuration. Please read these resources to get started: https://pierrevillard.com/2017/05/12/monitoring-nifi-logback-configuration/ http://apache-nifi.1125220.n5.nabble.com/NiFi-app-log-td7437.html
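As a sketch, a time-based rolling policy in NiFi's conf/logback.xml might look like the following (the daily rollover and 30-day retention are example values, not NiFi defaults; adjust them to your needs):

```xml
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <!-- roll the file daily; keep at most 30 days of history -->
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd}.log</fileNamePattern>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
```

Logback picks up changes to this file automatically if scanning is enabled, so no restart is needed in that case.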
10-24-2017
04:44 AM
Hi @dhieru singh You need to do two things. First, you need good capacity planning to evaluate the infrastructure required to handle your data flows. Consider the worst-case scenario so you have room for growth and the capacity to handle bursts. There are several resources out there that can help you: https://community.hortonworks.com/articles/135337/nifi-sizing-guide-deployment-best-practices.html https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_command-line-installation/content/hdf_isg_hardware.html Second, as you said, you need to monitor your system at different levels. Pierre has a set of articles on this topic that I recommend reading: https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
10-24-2017
04:37 AM
@Sri Bet
NiFi was designed to move data from one place to another, not to store it. NiFi stores data in the content repository temporarily for processing, but it has routines to delete flow files automatically. Data is deleted just after the end of the flow, or after an archive retention period, which is 12 hours by default. This article explains the archiving process: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html As you can see, NiFi is designed to delete data that is no longer used; the idea is that NiFi has already moved it to a storage location. You should use a storage solution for storing data, not NiFi. For instance, why don't you use your FTP server for this?
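For reference, the archiving behavior described above is controlled by properties in conf/nifi.properties. The values shown below are my understanding of the usual defaults; verify them against the documentation for your NiFi version:

```properties
# whether deleted content is archived before being purged
nifi.content.repository.archive.enabled=true
# keep archived content for at most 12 hours...
nifi.content.repository.archive.max.retention.period=12 hours
# ...or until the content repository partition reaches this usage, whichever comes first
nifi.content.repository.archive.max.usage.percentage=50%
```

Lowering the retention period (or disabling archiving) frees content repository space sooner, at the cost of losing the ability to replay or view old flow file content from provenance.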
10-23-2017
07:05 PM
Hi @dhieru singh When you add a processor to a NiFi cluster it's deployed on each node, but it's enabled in one of two ways:
If you set scheduling to primary node only, the processor is active only on the primary node. If the primary node goes down, NiFi will choose a new primary node and the processor is activated on that node. If you set scheduling to all nodes, the processor is enabled on all of the cluster's nodes. The ListFile processor lists files local to a NiFi node. So if you use it with primary-node-only scheduling, only the primary node lists the directory and continues to work on the generated flow files. If you use it with all-nodes scheduling, each NiFi node lists its local files and continues to work on them locally. If you need to distribute files between nodes, you need to use Site-to-Site with a remote process group. You need to understand this behavior and your use case, and plan accordingly to avoid data duplication and data loss. I hope this is helpful. Thanks
10-23-2017
05:05 PM
2 Kudos
Hi @dhieru singh You can configure the scheduling of the processor to define how often data is generated (Run Schedule). Is this what you are looking for?
10-20-2017
02:48 PM
1 Kudo
Hi @Raj B This is the expected behavior. The controller services that you add from the Hamburger menu (top right) are used only for reporting tasks. If you want to add controller services for processors, you should add them from the configure menu of your process group or the root canvas. This article explains the difference: https://community.hortonworks.com/articles/90259/understanding-controller-service-availability-in-a.html
10-20-2017
01:39 PM
1 Kudo
Hi @Patrick Maggiulli Glad that the answer was useful. Please accept the answer to close this thread. Thanks