Member since: 07-30-2019
Posts: 3406
Kudos Received: 1622
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 118 | 12-17-2025 05:55 AM |
| | 179 | 12-15-2025 01:29 PM |
| | 119 | 12-15-2025 06:50 AM |
| | 244 | 12-05-2025 08:25 AM |
| | 405 | 12-03-2025 10:21 AM |
10-17-2018
02:28 PM
Yes, that makes sense. When you start the DistributedMapCacheServer, it starts a server on each NiFi node. The DistributedMapCacheClient should be configured to point at one specific node, so that every node pulls cache entries from the same server.

A little back history: the DistributedMapCacheServer and DistributedMapCacheClient controller services date back to the original NiFi releases. Back in those days there was no zero-master clustering as we have now; a dedicated server ran a NiFi Cluster Manager (NCM), and at that time the DistributedMapCacheServer could only be set up on the NCM.

Once NiFi moved away from having an NCM, the functionality of these controller services was left unchanged to avoid breaking the flows of users who moved to the latest versions. The DistributedMapCacheServer does not offer HA (if the node hosting the server goes down, the cache becomes unavailable). To provide HA here, new external HA cache options have been added.

Thanks, Matt
10-17-2018
12:31 PM
@pavan srikar I should add that there is no processor that will specifically clone a FlowFile to every node in the NiFi cluster.

But there are other options if you do not want to stand up an external map cache server.

One is to set up a disk mount that is shared across all nodes. On the primary node only, you run a flow that retrieves a new token every ~55 minutes and writes it to this shared mounted directory, configured to overwrite the previously written token each time. Then, on all nodes, you create a flow that consumes this token on a schedule (without deleting it) to perform your all-node tasks.

Just a second option for you.

Thank you, Matt
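The overwrite-then-read pattern the shared-mount option relies on can be sketched outside NiFi in plain Python (the path is hypothetical; writing through a temp file plus rename keeps readers from seeing a half-written token on a local POSIX filesystem, though rename semantics on network mounts vary):

```python
import os
import tempfile

TOKEN_PATH = "/tmp/shared-mount/token.txt"  # hypothetical shared mount location

def write_token(token: str, path: str = TOKEN_PATH) -> None:
    """Primary-node side: overwrite the previously written token.

    Write to a temp file in the same directory, then rename it over
    the target so readers never see a partially written token.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(token)
    os.replace(tmp, path)  # atomic rename on local POSIX filesystems

def read_token(path: str = TOKEN_PATH) -> str:
    """All-nodes side: consume the token without deleting it."""
    with open(path) as f:
        return f.read()
```

Each scheduled write simply replaces the previous token, so consumers always pick up the freshest value.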
10-17-2018
12:23 PM
1 Kudo
@pavan srikar The design you have in place looks to be the correct solution for the use case you describe here. Every node in your cluster runs the exact same flow.xml.gz.

You would typically configure your "PutDistributedMapCache" and "FetchDistributedMapCache" processors to use a "Distributed Cache Service" that every node has access to.

This allows you to run a single "primary node" only flow that retrieves the token on a one-hour cron schedule and writes it to the distributed map cache, and then a second flow, run by every node, that pulls the stored token value from the distributed map cache and uses it for your downstream calls.

Using the "RedisDistributedMapCacheClientService" controller service, for example, allows you to set a TTL on the values you store in the cache, so you can expire the stored token before it is no longer valid. For example, if the token is good for 1 hour, you could set the TTL to 50-55 minutes.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
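The TTL behaviour can be illustrated with a small Python sketch: a toy stand-in for the Redis-backed map cache, not the actual controller service, showing why a 55-minute TTL keeps a stale 1-hour token from ever being served:

```python
import time

class TTLCache:
    """Toy key/value cache whose entries expire after a time-to-live,
    mimicking the TTL a Redis-backed cache applies to stored values."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.monotonic() >= expiry:
            del self._store[key]  # entry expired before it went stale
            return None
        return value

# Token is valid for 1 hour; expire it from the cache after 55 minutes
cache = TTLCache()
cache.put("auth-token", "token-value", ttl_seconds=55 * 60)
```

A fetch after expiry returns nothing, which the flow can treat as a signal that a fresh token is needed.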
10-15-2018
01:06 PM
1 Kudo
@Stephen Greszczyszyn NiFi is designed to be data agnostic, meaning it has no dependency on any specific type(s) of data. This is accomplished by wrapping ingested content in a NiFi FlowFile. A NiFi FlowFile consists of two parts:

1. FlowFile content (the bytes of data, which are simply written to claims in the content repository)
2. FlowFile attributes/metadata (information about the FlowFile and its content)

While NiFi itself has no dependency on data types, various processors available in NiFi likely will. So you will need to take a closer look at the documentation for any processor you use that needs to interact with the FlowFile content. NiFi already has some syslog-based processors.

When it comes to writing out the raw data, NiFi simply transmits the bytes. If the target will accept the raw data, then all is good.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
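The two-part structure above can be sketched as a conceptual model in Python (this is an illustration only, not NiFi's actual Java implementation; the sample attributes are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class FlowFile:
    """Conceptual model of a NiFi FlowFile: opaque content bytes
    plus a map of attributes describing that content."""
    content: bytes  # raw bytes; the framework never interprets them
    attributes: dict = field(default_factory=dict)  # metadata about the content

# A syslog line carried as-is; the framework does not care that it is syslog
ff = FlowFile(
    content=b"<34>Oct 15 12:23:01 host app: something happened",
    attributes={"filename": "syslog.1", "mime.type": "text/plain"},
)
```

Only processors that parse the content (e.g. syslog processors) need to understand the bytes; everything else routes on attributes.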
10-15-2018
12:23 PM
@Amit Mishra Based on the ERROR above, it appears someone has added a non-standard jar file to NiFi's default lib directory: phoenix-4.7.0.2.5.3.0-37-client.jar was added to the /usr/hdf/2.1.1.0-2/nifi/lib/ directory.

The first thing I would try is removing this file from NiFi's default lib directory on every NiFi node, then restarting your NiFi cluster to make sure the ERROR goes away.

If you then find you need this jar for something in your dataflow, try creating a new custom lib directory in NiFi and adding it there. This is done by simply adding a new property in the nifi.properties file:

1. Create a new custom lib directory on each of the NiFi nodes (for example: /var/lib/nifi/custom-lib/).
2. Move your custom phoenix-4.7.0.2.5.3.0-37-client.jar into that new directory. (I recommend moving any other custom-added jar/nar files here as well. You should not be adding any non-standard files to NiFi's default lib directory.)
3. Make sure directory and file ownership and permissions are set correctly.
4. Add a new custom property named nifi.nar.library.directory.custom1 (custom1 is an example and can be set to whatever you like).
5. Set this new property's value to the path of the custom lib directory you created on each node (for example: nifi.nar.library.directory.custom1=/var/lib/nifi/custom-lib/).
6. Restart all your NiFi nodes.
7. Verify the ERROR no longer appears.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
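With the example paths from the steps above, the resulting addition to nifi.properties would look like this (the directory and the "custom1" suffix are just examples):

```properties
# Custom lib directory for user-added jars/nars (keeps NiFi's default lib/ clean)
nifi.nar.library.directory.custom1=/var/lib/nifi/custom-lib/
```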
10-12-2018
12:47 PM
2 Kudos
@David Sargrad The linked example you provided in your comment deals with a zip that contains zipped files (a zip of zips). If you are talking about a single zip that contains a directory tree with files in subdirectories, this is relatively easy to do.

After ingesting your zip file via GetHTTP, feed it to an "UnpackContent" processor and then to a "PutFile" processor.

When the "UnpackContent" processor unzips the source file, it creates a new FlowFile for each unique file found. A variety of FlowFile attributes are set on each of those generated FlowFiles, including "path". In my example I created a directory named "zip-root", created 4 sub-directories within it, and created one file in each of those sub-directories. I then zipped up the zip-root directory (zip -r zip-root.zip zip-root). The screenshot above shows just one of those unpacked files.

After "UnpackContent" executed, it produced 4 new FlowFiles (one for each file found in the sub-directories within the zip).

The "path" FlowFile attribute on each of these generated FlowFiles can be used to maintain the original directory structure when writing out the FlowFiles via "PutFile". You can see from the configuration above that as each FlowFile is processed by the PutFile processor, it is placed in a directory based on the value of the "path" attribute set on each incoming FlowFile. Here I decided that my target base directory should be /tmp/target/, and the original zipped files' directory structure is preserved beneath it.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
10-12-2018
12:01 PM
@David Sargrad NiFi is designed to prevent data loss. This means NiFi needs to do something with a FlowFile when the processing of that FlowFile encounters a failure somewhere within a dataflow.

When it comes to ingest-type processors like GetHTTP, a FlowFile is only generated upon success. As such, there is no FlowFile created during a failure that would need to be handled or routed to some failure relationship.

On its next scheduled run, the GetHTTP processor will simply try to execute just as it did on the previous run. If successful, a FlowFile will be produced and routed to the outbound success relationship connection.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
10-11-2018
02:41 PM
@David Sargrad It would potentially be a waste of resources to use a separate NiFi instance for each of your dataflows, and a single instance also provides no HA at all. A better approach is to set up a NiFi cluster in which you run multiple dataflows. To help keep your dataflows organized, users typically create a "Process Group" for each unique dataflow: on the root canvas you simply have a Process Group per dataflow, which keeps the UI clean and manageable. This type of setup also allows you to easily version control each of these process groups independently in a NiFi Registry.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
10-10-2018
12:17 PM
@Alex Coast The 5-minute retention of bulletins is a hard-coded value that cannot be edited by the end user. It is normal to see an occasional bulletin from some NiFi processors; for example, a PutSFTP may fail because of a filename conflict or network issue but succeed on retry. A continuous problem would result in non-stop bulletins being produced, which would be easily noticed.

Take a look at my response further up on using the "SiteToSiteBulletinReportingTask" if you are looking to retain bulletin info longer, or to manipulate it, route it, store it somewhere, etc.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
10-10-2018
12:11 PM
@spdvnz My suggestion here would be to handle this via the "SiteToSiteBulletinReportingTask".

You can build a dataflow to receive these bulletin events, manipulate them as you want, and store them in a location of your choice for your auditing needs.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.