About MattWho

valentintyhonov · ‎03-25-2020

@MattWho Is HDF 3.5 already released? If not, do you know when it is planned to be released? I saw page with release notes, but repository locations are still not updated. https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.5.0/release-notes/content/hdf_repository_locations.html

domR · ‎03-24-2020

Thanks for clearing this up Matt, was a big help. Cheers, Dom

Alexandros · ‎03-24-2020

Ok, this is all me for not understanding permissions correctly, I thought that if a permission wasn't configured it heredates the the permissions of NiFi. So: I'm Admin, I generated a group I should have access. You made me check again and I thank you for that!

MattWho · ‎03-24-2020

@Koffi When a NiFi node attempts to connect to an existing NiFi cluster, there are three files that are checked to make sure they match exactly between the connecting node and the existing copies in the cluster. Those files are: 1. flow.xml.gz 2. users.xml (will only exist if NiFi is secured over https) 3. authorizations.xml (not to be confused with the NiFi authorizers.xml file. Will only exist if NiFi is secured over https) The output in the nifi-app.log of the node should explain exactly what the mismatch was the first time it tried to connect to the cluster. Hope this helps, Matt

MattWho · ‎03-16-2020

@Gubbi Depending on which processor is being used to create your FlowFile from you source linux directory, you will likely have an "absolute.path" FlowFile attribute created on the FlowFile. absolute.path = /users/abc/20200312/gtry/ You can pass that FlowFile to an UpdateAttribute processor which can use NiFi Expression Language (EL) to extract the date from that absolute path in to a new FlowFile attribute Add new property (property name becomes new FlowFile attribute): Property: Value: pathDate ${absolute.path:getDelimitedField('4','/')} The resulting FlowFile will have a new attribute: pathDate = 20200312 Now you can use that FlowFile attribute later when writing to your target directory in S3. I assume you would use the putS3Object processor for this? If so, you can configure the "Object Key" property with the following: /Users/datastore/${pathDate}/${filename} NiFi EL will replace ${pathDate} with "20200312" and ${filename} will be replaced with "gyyy.csv". Hope this helps you, Matt

anil35759 · ‎03-11-2020

Thank you it helped and worked.

justjoe · ‎03-09-2020

Solved it. I had to switch "Strict Host Key Checking" to false. Works just fine. Thanks for the help

vineet_harkut · ‎03-08-2020

Thanks @MattWho

MattWho · ‎03-05-2020

@sfishman In the "Replacement value" configuration property just hold the <shift> key and hit enter instead of using "\n". Here is example of what you would see: This will create the line return you are looking for. Hope this helps, Matt

MattWho · ‎03-05-2020

@varun_rathinam Observations from your configuration: 1. You are using "Defragment" merge strategy which tells me that somewhere upstream in your dataflow you are splitting some FlowFile in to fragments and then you are using this processor to merge those fragments back in to the original FlowFile. Correct? When using Defragment you can not use multiple MergeContent processors in series as i mentioned earlier because the defragment strategy is expecting to find all fragments from the fragment count before merging them. 2. When using the defragment strategy it is the fragment.count attribute on the FlowFiles that dictates when the bin should be merged and not the min number of entries. 3. Each FlowFile that has a unique value in the fragment.identifier will be allocated to a different bin. Setting the number of bins to "1" will never work no matter which merge strategy you choose to use. When the MergeContent processor executes it first checks to see if a free bin is available (if not it merges oldest bin or routes oldest bins FlowFiles to failure in case of Defragment to free up a bin), then it looks at the current FlowFiles in the inbound connection at that exact moment in time and starts allocating them to existing bins or new bins. So at a minimum you should always have at least "2" bins. The default is "5" bins. Having multiple bins does not mean that all those available bins will be used. 4. I see you changed Maximum Number of Entries from default 1000 to 100000. Is this because you know each of the FlowFiles you split will produce up to 100,000 FlowFiles? As i mentioned the ALL FlowFiles allocated to bins have their attributes held in heap memory. Adding to that... If you have multiple bins being filled because you have unique fragment.identifiers being defragmented, you could have even more than 100,000 FlowFiles worth of attributes in heap memory. So your NiFi JVM heap memory being set at only 2GB may lead you to hitting Out Of Memory (OOM) conditions with such a dataflow design. Also want to add that where ever you are doing the original splitting of your FlowFile in your dataflow will also have an impact on heap memory because the FlowFile Attributes for every FlowFile being produced during the split process is held in heap memory until every new split FlowFile being produced is committed to a downstream connection. NiFi connections between processors have swapping enabled by default to help reduce heap usage when queues get large, but same does not apply within the internals of a processors execution. As i mentioned before, the MergeContent does not load FlowFile content in heap memory, so the size of your FlowFiles does not impact heap here. So you really want to step back and look at your use case again and ask yourself: "do I really need to split my source FlowFile and merge it back in to the original FlowFile to satisfy my use case?" NiFi has numerous record based processors for working with records avoiding the need to split them in many use cases. Hope this helps, Matt

Online	Offline
Last Visited	‎04-06-2026 05:18 AM

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎04-06-2026 05:18 AM
Posts	3,467
Kudos received	1637

Cloudera Community

Re: How to invoke a url in nifi which is protected...

Re: Retry impacts scheduler

Re: 503 error while copying/versioning big process...

Re: FetchSMB not fetching all files

Re: Nifi: How to revoke the import and export Temp...

Re: NiFi 1.10.0 Repository and Management Pack

Re: NiFi - List SFTP / HDFS Processors - State

Re: Data Provenance not showing on imported templa...

Re: Can't connect to the nifi UI

Re: Move entire folder based on date to S3

Re: How to map/select controller service in proces...

Re: Nifi - SFTP and Input Port

Re: Is it possible to create NiFi workflows using ...

Re: How to replace Windows linefeed with Linux lin...

Re: NiFi mergeontent max file size handle