Member since: 07-30-2019
Posts: 3387
Kudos Received: 1617
Solutions: 998

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 340 | 10-20-2025 06:29 AM |
|  | 480 | 10-10-2025 08:03 AM |
|  | 344 | 10-08-2025 10:52 AM |
|  | 372 | 10-08-2025 10:36 AM |
|  | 401 | 10-03-2025 06:04 AM |
07-18-2016
10:39 PM
1 Kudo
@gkeys What are the permissions on the file(s) you are trying to pick up with the GetFile processor, and on the directory the file(s) live in?

```
-rwxrwxrwx 1 nifi dataflow  24B Jul 18 18:20 testfile
drwxr-xr-- 3 root dataflow 102B Jul 18 18:20 testdata
```

With the above example permissions, I reproduce exactly what you are seeing. If "Keep Source File" is set to true, NiFi creates a new FlowFile with the content of the file. If "Keep Source File" is set to false, GetFile yields because it does not have the necessary permissions to delete the file from the directory. The write bit is required on the source directory for the user who is trying to delete the file(s). In my example NiFi is running as user nifi, which can read the files in the root-owned testdata directory because the directory's group ownership is dataflow (the same group as the nifi user) and the directory has r-x group permissions. If I change the directory's group permissions to rwx, the nifi user will also be able to delete testfile.
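Here is a minimal shell sketch that reproduces the behavior (run the setup steps as root or the directory owner; the names mirror the example above):

```bash
# Set up a directory the "dataflow" group can read and traverse,
# but not write to (drwxr-xr--)
mkdir testdata
chmod 754 testdata
touch testdata/testfile
chmod 777 testdata/testfile   # -rwxrwxrwx on the file itself

# As the "nifi" user (group "dataflow"):
cat testdata/testfile         # succeeds - r-x on the directory is enough to read
rm testdata/testfile          # fails: Permission denied - no write bit on the directory

# Grant the group write on the directory (drwxrwxr--) and the delete succeeds
chmod 774 testdata
rm testdata/testfile          # now succeeds
```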
Thanks,
Matt
... View more
07-18-2016
10:09 PM
1 Kudo
You could also modify the local /etc/hosts file on your EC2 instances so that the hostname "ip-10-40-197.ec2.internal" resolves to the proper external IP address for that ZooKeeper node (and likewise for the other ZooKeeper nodes, if they have external IPs).
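A one-line sketch of that change (203.0.113.10 is a placeholder; substitute the node's real external IP):

```bash
# Append a mapping so the internal ZooKeeper hostname resolves to its external IP
echo "203.0.113.10  ip-10-40-197.ec2.internal" | sudo tee -a /etc/hosts
```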
... View more
07-18-2016
02:52 PM
2 Kudos
NiFi secure cluster and Site-to-Site authentication are not handled by Kerberos. NiFi Kerberos authentication is only supported for user authentication. Secure NiFi Site-to-Site communications are still handled using TLS mutual authentication.
The error you are seeing is because that TLS mutual auth is failing. The URL you are providing the Remote Process Group (RPG) uses the IP of the target NCM. The NCM presents its certificate to your nodes for authentication, and that certificate does not contain the IP in its DN or as a Subject Alternative Name (SAN). So the source NiFi is saying that the provided certificate should contain 10.110.20.213, but it contains something else instead.
If you do a verbose listing of the keystore on the NCM, you will see the contents of the key. Look for CN=<some value> (this value is typically the hostname/FQDN). Use that value in the URL you provide to your RPG, and make sure your source NiFi (in your case, every node in your NiFi cluster) can resolve that hostname to the proper IP.
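A quick way to do that verbose listing (assuming a JKS keystore; adjust the path and password to match your nifi.properties):

```bash
# Look for "Owner: CN=..." and any SubjectAlternativeName entries in the output
keytool -list -v -keystore /path/to/keystore.jks -storepass <keystorePasswd>
```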
The other option is to get a new certificate that has the IP added to it as a SAN.
Thanks, Matt
... View more
07-13-2016
12:33 PM
1 Kudo
I recommend setting up a NiFi cluster to spread the load across multiple resources. This removes the single point of failure caused by having only one EC2 instance running NiFi. Whether a single EC2 instance running NiFi can handle your dataflows really depends on your data and what your specific dataflows look like. For example, are you doing a lot of CPU- or memory-intensive processing in your NiFi dataflows? A good approach is to have NiFi sitting on edge systems feeding a central NiFi processing cluster.
... View more
07-08-2016
12:07 PM
1 Kudo
@mliem
NiFi components (processors, RPGs, input/output ports, etc.) are designed to run asynchronously. There is no mechanism built into NiFi for triggering one processor to run as a result of another processor completing its job.
That being said, everything you can do via the UI can also be done through calls directly to the NiFi API. You may consider experimenting with the invokeHTTP processor to make calls to the NiFi API that start and stop specific processors at specific points in your dataflow. Once a processor is started, it retrieves a thread from the controller to do its work. Stopping that processor will not kill that thread; the processor will simply not be scheduled to run again and will be in a "stopping" state during that time frame. You cannot start a processor that is still "stopping", so be careful where you invoke your start and stop actions. (For example, following your "matched" criteria you start the mergeContent, and after the mergeContent you invoke the stop of the mergeContent.)
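As a rough sketch of what those invokeHTTP calls would look like (this targets the NiFi 1.x REST API; endpoint paths differ in older releases, and the host, processor ID, and revision versions here are placeholders):

```bash
# Fetch the processor entity; the response includes the current revision,
# which must accompany any state change
curl -s http://nifi-host:8080/nifi-api/processors/<processor-id>

# Stop the processor, passing the revision version returned above
curl -X PUT -H "Content-Type: application/json" \
  -d '{"revision":{"version":3},"state":"STOPPED"}' \
  http://nifi-host:8080/nifi-api/processors/<processor-id>/run-status

# Start it again (the revision version increments with each change)
curl -X PUT -H "Content-Type: application/json" \
  -d '{"revision":{"version":4},"state":"RUNNING"}' \
  http://nifi-host:8080/nifi-api/processors/<processor-id>/run-status
```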
For speed and efficiency's sake, I would look for ways to keep your flow asynchronous in design. If you do choose to go this route, I would also build some monitoring into your flow using the monitorActivity processor. This processor can be used to verify that data continues to flow based upon some configured threshold. If that threshold is exceeded, it generates a FlowFile that can be routed to a putEmail processor (as an example) to alert someone that the dataflow is down. This is a safety net, so to speak, in the event one of your API calls fails for some reason (a network hiccup, for example). Thanks, Matt
... View more
07-07-2016
04:28 PM
It would help us understand your dataflow better if you could paste a screenshot of the second dataflow you want to alter.
... View more
07-01-2016
12:06 PM
NiFi 1.0 is deep into development right now. Expect to see it up for a vote in August. NiFi 1.0 has had considerable rework done across the board (new UI, no more NCM for clustering, etc.). Very exciting stuff.
... View more
06-30-2016
03:14 PM
6 Kudos
@Alexander Aolaritei NiFi can produce a lot of provenance data. The solution you are looking for will be coming in Apache NiFi 1.0 in the form of a NiFi reporting task. This "SiteToSiteProvenanceReportingTask" will use the NiFi Site-to-Site (S2S) protocol to send provenance events to another NiFi instance in configurable batches. Of course, that target NiFi instance could be yourself; however, that would just produce even more provenance events locally as you handle those messages, so it may be wise to stand up another NiFi instance just for provenance event handling. Upon receiving those provenance events via an S2S input port, you can use standard NiFi processors to split/merge them, route them, and store them in your desired endpoint (whether that is local file(s), an external DB, etc.). I am not a developer, so I cannot help with the custom solution you are working on, but I just want to share what is coming as another viable solution to your needs. Thanks, Matt
... View more
06-28-2016
08:27 PM
1 Kudo
@AnjiReddy Anumolu Let me start by making sure I fully understand the dataflow you have created so I can better answer your question. You have added a getFile processor to your flow, which will pick up file(s) from a local file system directory and then send them via the success relationship to a logAttribute processor. What did you do with the logAttribute's success relationship? If it is auto-terminated, you are essentially telling NiFi you are done with the files following a successful logging of the FlowFile attributes/metadata. If the success relationship has not been defined, the processor will remain invalid and cannot be run, and the file(s) picked up by the getFile processor will remain queued on the connection between the getFile processor and the logAttribute processor.

In either case, when NiFi ingests file(s) they are placed in the NiFi content repository. The location of the content repository is defined/configured in the nifi.properties file. The default places it in a directory created within the NiFi installation directory:

```
nifi.content.repository.directory.default=./content_repository
```

NiFi stores file(s) in what are known as claims to make the most efficient use of the system's hard disks. A claim can contain one to many files. The default claim configuration is also set in nifi.properties:

```
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
```

Files smaller than 10 MB may be stored together with other files, up to 100 total files in a single claim. A file larger than 10 MB will end up in a claim of one.

At the same time file content is written to a claim, FlowFile attributes/metadata about the ingested files are written to the FlowFile repository. Its location is also set in nifi.properties:

```
nifi.flowfile.repository.directory=./flowfile_repository
```

These FlowFile attributes/metadata contain information such as filename, file size, location of the claim in the content repository, claim offset, etc. The claim offset is the starting byte location of a particular file's content within a claim, and the file size defines the number of bytes from that offset that make up the complete data.

The nifi-app.log contains fairly robust logging by default (configured in the logback.xml file). When NiFi ingests files, it logs that, and the log line contains information about the claim (location and offset). When NiFi auto-terminates FlowFiles, they are removed from the content repository. Depending on the content repository archive setup, the file(s) may be archived for a period of time; archived file(s) can be replayed from the NiFi provenance UI.
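The archive behavior mentioned above is controlled by its own settings in nifi.properties. A typical configuration looks like this (illustrative values; verify the property names and defaults against your NiFi version's documentation):

```
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```

Thanks, Matt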
... View more
06-23-2016
09:41 PM
Has your VM or NiFi been restarted since HDP was installed?
... View more