About MattWho

MattWho · 01-09-2017

@Rohit Ravishankar If you found this information helpful, please accept the answer.

MattWho · 01-09-2017

@Rohit Ravishankar Let's assume you want to merge the text-based content of both your ABC.txt and CTRL_ABC.txt files into one single NiFi FlowFile, and that the content of the new merged FlowFile should consist of the content of CTRL_ABC.txt followed by the content of ABC.txt. A simple combination of ListFile (you need 2 of these) --> UpdateAttribute (you need 2 of these) --> FetchFile --> MergeContent will do the trick. Below is a template you can upload to your NiFi; it shows how to configure each of these processors so that each data file is merged with its matching CTRL file, in the proper order: merge-two-files-using-defragmentation.xml Hope this helps, Matt
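To make the ordering concrete, here is a minimal Python sketch (not NiFi code) of what MergeContent's Defragment strategy does once the UpdateAttribute processors have tagged each FlowFile. It assumes each FlowFile carries fragment.identifier (the shared base name), fragment.index (0 for the CTRL_ file, 1 for the data file) and fragment.count (2); the helpers tag and defragment are hypothetical stand-ins for the processors.

```python
from collections import defaultdict


def tag(filename: str, content: bytes) -> dict:
    """Hypothetical stand-in for the two UpdateAttribute processors:
    derive the shared base name and give the CTRL_ file index 0 so it sorts first."""
    if filename.startswith("CTRL_"):
        base, index = filename[len("CTRL_"):], 0
    else:
        base, index = filename, 1
    return {
        "filename": filename,
        "fragment.identifier": base,
        "fragment.index": index,
        "fragment.count": 2,
        "content": content,
    }


def defragment(flowfiles):
    """Group by fragment.identifier and concatenate each complete group
    in fragment.index order; incomplete groups are left waiting."""
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff["fragment.identifier"]].append(ff)
    merged = {}
    for ident, members in bins.items():
        if len(members) < members[0]["fragment.count"]:
            continue  # incomplete bin: MergeContent would keep waiting for the rest
        members.sort(key=lambda ff: ff["fragment.index"])
        merged[ident] = b"".join(ff["content"] for ff in members)
    return merged


# Arrival order does not matter; fragment.index decides the output order.
incoming = [tag("ABC.txt", b"data\n"), tag("CTRL_ABC.txt", b"ctrl\n")]
print(defragment(incoming))
```

Because the order comes from the index attribute rather than from arrival order, the data file can reach MergeContent before its CTRL file and the merged content still starts with the CTRL content.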

MattWho · 01-09-2017

@Rohit Ravishankar How do you plan to combine the "content" of these two files? That is the first question that needs to be addressed. You can use the MergeContent processor to merge the "content" of multiple FlowFiles using binary concatenation. If that is acceptable, then all you need to do is consume both your data file and your control file, use UpdateAttribute to extract a common name from both filenames into a new attribute, and finally use that new attribute as the "Correlation Attribute Name" in the MergeContent processor. You would also want to set "Minimum Number of Entries" to 2 in the MergeContent processor. Matt
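A minimal Python sketch of that correlation-based binning, assuming a hypothetical naming convention where "CTRL_<name>.txt" pairs with "<name>.txt" (your actual filenames may differ). correlation_key plays the role of UpdateAttribute and merge_by_correlation plays the role of MergeContent with "Minimum Number of Entries" set to 2; neither is a NiFi API.

```python
import re

# Hypothetical naming convention: "CTRL_<name>.txt" pairs with "<name>.txt".
COMMON_NAME = re.compile(r"^(?:CTRL_)?(?P<name>.+?)\.txt$")


def correlation_key(filename: str) -> str:
    """Derive the shared name that would be stored in the correlation attribute."""
    m = COMMON_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return m.group("name")


def merge_by_correlation(flowfiles, min_entries=2):
    """Bin (filename, content) pairs by correlation key and emit a bin once it
    holds min_entries items, concatenating contents in arrival order."""
    bins = {}
    merged = []
    for filename, content in flowfiles:
        key = correlation_key(filename)
        bins.setdefault(key, []).append(content)
        if len(bins[key]) >= min_entries:
            merged.append((key, b"".join(bins.pop(key))))
    return merged, bins  # bins still holds incomplete groups


merged, pending = merge_by_correlation(
    [("ABC.txt", b"a"), ("CTRL_XYZ.txt", b"c"), ("CTRL_ABC.txt", b"b")]
)
print(merged, pending)
```

Note that plain binary concatenation in this sketch follows arrival order; if the control content must come first, an ordering attribute (as in the Defragment approach) is needed.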

MattWho · 01-09-2017

@Joshua Adeleke Also check your NiFi app log for any Out Of Memory Errors (OOME). The SplitText processor may be having memory issues trying to split over 40k records. You could try using two SplitText processors in series, with the first splitting on a "Line Split Count" of 10,000 and the second splitting those 10,000-line FlowFiles with a "Line Split Count" of 1. Because no single SplitText then has to create 40k+ FlowFiles at once, this will greatly reduce the heap memory footprint. In addition, if you list the queue feeding your PutSQL processor, do you see any FlowFiles with an unexpected size? Matt
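A rough Python model of why the two-stage split helps, assuming the heap cost is driven mainly by how many FlowFiles a single SplitText pass creates at once. split_text and two_stage_split are hypothetical helpers, not NiFi APIs; they only count how large the biggest batch gets.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def split_text(lines: Iterable[str], line_split_count: int) -> List[List[str]]:
    """One SplitText-like pass: every split is materialized together,
    much as one processor invocation creates all of its child FlowFiles at once."""
    it = iter(lines)
    out = []
    while chunk := list(islice(it, line_split_count)):
        out.append(chunk)
    return out


def two_stage_split(lines: List[str], first: int = 10_000, second: int = 1) -> Tuple[int, int]:
    """Chain two passes; return (final piece count, largest batch any single pass built)."""
    stage1 = split_text(lines, first)
    peak, total = len(stage1), 0
    for chunk in stage1:
        stage2 = split_text(chunk, second)
        peak = max(peak, len(stage2))
        total += len(stage2)
    return total, peak


lines = [f"row {i}" for i in range(40_000)]
print(len(split_text(lines, 1)))  # single pass: one batch of 40,000 splits
print(two_stage_split(lines))     # same 40,000 pieces, but no batch above 10,000
```

Both approaches end with the same number of single-line FlowFiles; the difference is that the largest batch created in one pass drops from 40,000 to 10,000.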

MattWho · 01-09-2017

@Narasimma varman Is your NiFi a single instance of NiFi or a NiFi cluster? If it is a cluster, keep in mind that by default the GetFile processor will run on every node in that cluster. The validation will also run on every node, so make sure the directory exists on all nodes. Also make sure you have specified only a directory path in the "Input Directory" property of GetFile. In your case, that property should contain only "/root/example". The filename you wish to pick up should be specified in the "File Filter" property. Matt
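A small Python sketch of how a full path splits across the two GetFile properties: the directory goes into "Input Directory" and an escaped regular expression for the name goes into "File Filter". The file name data.csv is a hypothetical example, and getfile_properties is an illustrative helper, not part of NiFi. re.escape only adds backslashes before non-alphanumeric characters, which Java's regex syntax also accepts.

```python
import posixpath
import re


def getfile_properties(full_path: str) -> dict:
    """Split a full path into GetFile's 'Input Directory' (directory only)
    and 'File Filter' (a regex matching that exact file name)."""
    directory, name = posixpath.split(full_path)
    if not directory or not name:
        raise ValueError("expected a path of the form /dir/file")
    return {"Input Directory": directory, "File Filter": re.escape(name)}


# Hypothetical file name inside the /root/example directory.
props = getfile_properties("/root/example/data.csv")
print(props["Input Directory"], props["File Filter"])
```

Escaping matters because "." in an unescaped filter matches any character, so "data.csv" would also match a name like "dataXcsv".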

MattWho · 01-06-2017

@mel mendoza By default, NiFi logs processor-level events to nifi-app.log. In the latest releases, the default overall log level for nifi-app.log is WARN, which means only WARN and ERROR events are written to the log. The log entries that report successful data delivery are INFO-level events, so you would need to adjust NiFi's logging to get the output you are looking for. Simply setting the default log level for nifi-app.log to INFO may make that log far too noisy. NiFi's logging is configured in the logback.xml file. In logback.xml you will see that NiFi has three default appenders, for nifi-app.log, nifi-bootstrap.log and nifi-user.log. While you cannot configure logging for one specific processor instance, you can configure logging for a specific processor class. So it is possible to create a new appender (which can write to a new log file, if desired) and then add loggers for the specific processor classes you want INFO level enabled for. If you feel I have addressed your original question, please accept my answer. Matt
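As an illustration of the per-class logger described above, this Python sketch builds the kind of logback `<logger>` element you would add to logback.xml. The appender name PROCESSOR_FILE is hypothetical and must match an `<appender name="...">` you define yourself; PutFile is only an example class. additivity="false" keeps these INFO events out of the parent appenders, so nifi-app.log does not get noisier.

```python
import xml.etree.ElementTree as ET


def processor_logger(class_name: str, appender: str, level: str = "INFO") -> str:
    """Build a logback <logger> element that sets `level` for one processor class
    and routes its events only to `appender` (additivity="false")."""
    logger = ET.Element("logger", name=class_name, level=level, additivity="false")
    ET.SubElement(logger, "appender-ref", ref=appender)
    return ET.tostring(logger, encoding="unicode")


# Hypothetical appender name; define a matching <appender name="PROCESSOR_FILE"> in logback.xml.
print(processor_logger("org.apache.nifi.processors.standard.PutFile", "PROCESSOR_FILE"))
```

The printed element is pasted inside the `<configuration>` element of logback.xml, alongside NiFi's existing loggers.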

MattWho · 01-05-2017

@mel mendoza NiFi supports many different protocols that can be used for data ingestion. Many of them are fault tolerant, but some, like UDP, are not. For the fault-tolerant protocols, NiFi is built to ensure the data is ingested at least once. NiFi does this in three phases:
1. Receive the data over the fault-tolerant protocol.
2. Commit the session to NiFi.
3. Depending on the processor, either delete/acknowledge success to the data source or save state about the data that was ingested.
With this model comes a small possibility that a NiFi fault or server failure between phases 2 and 3 prevents phase 3 from happening. In that case, upon recovery, that particular data may be ingested a second time. Thanks, Matt
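The three phases can be simulated in a few lines of Python to show where the duplicate comes from. This is a toy model, not NiFi internals: Source is a hypothetical fault-tolerant source that redelivers anything not yet acknowledged, and Repository stands in for NiFi's committed data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    """Hypothetical fault-tolerant source: redelivers anything not yet acknowledged."""
    pending: List[str]

    def receive(self) -> List[str]:
        return list(self.pending)

    def acknowledge(self, items: List[str]) -> None:
        self.pending = [i for i in self.pending if i not in items]


@dataclass
class Repository:
    committed: List[str] = field(default_factory=list)


def ingest(source: Source, repo: Repository, crash_before_ack: bool = False) -> None:
    items = source.receive()        # phase 1: receive over the fault-tolerant protocol
    repo.committed.extend(items)    # phase 2: commit the session
    if crash_before_ack:
        raise RuntimeError("simulated failure between phase 2 and phase 3")
    source.acknowledge(items)       # phase 3: acknowledge/delete at the source


src, repo = Source(["a", "b"]), Repository()
try:
    ingest(src, repo, crash_before_ack=True)
except RuntimeError:
    pass
ingest(src, repo)  # recovery: the unacknowledged items are received again
print(repo.committed)  # ['a', 'b', 'a', 'b']
```

Nothing is lost, because the source still holds everything that was never acknowledged, but the items committed just before the failure are ingested twice: that is the "at least once" guarantee.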

MattWho · 01-04-2017

@Timothy Spann Once you run out of heap space, NiFi will start throwing all kinds of WARN and ERROR log entries. It looks like you need to increase the heap memory settings configured in your bootstrap.conf file.

MattWho · 01-04-2017

@Aman Jain The ListSFTP processor has a "File Filter" property that allows you to use a Java regular expression to specify the filename pattern the processor should look for on the target SFTP server. It does not give you the capability to pull a value from MySQL to use there, which sounds like what you want to do. That being said, keep in mind that the ListSFTP processor does not actually fetch any data; it only produces a zero-byte NiFi FlowFile for each file it lists. It is the responsibility of the FetchSFTP processor to actually retrieve the data content and add it to the NiFi FlowFile. Perhaps you can have NiFi always list all the files on the target SFTP server and filter out the zero-byte FlowFiles you do not want before running FetchSFTP on each run? You could have one flow that retrieves your load date from MySQL and writes it to a distributed cache service in NiFi. Then have another flow that lists the files and filters them based on the value currently loaded in the cache service. The filtered FlowFiles could then be sent to the FetchSFTP processor, while the others are just dropped/auto-terminated. Matt
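A minimal Python sketch of the two-flow idea, with a plain dict standing in for NiFi's distributed cache. Several details here are assumptions for illustration only: the cache key "load.date", the helper names, and a naming convention where each remote file name embeds its date as YYYYMMDD.

```python
from datetime import date
from typing import Dict, List

# Stand-in for the NiFi distributed cache: flow 1 writes, flow 2 reads.
cache: Dict[str, str] = {}


def store_load_date(load_date: date) -> None:
    """Flow 1: write the value fetched from MySQL under a fixed (hypothetical) key."""
    cache["load.date"] = load_date.isoformat()


def filter_listing(listed: List[dict]) -> List[dict]:
    """Flow 2: keep only the zero-byte listing FlowFiles whose name contains the cached
    date as YYYYMMDD; the rest would be auto-terminated instead of going to FetchSFTP."""
    wanted = cache.get("load.date")
    if wanted is None:
        return []  # no load date cached yet: fetch nothing
    token = wanted.replace("-", "")
    return [ff for ff in listed if token in ff["filename"]]


store_load_date(date(2017, 1, 4))
listing = [{"filename": "sales_20170104.csv"}, {"filename": "sales_20170103.csv"}]
print([ff["filename"] for ff in filter_listing(listing)])  # ['sales_20170104.csv']
```

Filtering the listing before FetchSFTP is cheap because each listed FlowFile carries only attributes; no file content is transferred for the entries that get dropped.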

MattWho · 01-04-2017

@bala krishnan 1. "Concurrent tasks" is nothing new to NiFi. There is currently no capability to set concurrency at the process group level, and I am not sure that would be a good idea. I assume you are looking for a way to set a number of "concurrent tasks" that would then be applied to every processor within a process group? Some processors perform tasks that are more CPU intensive than others. For example, the CompressContent processor is CPU intensive: each concurrent task it is assigned consumes 100% of a CPU core for every file it compresses/decompresses, so adding too many "concurrent tasks" here could seriously impact the system hosting NiFi. The UpdateAttribute processor, on the other hand, typically has very little CPU impact. One concurrent task there can process batches of FlowFiles very rapidly, so multiple concurrent tasks are usually unnecessary and a waste of server resources.
2. There is no defined algorithm for how many concurrent tasks a processor should receive out of the gate. Concurrent task assignment is done by testing and fine-tuning a dataflow using production data samples at production volumes. Evaluating your dataflow for bottlenecks while tracking system resource loads (CPU, memory, network and disk I/O) can help you tune concurrent task settings appropriately. It is too often the case that users start off by assigning a high number of concurrent tasks rather than starting at the bottom. Remember that your system has only so much CPU to share; assigning too many concurrent tasks to a single processor will hinder other processors that are waiting for CPU time. Along with the "concurrent tasks" setting on individual processors, NiFi has global maximum timer driven and event driven thread count settings (defaults are 10 and 5 respectively). These control the maximum number of threads NiFi will request from the server to fulfill the concurrent task requests of the NiFi processor components.
These global values can be adjusted in "Controller Settings" (located via the hamburger menu in the upper right corner of the NiFi UI). Typical settings here are double to quadruple the number of CPU cores on your server. Excessive values here do not improve performance, as the extra threads just spend more time waiting for CPU. Thanks, Matt
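The sizing guidance above is simple arithmetic, sketched here in Python. The 2x to 4x multiplier is the rule of thumb from this post, not a NiFi formula. The oversubscription check is an illustrative helper with hypothetical processor counts: it flags when the processors' combined concurrent tasks exceed the shared thread pool, meaning some tasks would wait for a thread rather than run in parallel.

```python
import os
from typing import Dict


def thread_pool_range(cores: int) -> range:
    """Rule-of-thumb range for the maximum timer driven thread count:
    double to quadruple the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be positive")
    return range(2 * cores, 4 * cores + 1)


def oversubscribed(concurrent_tasks: Dict[str, int], max_threads: int) -> bool:
    """True when the combined concurrent tasks exceed the shared thread pool."""
    return sum(concurrent_tasks.values()) > max_threads


r = thread_pool_range(os.cpu_count() or 1)
print(f"suggested maximum timer driven thread count: {r.start} to {r[-1]}")
print(oversubscribed({"CompressContent": 4, "UpdateAttribute": 1}, max_threads=10))
```

For an 8-core server this gives 16 to 32 threads; going beyond that mostly adds threads that sit in CPU wait.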

Member Since	07-30-2019 10:41 AM
Posts	3,135
Kudos received	1561

Cloudera Community
