Member since: 05-16-2021
Posts: 8
Kudos Received: 2
Solutions: 0
06-03-2021
06:32 AM
Hi @MattWho, Thanks for the response. I'll happily create a feature request, though I wasn't sure if I was missing something obvious that would meet my objective/requirements. Thank you for clarifying; I'll go take a look at the links you provided (admittedly I had missed the mailing lists, oops). Kind regards, Mark
05-30-2021
09:56 AM
I hope a bump to the top of the recent post list isn't breaking any rules, but I'm hoping someone might be able to offer an opinion, so a bump it is!
05-21-2021
03:26 PM
1 Kudo
There isn't enough information in your question to provide an accurate answer. However, note that the FetchFile processor is intended to work with an upstream ListFile processor, which gives you the opportunity to specify a location and other filtering criteria such as file name patterns or particular sub-folders (path pattern), amongst other options. The ListFile processor outputs one empty FlowFile per file it identifies, carrying attributes that describe that file; these are then used as the input to FetchFile so it knows which files to retrieve. ListFile also keeps track of its last listing, and will only list files newer than that on its next execution. The List & Fetch pattern is very common; however, should you have simpler requirements, the GetFile processor may be suitable. There's a rough sketch of the pattern below. Hope that helps.
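For illustration, a minimal sketch of the List & Fetch pairing (the directory and filter values are only examples, not taken from your flow):

    ListFile
      Input Directory:        /data/incoming       # example location to watch
      File Filter:            .*\.csv              # regex applied to file names
      Recurse Subdirectories: true

    FetchFile
      File to Fetch:          ${absolute.path}/${filename}   # default value; uses the attributes ListFile sets
      Completion Strategy:    None (or Move File / Delete File)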
05-21-2021
03:13 PM
1 Kudo
Hi, Try using the ReplaceText processor in "Literal Replace" mode for the Replacement Strategy setting. It's not obvious, but I have placed a single space in the Search Value setting. Alternatively, you could explore using an Avro schema to achieve this, and that links in nicely with your other question too. As a test I used a GenerateFlowFile processor to create the sample CSV content, carrying an attribute called avro.schema that holds the schema, feeding a ConvertRecord processor configured with a CSVReader and an otherwise normally configured CSVRecordSetWriter. In the schema, the "name" value of a field defines the name you need on the output, and the "aliases" array value is the input name. (The configuration screenshots don't reproduce here; a sketch of the schema is below.) Also note that I changed the case using this approach as well; in fact you can rename the column to whatever you need. I found working with Avro schemas a bit of a steep learning curve, but definitely worth the time investment. I mention this in particular as it complements your other question about date formats very well. Hope that helps
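For illustration only, an avro.schema attribute along these lines renames columns via aliases (the field names here are placeholders rather than the ones from my original screenshots):

    {
      "type": "record",
      "name": "example",
      "fields": [
        { "name": "order_date", "type": "string", "aliases": ["OrderDate"] },
        { "name": "ship_date",  "type": "string", "aliases": ["ShipDate"] }
      ]
    }

If I remember correctly, the CSVReader picks the schema up from that attribute when its Schema Access Strategy is set to "Use 'Schema Text' Property" (the Schema Text property defaults to ${avro.schema}).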
05-21-2021
02:23 PM
Hi, I've just noticed the date strings in your sample aren't consistent, which I think will make things difficult. For example, in line 1 your OrderDate format is "M/dd/yyyy" (single-digit month) while ShipDate seems to be "MM/dd/yyyy" (double-digit month). For the CSV to work correctly, I believe all of your date fields would need to adhere to the same format. You can do a couple of things to resolve this:
- Fix it in the source. The best option in my opinion, if possible.
- Clean the data in NiFi before it arrives at the PutDatabaseRecord processor.
Unfortunately the second is a touch beyond my current level of expertise, but if I were you I would explore either a ReplaceText processor with an appropriate regex expression (a rough sketch is below), or an UpdateRecord processor with a SQL-like statement to update this field in the flowfile. Note, though, that this would require a separate CSVReader service that reads your data as strings. Good luck.
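For what it's worth, here is a rough ReplaceText sketch that pads a single-digit month out to two digits; it assumes the day and year are always two and four digits respectively, as in your sample:

    ReplaceText
      Replacement Strategy: Regex Replace
      Evaluation Mode:      Line-by-Line
      Search Value:         (?<!\d)(\d)/(\d{2}/\d{4})
      Replacement Value:    0$1/$2

This would need to sit before the PutDatabaseRecord processor, and you'd still want to double-check the result against the rest of your data.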
05-21-2021
02:05 PM
Hi, Check the "Date Format" setting in the CSVReader service, where you can specify a format string for interpreting text as a date. At a guess, either "M/dd/yyyy" or "MM/dd/yyyy" would work based on the data sample provided. More information on date format strings is here: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html As an aside, I don't think the CSVRecordSetWriter settings are of much use here, as you're terminating all outbound PutDatabaseRecord relationships. Hope that helps.
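For example, in the CSVReader (only the relevant property shown):

    CSVReader
      Date Format: M/dd/yyyy    # when parsing, a single "M" also accepts a two-digit month such as "05"

That parsing leniency is why I'd lean towards the single-letter month pattern, given the mixed formats in your sample.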
05-21-2021
01:33 PM
I had a similar issue with the PutEmail processor a few months back when I 'imported' my existing flow into a new NiFi installation. Having investigated, I recall there was some issue with the email library on JRE 11, and reverting to JRE 8 fixed the problem for me. I'm certainly no expert on the matter, but it's worth exploring until someone can be more specific. Good luck.
05-21-2021
01:25 PM
Hello, I'd like to implement a lock mechanism in a data flow that prevents a flowfile from progressing. It's similar in a way to the Wait processor holding a flowfile in a queue while another flowfile with a corresponding signal attribute passes through a 'gate' further down the flow, similar to a Notify.

In my specific case I load a number of files, received together, into target tables. I have a single flow which does the following:
1. PutSQL - issues a truncate table command that empties the target staging table. The table name is taken from an attribute.
2. PutDatabaseRecord - inserts the records into the staging table based on the same attribute.
3. PutSQL - executes a stored procedure, passing the relevant table name as a parameter from the attribute. This SP merges data from the staging table into a 'main' table.

A number of these files share the same staging and target tables, and on occasion I hit a race condition where the staging table is truncated by a second flowfile before the stored procedure has had time to run for the first. There are currently never more than two flowfiles destined for the same target table, so I'm able to mitigate the issue by routing one set to a retry processor and penalising the flowfiles for an arbitrary amount of time to give the first set time to complete their load. This isn't particularly elegant, and it doesn't feel like it would scale very well. Also, my situation is likely to change, and I could have more than two common files in the future.

To my mind, what I'd like is a Wait-style processor that checks a cache when a flowfile passes through it: if the cache is empty, it adds the target table name value to the cache as a signal and lets the flowfile continue. After step 3 above is complete for this flowfile, it passes through a Notify-type processor which removes the signal from the cache, releasing any flowfiles with a matching attribute from the queue on the earlier Wait (a rough sketch of what I mean is below).

I don't think I can get Wait and Notify to work in this manner, but I could be wrong? If so, is there another way to achieve this type of functionality? I'm aware of the Process Group FlowFile Concurrency and Outbound Policy settings, but these would be too restrictive, only processing a single flowfile at a time. I would like to process as many flowfiles concurrently as the DB will take, only holding the flowfiles which might cause the race condition. I hope that's clear. Thanks in advance.
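For illustration, this is roughly the shape of the Wait/Notify pairing I'm picturing (the target.table attribute name is just an example, and as I say, I'm not sure the release semantics actually work the way round I need):

    Wait (placed before the truncate step)
      Release Signal Identifier: ${target.table}     # one 'lock' per target table
      Distributed Cache Service: DistributedMapCacheClientService
      Expiration Duration:       10 min

    Notify (placed after the stored procedure step)
      Release Signal Identifier: ${target.table}     # signals that this table's load has finished
      Distributed Cache Service: DistributedMapCacheClientService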
Labels:
- Apache NiFi