Member since: 11-16-2015
Posts: 892
Kudos Received: 650
Solutions: 245

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5669 | 02-22-2024 12:38 PM |
 | 1389 | 02-02-2023 07:07 AM |
 | 3087 | 12-07-2021 09:19 AM |
 | 4207 | 03-20-2020 12:34 PM |
 | 14165 | 01-27-2020 07:57 AM |
07-27-2018
04:51 AM
Thanks a lot @Matt Burgess for the details on the current limitations and the Jira. I will take a look at the solution you provided on the other thread. I appreciate your help. Regards, Vish
07-27-2018
01:40 PM
Sorry for my ignorance. I was able to resolve this issue by removing one extra space in " list" while writing the flowfile content generated by the Groovy script. Now I am able to fetch the data from InvokeHTTP by passing it the flowfile content from ExecuteScript. Thanks for the support. Regards, Vish
07-30-2018
06:26 PM
ValidateRecord is more about validating the individual records than about validating the entire flow file. If some records are valid and some are invalid, each type is routed to the corresponding relationship. For invalid records, however, we can't use the same record writer as for valid records, or else we know it will fail (because we know they're invalid), so there is a second RecordWriter for invalid records (you might use this to capture the field names, for example). By the time ValidateRecord knows an individual record is invalid, it doesn't know that the record came in as Avro (for example), nor does it know that you might want it to go out as Avro. That's the flexibility and power of the Record Reader/Writer paradigm, but in this case the tradeoff is that you can't currently treat the entire flow file as valid or invalid. It may make sense to have an "Invalid Record Strategy" property, to choose between "Individual Records" using the RecordWriters (the current behavior) and "Original FlowFile", which would ignore the RecordWriters and instead transfer the entire incoming flow file as-is to the 'invalid' relationship. Please feel free to file an improvement Jira for this capability.
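As one illustrative configuration of the reader/writer split (the controller service names below are examples, not taken from any particular flow), ValidateRecord might be set up like this:

```
Record Reader                     = AvroReader            (parses the incoming Avro records)
Record Writer                     = AvroRecordSetWriter   (valid records are written back out as Avro)
Record Writer for Invalid Records = JsonRecordSetWriter   (invalid records are written as JSON for inspection)
```

The point is that the invalid-record writer is chosen independently of the incoming format, which is exactly why the processor can't simply pass the original flow file through untouched today.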
07-23-2018
12:49 PM
1 Kudo
You can use a JOIN clause in the SELECT statement, but it will only work within a single RDBMS. You may find you can join two tables from two databases/schemas in the same RDBMS (if that system allows it), but you can't currently join two tables from totally separate database systems. You could investigate Presto, which allows joining tables across multiple systems; you could then have a single connection to it from NiFi in ExecuteSQL. That way it looks like a single RDBMS to NiFi, while Presto can be configured to do the cross-DB join.
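As a rough sketch of what that could look like (the catalog, schema, table, and column names below are invented purely for illustration), a single Presto query issued from ExecuteSQL might join a table in a MySQL catalog with one in a PostgreSQL catalog:

```sql
-- Hypothetical Presto catalogs "mysql" and "postgres"; schema, table,
-- and column names are placeholders for illustration only.
SELECT o.order_id,
       o.order_date,
       c.customer_name
FROM mysql.sales.orders AS o
JOIN postgres.crm.customers AS c
  ON o.customer_id = c.customer_id;
```

NiFi would only see one connection pool (e.g. a DBCPConnectionPool) pointing at Presto, so ExecuteSQL treats it like any other single database while Presto resolves the cross-system join.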
07-15-2018
04:38 PM
If you use large attributes, you will have serious issues with the "snapshot" file in the FlowFile repository. I killed my PROD environment this way just last week: the snapshot was too big to fit in memory at startup, and my data was lost.
05-29-2019
03:18 PM
How do I split complex JSON arrays into individual JSON objects with the SplitJson processor in NiFi? I don't know how to configure the original, split, and failure relationships. The JSON array is below:

{
  "scrollId1": "xyz",
  "data": [
    {
      "id": "app-server-dev-glacier",
      "uuid": "a0733c21-6044-11e9-9129-9b2681a9a063",
      "name": "app-server-dev-glacier",
      "type": "archiveStorage",
      "provider": "aws",
      "region": "ap-southeast-1",
      "account": "164110977718"
    },
    {
      "id": "abc.company.archive.mboi",
      "uuid": "95100b11-6044-11e9-977a-f5446bd21d81",
      "name": "abc.company.archive.mboi",
      "type": "archiveStorage",
      "provider": "aws",
      "region": "us-east-1",
      "account": "852631421774"
    }
  ]
}

I need to split it into:

{
  "id": "app-server-dev-glacier",
  "uuid": "a0733c21-6044-11e9-9129-9b2681a9a063",
  "name": "app-server-dev-glacier",
  "type": "archiveStorage",
  "provider": "aws",
  "region": "ap-southeast-1",
  "account": "164110977718"
},
{
  "id": "abc.company.archive.mboi",
  "uuid": "95100b11-6044-11e9-977a-f5446bd21d81",
  "name": "abc.company.archive.mboi",
  "type": "archiveStorage",
  "provider": "aws",
  "region": "us-east-1",
  "account": "852631421774"
}

Next, I need to insert another field "time" in front of "id", the first attribute of each individual object. I used the SplitJson processor with the JSON Path Expression $.data.id.*, but the relationship reports an error. I don't know how to configure the relationship branches: original, split, and failure. Does anyone have any advice? @Shu
07-09-2018
01:22 PM
1 Kudo
@Derek Calderon - The short answer is no. The ExecuteSQL processor is written to write its output to the FlowFile's content.

There is an alternative solution. You have some processor currently feeding FlowFiles to your ExecuteSQL processor via a connection. My suggestion would be to feed that same connection to two different paths. The first connection feeds a "MergeContent" processor via a funnel, and the second feeds your "ExecuteSQL" processor. The ExecuteSQL processor performs the query and writes the data you are looking for to the content of the FlowFile. You then use a processor like "ExtractText" to extract that FlowFile's new content into FlowFile attributes. Next, you use a processor like "ModifyBytes" to remove all content from this FlowFile. Finally, you feed this processor to the same funnel as the other path. The MergeContent processor can then merge these two FlowFiles using the "Correlation Attribute Name" property (assuming "filename" is unique, that could be used), min/max entries set to 2, and "Attribute Strategy" set to "Keep All Unique Attributes". The result should be what you are looking for. The flow would look something like the sketch shown below.

Having multiple identical connections does not cause NiFi to write the 200 MB of content twice to the content repository; a new FlowFile is created, but it points to the same content claim. New content is only generated when ExecuteSQL runs against one of the FlowFiles. So this flow does not produce any additional write load on the content repo other than when ExecuteSQL writes its output, which I am assuming is relatively small?

Thank you, Matt
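A rough sketch of that flow (the upstream source is whatever processor currently feeds ExecuteSQL; the MergeContent settings are the ones described above):

```
                      ┌──────────────────────────────────────────────────► (funnel) ──► MergeContent
(upstream processor) ─┤
                      └─► ExecuteSQL ──► ExtractText ──► ModifyBytes ──► (same funnel)

MergeContent settings (assuming "filename" is unique per FlowFile):
  Correlation Attribute Name = filename
  Minimum Number of Entries  = 2
  Maximum Number of Entries  = 2
  Attribute Strategy         = Keep All Unique Attributes
```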
07-03-2018
09:21 PM
Jeez, I would hope not. I'm not aware of any platform differences for Jayway (the underlying library used for JSONPath processing in NiFi).
06-29-2018
03:06 PM
As of NiFi 1.5.0 (via NIFI-4522), you can issue a SQL query in PutSQL while still retaining the incoming flow file contents. For your case, you could send the CSV file to PutSQL and execute a "CREATE TABLE IF NOT EXISTS" statement, which will create the table the first time but allow the CSV to proceed to the "real" destination processor, likely PutDatabaseRecord.
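For example (the table and column definitions below are placeholders; the real DDL would match whatever schema the CSV is loaded into), the statement sent through PutSQL could be something like:

```sql
-- Placeholder DDL; adjust the table name, columns, and types to match the CSV.
CREATE TABLE IF NOT EXISTS staging_customers (
    customer_id INT,
    name        VARCHAR(255),
    email       VARCHAR(255),
    created_at  TIMESTAMP
);
```

Because the incoming flow file passes through PutSQL unchanged, the same CSV can then continue on to PutDatabaseRecord for the actual load.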
06-13-2018
03:18 PM
https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html