About SAMSAL

scoutjohn · ‎11-21-2023

@SAMSAL thank you, that solved it

simonsig · ‎11-20-2023

Thank you @SAMSAL superstar!

Sipping1n0s · ‎11-16-2023

Thank you, I am thankful for all of your tips and hints. I'm going to accept this as a solution. I'll create new ones as needed. Again, thank you.

RabidRacoon · ‎11-13-2023

We did this at our end and ended up recycling the provenance repository much faster than usual. The huge amount of data that the output of a TailFile processor generates can fill up both your content and provenance repositories.

DianaTorres · ‎11-13-2023

@CE Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information @SAMSAL has requested? Thanks.

SAMSAL · ‎11-11-2023

There is no magic solution for these scenarios, and no single out-of-the-box NiFi solution fits them all that I can think of. You have to understand the nature of the input before you start consuming it, and you have to tailor the solution to that input. Sometimes, if you are lucky, you can combine multiple scenarios into one flow, but that still depends on the complexity of the input. Even though the second option I proposed was simple enough for your first scenario and did the job, your second example is more complex, and I don't think the out-of-the-box GrokReader will be able to handle it. The first option, using the ExtractText processor, will therefore work better because you can customize your regex as needed.

For example, based on the text you provided:

JohnCena32 Male New York USA813668

I can use the following regex:

[A-Z][a-z]+[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+

In the ExtractText processor I will define a dynamic property for each attribute (city, age, firstname, etc.) and surround the segment of the pattern that corresponds to the value with parentheses to extract it as a capture group. For example:

Age: [A-Z][a-z]+[A-Z][a-z]+(\d+)\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+

FirstName: ([A-Z][a-z]+)[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+

Gender: [A-Z][a-z]+[A-Z][a-z]+\d+\s((?:Male|Female|M|F))\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s[A-Za-z]+\d+

Country: [A-Z][a-z]+[A-Z][a-z]+\d+\s(?:Male|Female|M|F)\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?\s([A-Za-z]+)\d+

And so on. This should give you the attributes you need. Then you can use the AttributesToJSON processor to get the JSON output, and finally, if you want to convert the data to the proper types, you can use either JoltTransformJSON or QueryRecord with a cast, as shown above.
One final note: if you know how to use external libraries in Python, Groovy, or any of the other scripting languages supported by the ExecuteScript processor, you can write custom code to create the flowfile or attributes that will help you generate the final output downstream. If that helps please accept solution. Thanks
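As a rough illustration of the custom-code route mentioned above, the following standalone Python sketch applies the same pattern to the sample line, using one named group per field instead of one ExtractText property per attribute. The group names, the `trailing_number` label for the final digits, and the `parse_record` helper are assumptions made for this example and do not come from the thread. It runs outside NiFi and does not use the ExecuteScript API; note also that NiFi regexes use Java syntax, where named groups are written `(?<name>...)` rather than Python's `(?P<name>...)`.

```python
import json
import re

# Same pattern as in the post, with one named group per field.
# Group names and "trailing_number" are illustrative assumptions.
RECORD = re.compile(
    r"(?P<first_name>[A-Z][a-z]+)(?P<last_name>[A-Z][a-z]+)(?P<age>\d+)\s"
    r"(?P<gender>Male|Female|M|F)\s"
    r"(?P<city>[A-Z][a-z]+(?:\s[A-Z][a-z]+)?)\s"
    r"(?P<country>[A-Za-z]+)(?P<trailing_number>\d+)"
)


def parse_record(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = RECORD.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the age to an integer, as a cast would do later in the flow.
    fields["age"] = int(fields["age"])
    return fields


if __name__ == "__main__":
    print(json.dumps(parse_record("JohnCena32 Male New York USA813668"), indent=2))
```

Lines that do not fit the pattern return `None`, so a caller can route them separately instead of emitting partial records.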

PradNiFi1236 · ‎11-09-2023

@SAMSAL, thanks for the quick response, as always.

SAMSAL · ‎11-02-2023

To do the split, you need to transform the JSON input into a format that allows you to do so. For that you can use JoltTransformJSON with the following Jolt specification:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "ids": "=split('[,]',@(1,id))"
    }
  },
  {
    "operation": "shift",
    "spec": {
      "ids": {
        "*": {
          "@": "[&1].id",
          "@(2,Qid)": "[&1].Qid"
        }
      }
    }
  }
]

The spec above will generate the following output:

[
  {
    "id": "652fbf430f1f3f30a3111f11",
    "Qid": 123
  },
  {
    "id": "652fbf430f1f3f30a3333f11",
    "Qid": 123
  }
]

Then you can use the SplitJson processor with the JsonPath Expression set to $

To get the id attribute you can use EvaluateJsonPath as follows: flowfile_id is a dynamic property whose value is the JSON path to the id in the JSON input. Make sure to set the Destination property to "flowfile-attribute".

If that helps please accept solution. Thanks
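For readers who find the Jolt spec hard to follow, here is a minimal Python sketch of the same reshaping: split the comma-separated id and pair each piece with the shared Qid. The input record is an assumption inferred from the spec and the output shown above (the original input is not quoted in this post), and `split_ids` is a hypothetical helper name.

```python
import json


def split_ids(record):
    """Emit one {"id", "Qid"} object per comma-separated id in the record."""
    return [{"id": part, "Qid": record["Qid"]} for part in record["id"].split(",")]


if __name__ == "__main__":
    # Assumed input shape: a single record whose "id" holds comma-separated values.
    sample = {"id": "652fbf430f1f3f30a3111f11,652fbf430f1f3f30a3333f11", "Qid": 123}
    print(json.dumps(split_ids(sample), indent=2))
```

Each element of the resulting list corresponds to one flowfile after SplitJson.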

PradNiFi1236 · ‎10-31-2023

Thanks a lot, as always, @SAMSAL, for answering all my Jolt doubts and sharing your knowledge with detailed explanations. I have a huge Jolt spec and this is only one part of it; I always post just the part related to one specific issue. Yes, I know the Italian characters are the hardest part to understand. The input is in English and we are converting it to Italian; what I posted above is the second Jolt spec I'm using. I should have given less input instead of all the line items, sorry for that. One doubt: why does the modify-overwrite-beta logic not work if I place it above the shift?

RangaReddy · ‎10-24-2023

Hi @SAMSAL I think you want to run the Spark application in Standalone mode. Please follow these steps:
1. Install Apache Spark.
2. Start the Standalone master and workers. By default the master starts on port 7077. Open the Standalone UI and check that all workers are running as expected.
3. Once everything is running as expected, submit the Spark application, specifying the standalone master host with port 7077.

Last Visited	‎05-08-2025 03:43 AM

Member Since	‎07-29-2020 02:31 PM
Posts	574
Kudos received	323

Cloudera Community

Re: CSVReader and CSVRecordSetWriter doesn't consi...

Re: Jolt spec to flatten the nested JSON

Re: Converting Nested JSON to Flat JSON using JOLT

Re: NIfi: javax.security.auth.login.LoginExceptio...

Re: Jolt to transform muliple values in an array

Re: Help to match and remove value from array with...

Re: how to split and batch insert

Re: How to save NIFI Error log to database table ...

Re: Jolt specification working for Json input but ...

Re: How can I convert a fixed width file into Json...

Re: Unable to fetch attribute value based conditio...

Re: I have to split a json into two using nifi pro...

Re: removing repeated attributes after overwritebe...

Re: spark continously running with exit code 1