About Green_

Vasu_ · ‎07-31-2024

@Green_ I had the same issue can you please share the detailed steps of how you resolved it? As I'm new to NiFi.

Green_ · ‎09-21-2023

I am also experiencing this issue when attempting to write data to a redis in cluser mode. Did you find a solution or workaround @sofronic ?

ssco · ‎12-12-2022

Hello Eyal, thanks once again for your interest in helping out. I have not expressed myself correctly. I believe there is no backpressure, as I confirmed visually on the nifi UI. On the upper banner I see there are globally about 5000 flowfiles circulating at a particular time. I see some of the queues have 20, 50, 100, 200 flowfiles waiting at an instant. The queue that has more queued flow files is the kafka producer processor, that has about 2000. I believe these quantities are not sufficient to trigger backpressure-caused adjustment of scheduling of tasks. Apache Benchmark sends concurrent http requests. In this case, i set each of the 5 client instances to send each 1000 requests concurrently. I have thread pool sized to 96 and assigned 16 concurrent tasks to all of the critical path processors. Despite this, I get a low core load average (between 2 and 7), the disk's IOPS are in the dozens per second (the maximum that the disks allow is 3000). I'm all out of ideas at this point. 😄 Thanks for the support!

MattWho · ‎12-09-2022

@F_Amini @Green_ Is absolutely correct here. You should be careful when increasing concurrent tasks as just blindly increasing it everywhere can have the opposite effect on throughput. I recommend stopping setting the concurrent tasks back to 1 or maybe 2 on all the processor where you have adjusted away from the default of 1 concurrent task. Then take a look at the processor further downstream in your dataflow where it has a red input connection but black (no backpressure) outbound connections. This processor s @Green_ mentioned is the processor causing all your upstream backlog. Your'll want to monitor your CPU usage as you make small incremental adjustments to this processors concurrent tasks until you see the upstream backlog start to come down. If while monitoring CPU, you see it spike pretty consistently at 100% usage across all your cores, then your dataflow has pretty much reached the max throughput it can handle for yoru specific dataflow design. At this point you need to look at other options like setting up a NiFi cluster where this work load can be spread across multiple servers or designing your datafow differently with different processors to accomplish same use case that may have a lesser impact on CPU (not always a possibility). Thanks, Matt

Fredi · ‎12-01-2022

Hi, thanks for the details. Unfortunately it is not working. I get an empty array [] as output. I have tried it with extract and split mode. I applied the schema text property as suggested with "NestedKey" and "nestedValue" as name. None gives me an output. Meanwhile I have achieved what I wanted using SplitContent and then again another jolt processor. Of course it would be more elegant if I could make it work with ForkRecord.

Green_ · ‎11-25-2022

Hi @ripo , I've tried it out in my own environment and it does not seem like ports have an 'ENABLED' state. If a port is DISABLED, you can only update it to be STOPPED. from a STOPPED state, you can either switch the port to RUNNING or back to DISABLED.

hegdemahendra · ‎11-22-2022

@Green_ Thank you so much ! It worked as expected.

Green_ · ‎11-19-2022

I am bumping this question in hopes someone might know of a better solution

Green_ · ‎11-18-2022

A couple of reasons: At the base level, operations in the UI all use the REST API - if you use chrome's devtools and go to the networking tab, you can see the exact REST API route used for every click you do on the canvas. If you'd ever want to automate something you know you can manually do in the UI, using this trick you can perfectly replicate it with with RESTful calls. In the same vain, the REST API has far more routes/operations available. The CLI tools seems to have around ~70 operations, whilst the rest api seems to have over 200. The CLI commands are part of the nifi toolkit. Whilst the toolkit does get updated, I believe it does not get the same level of attention as nifi's main codebase and as such it'd be better not to rely on it completely. This is not to say you can't use the CLI tool - rather, just my opinion on the matter and some insight for how my team writes automations on nifi 🙂

Johnny2x · ‎10-15-2022

Hi Green Really appreciate the assistance, if anything its probably me not clearly articulating what I'm trying to achieve, two weeks in with nifi The raw content from the API is received as follows "{ "ret_code" : 0, "ret_msg" : "OK", "ext_code" : "", "ext_info" : "", "result" : [ { "id" : "187715622692", "symbol" : "BTCUSDT", "price" : 19109.5, "qty" : 0.004, "side" : "Buy", "time" : "2022-10-16T01:25:31.000Z", "trade_time_ms" : 1665883531832, "is_block_trade" : false }, { "id" : "187715618142", "symbol" : "BTCUSDT", "price" : 19109.5, "qty" : 0.882, "side" : "Buy", "time" : "2022-10-16T01:25:31.000Z", "trade_time_ms" : 1665883531123, "is_block_trade" : false }, { "id" : "187715614682", "symbol" : "BTCUSDT", "price" : 19109.5, "qty" : 0.001, "side" : "Buy", "time" : "2022-10-16T01:25:30.000Z", "trade_time_ms" : 1665883530414, "is_block_trade" : false" Likely I was trying to over engineer with the jolt, and ended up with { "id" : [ 1.84310970522E11, 1.84310967802E11 ], "symbol" : [ "BTCUSDT", "BTCUSDT" ], "price" : [ 19241.5, 19241.0 ], "qty" : [ 1.896, 0.002 ], "side" : [ "Buy", "Sell" ], "time" : [ "2022-10-11T12:26:21.000Z", "2022-10-11T12:26:21.000Z" ], "trade_time_ms" : [ 1.665491181666E12, 1.665491181604E12 ], "is_block_trade" : [ false, false ] } Where if I just used a json split, which gave the following { "id" : "187715622692", "symbol" : "BTCUSDT", "price" : 19109.5, "qty" : 0.004, "side" : "Buy", "time" : "2022-10-16T01:25:31.000Z", "trade_time_ms" : 1665883531832, "is_block_trade" : false } The format above was accepted by PutDatabaseRecord, where the issue moved to duplicate ID being written, setting ID as primary key seems have stopped the duplicate being written but is probably not the cleanest solution... The other API Im dealing with is where I think the solution you suggested with be the ideal use case... { "lastUpdateId" : 579582125552, "E" : 1665885750096, "T" : 1665885750088, "symbol" : "BTCUSD_PERP", "pair" : "BTCUSD", "bids" : [ [ "19139.5", "8824" ], [ "19139.4", "757" ] "asks" : [ [ "19139.6", "3165" ], [ "19139.7", "812" ] } Where the requirement would be to extract for example "bids" : [ [ "19139.5", "8824" ], [ "19139.4", "757" ] into two seperate record .. maintaining the fields in each

Online	Offline
Last Visited	‎09-28-2023 09:40 AM

Member Since	‎08-01-2021 07:36 AM
Last Visited	‎09-28-2023 09:40 AM
Posts	48
Kudos received	10

Cloudera Community

Re: How to delete a variable from nifi variable re...

Re: PutDatabaseRecord truncates microseconds from ...

Re: nifi-replace file headers using replacetext pr...

Re: Get only element based on a single parameter

Re: Update attributes of json content using rules

Re: PutDatabaseRecord truncates microseconds from ...

Re: Support of redis cluster in nifi - save and fe...

Re: NiFi HTTP listener and processing - improving ...

Re: Nifi queues are full and create backlogs

Re: SplitJson for nested json content

Re: NIFI enabling an output port using the REST AP...

Re: How to delete a variable from nifi variable re...

Re: Validating JSON schema with max length and enu...

Re: Automate the NIFI workflow from CLI

Re: nifi splitjson - JsonTransform