About Onkar_Gagre

Onkar_Gagre · ‎12-11-2022

Thanks Matt for your inputs. Regarding the consumer group mentioned above, I understand there will be 32 to concurrent task which will be fetching the data from kafa however the consumer group ID will remain same and as long as consumer group ID remains same among the nodes the the ingestion load will be distributed equally to each node. Regarding the max time driven threads configured, this is set to 128 which is exactly 4 times of all cores across nodes(8*4=32). Regarding CPU wait time. Can you please elaborate as to how we can efficiently reduce wait time to improve the CPU usage or increasing number of cores is the only viable option here. Thanks in advance, Onkar

Onkar_Gagre · ‎12-07-2022

1. What is the CPU and Memory usage of your NiFi instances when the QueryRecord processor is stopped? This is almost drops down to 20% straight from 90%. Even buffered RAM also somewhat released 2. How is your QueryRecord processor configured to include scheduling and concurrent task configurations? We have consumerKafkaRecord processor feeding data into QueryRecord. the data is huge with smaller chunk of messages hence using demarcation I am batching multiple records into single flowfile and forwarding it to QueryRecord then based on select statement these likely grouped record routed further to Kafkaproducer processor. ConsumeKafka running with total 32 (8 thread per node) threads whereas QueryRecord running with 160(40 thread per node) threads to keep up with the incoming data. both having run duration set to 0(always running) and run schedule is also set to 0. What other processors were introduced as part of this new dataflow? consumerKafkaRecord,PublishkafkaRecord 3. What does disk I/O look like while this processor is running? Disk I/O given below for each individual node. Node 1 Linux 3.10.0-1160.6.1.el7.x86_64 () 07/12/22 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 27.61 0.00 4.18 0.51 0.00 67.69 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdc 0.00 20.11 45.92 86.58 1942.35 27986.13 451.74 0.08 0.62 1.92 1.00 0.2012 2.67 sda 0.00 0.31 9.64 11.27 430.75 136.20 54.22 0.01 0.26 0.90 7.92 0.2176 0.46 sdb 0.00 0.00 0.03 0.00 0.66 0.00 51.43 0.00 0.40 0.40 0.00 0.2447 0.00 dm-0 0.00 0.00 4.55 0.14 189.37 1.18 81.32 0.00 1.01 0.86 5.78 0.3576 0.17 dm-1 0.00 0.00 0.02 0.00 0.58 0.00 51.80 0.00 0.41 0.41 0.00 0.2282 0.00 dm-2 0.00 0.00 45.92 106.70 1942.27 27986.13 392.19 0.02 0.14 0.02 0.19 0.1751 2.67 dm-3 0.00 0.00 0.01 0.00 0.19 0.00 42.84 0.00 0.58 0.49 1.75 0.3694 0.00 dm-4 0.00 0.00 0.04 0.01 6.58 0.69 257.40 0.00 7.83 1.71 29.56 0.5251 0.00 dm-5 0.00 0.00 4.12 10.31 200.81 126.45 45.37 0.09 6.34 0.95 8.49 0.1599 0.23 dm-6 0.00 0.00 0.88 0.66 31.04 4.32 45.87 0.00 1.18 0.81 1.67 0.3023 0.05 dm-7 0.00 0.00 0.01 0.00 0.15 0.00 54.79 0.00 0.45 0.44 1.28 0.2212 0.00 dm-8 0.00 0.00 0.03 0.46 1.00 3.56 18.70 0.00 1.51 0.79 1.56 0.3988 0.02 sdd 0.00 0.00 1.80 0.04 82.56 0.89 91.01 0.00 1.03 1.02 1.67 0.0830 0.02 dm-9 0.00 0.00 1.79 0.04 82.51 0.89 90.96 0.00 1.03 1.02 1.65 0.0832 0.02 Node 2 Linux 3.10.0-1160.6.1.el7.x86_64 () 07/12/22 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 10.02 0.00 2.96 0.45 0.00 86.56 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.29 0.71 7.09 27.72 88.67 29.85 0.06 7.18 0.89 7.81 0.2198 0.17 sdc 0.00 0.82 22.34 28.81 831.68 12659.33 527.55 0.00 0.02 1.21 2.53 0.9838 5.03 sdb 0.03 0.06 0.03 0.03 0.26 0.37 18.50 0.00 1.39 0.40 2.41 1.0156 0.01 dm-0 0.00 0.00 0.39 0.13 16.01 1.10 66.36 0.00 2.00 0.82 5.58 0.4171 0.02 dm-1 0.00 0.00 0.06 0.09 0.24 0.37 8.04 0.00 3.17 0.47 4.92 0.4512 0.01 dm-2 0.00 0.00 22.33 29.62 831.66 12659.33 519.31 0.03 0.58 1.21 0.10 0.9686 5.03 dm-3 0.00 0.00 0.00 0.00 0.01 0.00 4.10 0.00 0.39 0.24 1.46 0.3310 0.00 dm-4 0.00 0.00 0.01 0.01 2.01 0.73 245.81 0.00 15.63 1.69 31.50 0.5168 0.00 dm-5 0.00 0.00 0.20 6.17 6.61 79.26 26.97 0.05 8.47 1.13 8.71 0.1882 0.12 dm-6 0.00 0.00 0.09 0.63 2.49 4.07 18.37 0.00 1.52 0.66 1.64 0.2323 0.02 dm-7 0.00 0.00 0.00 0.00 0.01 0.00 3.40 0.00 0.24 0.23 1.25 0.2337 0.00 dm-8 0.00 0.00 0.01 0.45 0.27 3.50 16.53 0.00 1.24 0.62 1.24 0.3516 0.02 sdd 0.00 0.00 0.00 0.03 0.21 0.70 47.43 0.00 1.39 0.91 1.46 0.5485 0.00 dm-9 0.00 0.00 0.00 0.04 0.19 0.70 45.75 0.00 1.44 1.38 1.44 0.5514 0.00 Node 3 Linux 3.10.0-1160.6.1.el7.x86_64 () 07/12/22 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 73.23 0.00 6.84 0.13 0.00 19.81 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdc 0.00 81.50 39.85 312.01 3142.78 100006.63 586.31 0.13 0.36 2.56 0.08 0.2139 7.52 sda 0.01 0.37 25.76 7.96 1161.80 106.70 75.25 0.04 1.12 1.08 1.27 0.3697 1.25 sdd 0.00 0.02 2.67 0.29 104.08 6.11 74.36 0.00 1.12 1.14 0.95 0.3594 0.11 sdb 2.61 2.90 1.76 1.12 17.94 16.09 23.66 0.01 2.29 1.54 3.46 1.1372 0.33 dm-0 0.00 0.00 13.96 0.21 609.03 1.80 86.21 0.01 0.99 0.99 1.16 0.4736 0.67 dm-1 0.00 0.00 4.36 4.02 17.88 16.09 8.11 0.03 3.04 1.47 4.74 0.3903 0.33 dm-2 0.00 0.00 2.66 0.31 104.02 6.11 74.22 0.00 1.13 1.15 0.95 0.3598 0.11 dm-3 0.00 0.00 0.03 0.00 0.56 0.00 38.32 0.00 0.80 0.80 0.84 0.7141 0.00 dm-4 0.00 0.00 0.07 0.01 9.15 0.37 229.15 0.00 3.96 4.13 2.73 1.7046 0.01 dm-5 0.00 0.00 9.15 6.92 459.28 96.14 69.14 0.02 1.25 1.19 1.32 0.2486 0.40 dm-6 0.00 0.00 2.45 0.70 77.04 4.76 51.90 0.00 1.04 1.05 1.04 0.4814 0.15 dm-7 0.00 0.00 0.02 0.00 0.54 0.00 43.71 0.00 0.29 0.29 0.74 0.2028 0.00 dm-8 0.00 0.00 0.13 0.48 3.51 3.64 23.29 0.00 0.97 1.21 0.91 0.4330 0.03 dm-9 0.00 0.00 39.84 393.51 3142.72 100006.63 476.06 0.26 0.61 2.58 0.41 0.1758 7.62 Node 4 Linux 3.10.0-1160.6.1.el7.x86_64 () 07/12/22 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 20.01 0.00 3.07 0.28 0.00 76.64 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdc 0.00 12.92 48.20 62.41 1739.08 20099.77 394.88 0.07 0.66 1.06 0.35 0.3109 3.44 sdb 0.00 0.00 0.02 0.00 0.67 0.00 54.01 0.00 0.30 0.30 0.00 0.2148 0.00 sda 0.00 0.31 5.91 6.95 320.94 87.50 63.51 0.02 1.57 1.88 1.31 0.3440 0.44 dm-0 0.00 0.00 3.80 0.12 199.60 1.08 102.36 0.01 1.64 1.64 1.57 0.5978 0.23 dm-1 0.00 0.00 0.02 0.00 0.59 0.00 51.80 0.00 0.28 0.28 0.00 0.1954 0.00 dm-2 0.00 0.00 48.20 75.33 1739.00 20099.77 353.57 0.05 0.32 1.38 0.63 0.2787 3.44 dm-3 0.00 0.00 0.00 0.00 0.07 0.00 34.24 0.00 0.95 0.95 0.95 0.8181 0.00 dm-4 0.00 0.00 0.03 0.01 4.49 0.60 239.72 0.00 4.34 4.26 4.60 1.1573 0.00 dm-5 0.00 0.00 1.38 6.04 79.01 78.31 42.40 0.01 1.58 2.54 1.36 0.2008 0.15 dm-6 0.00 0.00 0.69 0.66 35.55 4.15 58.90 0.00 1.39 1.80 0.95 0.4093 0.06 dm-7 0.00 0.00 0.00 0.00 0.06 0.00 50.53 0.00 0.36 0.35 1.05 0.2573 0.00 dm-8 0.00 0.00 0.02 0.43 0.49 3.36 17.44 0.00 0.95 2.32 0.90 0.4003 0.02 sdd 0.00 0.00 1.78 0.03 125.98 0.68 140.36 0.00 2.60 2.61 1.99 0.2107 0.04 dm-9 0.00 0.00 1.78 0.03 125.95 0.68 140.00 0.00 2.60 2.61 1.98 0.2099 0.04

Onkar_Gagre · ‎12-06-2022

Hi All, I am running 4 node Nifi cluster with each node having 8 core and 32 GB or RAM. it was running fine until I introduced QueyRecord processor to process huge metric data (14 GB per min). the CPU now consistently remaining above 85% and memory above 90% (buffering too much). Can someone let me know how to diagnose the issue and find the root cause. Thanks in advance

Onkar_Gagre · ‎04-25-2022

Hi, I am using mirrormaker 2 to copy data of one topic from source cluster to destination cluster. data is getting copied but I see 3 hours of lag. and I am not able to find the exact cause of it (whether the delay is at consumer side or producer ) as it is using kafka connect framework. Below are the config details test topic having 2 partition and replication factor as 3 number of messages coming into the source topic is 8K per second.

Onkar_Gagre · ‎03-30-2022

Hi , I have two kafka cluster where I need to copy only the topic configuration from one cluster to other without copying any data. How can we do that. I know using replicator/mirrormaker it can be done but I am not interested in data here.

Onkar_Gagre · ‎03-14-2022

This is now sorted. Batching using record based processor helped us to improve performance in multifold. didn't know that that the excess of flowfile might result in slowness in NIFI.

Onkar_Gagre · ‎03-14-2022

Thank you Araujo. This has helped a lot.

Onkar_Gagre · ‎03-14-2022

Hi, I am facing issue accessing one of the child json attribute while forming SQL in QureyRecord processor. PFB details here is my Avro schema added for JSON reader and writter. { "name": "MyClass", "type": "record", "namespace": "com.acme.avro", "fields": [ { "name": "labels", "type": { "name": "labels", "type": "record", "fields": [ { "name": "__name__", "type": "string" }, { "name": "cucsMemoryUnitInstanceId", "type": "string" }, { "name": "instance", "type": "string" }, { "name": "job", "type": "string" }, { "name": "monitor", "type": "string" }, { "name": "site_identifier", "type": "string" } ] } }, { "name": "name", "type": "string" }, { "name": "timestamp", "type": "string" }, { "name": "value", "type": "string" }, { "name": "producedAt", "type": "string" } ] } now the SQL quary for first level attribute works fine. like SELECT * from FLOWFILE where name = 'xyz' or SELECT * from FLOWFILE where producedAt = 'dd-mm-yyyy' however If I have to access attribute of nested JSON it wont work e.g SELECT * from FLOWFILE where site_identifier = 'xyz' --> This throws the error.

Onkar_Gagre · ‎03-10-2022

Thanks Matt. The name field is different for each record json object. I will try using partition record with JSONTreeReader and see if I can segregate the flow based on name field and get it processed in batch.

Onkar_Gagre · ‎03-10-2022

Thanks. But I am now stuck on how to fetch cascaded fields in QueryRecord processor. everytime I use any attribute in sql which is cascaded it throws error saying no column found in table. Below is AVRO Schema. I am not able to access "cucsMemoryUnitInstanceId","site_identifier" but I am able to access "producedAt" "name" which are present at first level { "name": "MyClass", "type": "record", "namespace": "com.acme.avro", "fields": [ { "name": "labels", "type": { "name": "labels", "type": "record", "fields": [ { "name": "__name__", "type": "string" }, { "name": "cucsMemoryUnitInstanceId", "type": "string" }, { "name": "instance", "type": "string" }, { "name": "job", "type": "string" }, { "name": "monitor", "type": "string" }, { "name": "site_identifier", "type": "string" } ] } }, { "name": "name", "type": "string" }, { "name": "timestamp", "type": "string" }, { "name": "value", "type": "string" }, { "name": "producedAt", "type": "string" } ] }

Online	Offline
Last Visited	‎12-21-2022 06:56 AM

Member Since	‎03-09-2022 03:33 AM
Last Visited	‎12-21-2022 06:56 AM
Posts	11
Kudos received	1

Cloudera Community

Re: Delay in Processing huge number of flowfile us...

Re: NIFI cluster with QueryRecord processor using ...

Re: NIFI cluster with QueryRecord processor using ...

NIFI cluster with QueryRecord processor using high...

Mirrormaker 2.0 delay and how to identify bottlene...

How to copy topic configuration (only metadata) fr...

Re: Delay in Processing huge number of flowfile us...

Re: QueryRecord processor issue with nested JSON

QueryRecord processor issue with nested JSON

Re: Delay in Processing huge number of flowfile us...

Re: Delay in Processing huge number of flowfile us...