12-12-2022
05:49 AM
Hello Eyal, thanks once again for your interest in helping out. I did not express myself correctly earlier: I believe there is no backpressure, as I confirmed visually in the NiFi UI. The upper banner shows roughly 5000 flowfiles in the system at any given time, and individual queues hold 20, 50, 100 or 200 flowfiles at an instant. The largest queue is the one feeding the Kafka producer processor, with about 2000 flowfiles. I believe these quantities are not enough to trigger backpressure-driven throttling of task scheduling. Apache Benchmark sends concurrent HTTP requests; in this case I configured each of the 5 client instances to issue 1000 requests concurrently. The thread pool is sized to 96 and all critical-path processors have 16 concurrent tasks. Despite this, I see a low core load average (between 2 and 7) and disk IOPS in the dozens per second (the maximum the disks allow is 3000). I'm all out of ideas at this point. 😄 Thanks for the support!
12-09-2022
07:20 AM
Hi Eyal, I went through the response you shared. I am running a cluster with 16-core CPUs, so I set the maximum thread pool size to 96. HandleHttpRequest and HandleHttpResponse each have 16 concurrent tasks; I tried assigning 24 to each but saw no performance increase, so I lowered it again. Another point dealt with in that post was the load balancer, which in my case seems to be doing its job of distributing load without dropping requests. In summary, I'm puzzled by the low CPU and disk utilization, yet I cannot figure out how to make better use of the resources: when I increase the number of concurrent tasks there is no significant change. Greetings, Samuel
12-09-2022
07:05 AM
Hi there Eyal, I am using 5 clients with a concurrency level of 1000 simultaneous requests per client, and at any given time there are about 5000 flowfiles being routed. That means each queue holds somewhere between several dozen and a couple of hundred requests at a time, so in my opinion no backpressure is being applied (the queues are configured to start backpressure at 20k queued flowfiles). I am not sure about HandleHttpRequest... it could be... but requests are not being dropped, as during the tests I only see several hundred failed requests out of 500 000. I will try to apply the recommendations in the topic and will report back. Regarding the load balancer, it is a classic load balancer in the AWS cloud and offers little configuration apart from security, target groups, etc. In the first tests I did it was dropping requests and I could not wrap my head around why, but that is not happening now, and in any case it offers limited configuration options. Thanks again for your help. Greetings, Samuel
12-09-2022
02:36 AM
Hello Eyal, thank you for taking the time to respond; it has been helpful in learning more about what works in NiFi processors. Increasing the Run Duration of the log processor eliminated the queued flowfiles, which was good. The baseline throughput of the system is about 5600 req/s before any change. The installation consists of 6 t5.4xlarge EC2 instances (each with 16 threads). I have been trying to change the flow but also to tune the NiFi configuration.

About the changes to the flow:

1. Parsing query parameters: I used the http.query.param.YYY attributes with the AttributesToJSON processor instead of parsing the HTTP query string. This replaced 6 processors with 1, and the performance gain was about 5% (approx. 5800 req/s registered).

2. When http.query.param.ETID = 11 and http.query.param.HHID is in (0, '0', ''), I set http.query.param.HHID to a special value it should never contain (it is supposed to hold only integers or numeric strings) and then removed http.query.param.HHID. After that I applied the Jolt expression that you shared. Because this approach is probably not the best, there was no significant performance gain here.

3. I used the QueryRecord processor to test whether a field is numeric with an SQL query and, in that case, routed the flowfile to error processing; a second "select * from Flowfile" with the opposite WHERE clause routes it to non-error processing (the two queries are sketched at the end of this post). Here the performance gain was also about 5%.

Up to this point the system handled about 6.1k events/second, so no major breakthrough can be attributed to changes in the NiFi flow. I consider this far from what I expected.

As to the NiFi configuration tuning: the first thing I did was change the provenance repository to in-memory, and that had a good impact on performance, a gain of about 600 req/s. At that point I had a 3-node cluster and added 3 more nodes. The first tests impressed me: I consistently got above 7700 req/s, and at one point about 14200 req/s. I could never reproduce those results, as subsequent tests showed a decrease in performance. I saw some posts that seemed to describe a situation similar to mine: a cluster that had good performance and then got worse results. Those tickets pointed to garbage-collector interference and reported that the initial throughput was achieved again after a restart of the service. That did not happen in my case; the performance stayed degraded. I tried changing some of the configurations in the nifi.properties safety valve following this post: https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999 . I changed all the properties to the indicated values and got no performance increase.

In summary, the system was performing at 5600 req/s with 3 nodes. After adding 3 nodes it got above 7700, sometimes above 9000, and once 14200 req/s. Now it is performing at 6600 req/s with 6 nodes. My impression is that the degradation in per-node performance is not due to the changes I made to the flow, but I do not have enough information to identify a cause. Greetings, Samuel
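For reference, the two QueryRecord queries from point 3 might look roughly like the sketch below. This is only an illustration: the field being validated is assumed to be HHID and to be read as a string by the record reader, the "valid"/"invalid" names are placeholders for the dynamic properties configured on the processor, and the pattern syntax assumes the SQL-standard SIMILAR TO operator supported by QueryRecord's Calcite dialect.

  -- hypothetical dynamic property "invalid": records whose HHID is not purely numeric, routed to error processing
  SELECT * FROM FLOWFILE WHERE HHID NOT SIMILAR TO '[0-9]+'

  -- hypothetical dynamic property "valid": the complementary WHERE clause, routed to non-error processing
  SELECT * FROM FLOWFILE WHERE HHID SIMILAR TO '[0-9]+'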
11-23-2022
04:01 AM
Hello, I'm Samuel and I work on Miguel's team. There is a single point in the flow where a queue is filling and backpressure is being applied: a LogAttribute processor that registers the IPs of the incoming requests. Apart from that, there are roughly 5000 flowfiles in the system at a given moment while we are processing a steady flow of events, which, given that end-to-end latency is approx. 1 second, seems acceptable. The scenario in which we are obtaining these results is a development environment, where we are performing stress tests using Apache Benchmark, so the production rate is steady. The requests are directed at a load balancer, and the failed requests seem to originate from that load balancer, which fills its queue and drops incoming requests when that happens.

1. UpdateRecord: I could not find a freeform text processor. Here is an example of the incoming data in the form of query parameters:

TS=20221108-093700&UID=12345&ETID=22&EGID=44&DID=Hello%20World!&HHID=492489&OID=666

What reader could be used to transform free text to JSON?

2. Transforming null values: an example of the incoming data:

{
  "TS": "20221108-093700",
  "UID": 12345,
  "ETID": "22",
  "EGID": 44,
  "DID": "Hello World!",
  "HHID": 0,
  "OID": 666
}

When the event type ID is 11 and the household ID is either 0, "" or " ", we want to change HHID to null. I could not find a way to change the value to null, and I cannot evaluate one field and transform another. Is it possible to transform a value to null (the value null and not the string "null")? What is being done wrong in the example below? (See the sketch after this post.)

[
  {
    "operation": "shift",
    "spec": {
      "TS": "TimeStamp",
      "UID": "UserID",
      "OID": "OpCoID",
      "ETID": "EventTypeID",
      "EGID": "EventGroupID",
      "DID": "DeviceID",
      "HHID": {
        "0": { "#null": "HouseholdID" },
        "*": { "@(2,HID)": "HouseholdID" }
      },
      "*": "Custom.&"
    }
  },
  {
    "operation": "default",
    "spec": {
      "Custom": {}
    }
  }
]

3. UpdateAttribute + RouteContent -> QueryRecord: we are doing control flow, evaluating whether a given field is numeric, then checking whether there was any error, and routing to success or error based on the messageError attribute. With the query we can apply a transformation, but we cannot decide based on the evaluation of the SQL query. Is this interpretation correct in your opinion? Greetings
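A note on point 2: in a Jolt shift spec, a left-hand key that starts with "#" writes the text after the "#" as a literal string value, so "#null" produces the string "null" rather than a JSON null. As an alternative approach (not what the original spec above does), a real NULL can be produced with record-oriented SQL in a QueryRecord processor via a CASE expression. A rough sketch, assuming the record schema reads the fields as strings and the original field names are kept (any renaming is left to the downstream Jolt shift):

  SELECT
    TS, UID, OID, ETID, EGID, DID,
    -- emit a real NULL (not the string "null") when ETID is 11 and HHID is 0, empty or blank
    CASE
      WHEN ETID = '11' AND HHID IN ('0', '', ' ') THEN NULL
      ELSE HHID
    END AS HHID
  FROM FLOWFILE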