Reduce NiFi latency to nearly 1 s

I have a 3-node NiFi cluster, A (on premises, 8 cores, global thread settings 10 and 5), connected via Site-to-Site (HTTPS) to a standalone single-node NiFi instance, B, in EC2. I have a simple flow that transfers data from Kafka to NiFi A, from there to NiFi B, and then to SNS. The lineage duration at the output S2S port of A was around 20 seconds; after some improvements it came down to 9 seconds. How can I achieve lower latency? Is my time measurement approach wrong? How should I measure stats under high load? Is NiFi processing in parallel, or is some micro-batching happening that causes the delay?
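
To make the measurement question concrete, here is a minimal sketch of how per-message latency could be summarized under load, independently of the lineage duration shown in the UI. It assumes each message carries its creation time in epoch milliseconds (for example the Kafka record timestamp) and that the receive time is recorded at the last hop; the function names and the sample format are hypothetical and not part of NiFi. It also assumes the clocks on both sides are synchronized (e.g. via NTP), otherwise the differences are skewed.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list (0 < p <= 100)."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def latency_stats(samples):
    """Summarize latency from (created_ms, received_ms) pairs.

    The pair format is an assumption for illustration: created_ms is when the
    message was produced, received_ms is when the final hop saw it.
    """
    latencies = sorted(received - created for created, received in samples)
    if not latencies:
        raise ValueError("no samples")
    return {
        "count": len(latencies),
        "min_ms": latencies[0],
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    # Synthetic data: 100 messages with latencies of 1..100 ms.
    demo = [(0, i) for i in range(1, 101)]
    print(latency_stats(demo))
```

Looking at percentiles rather than a single lineage duration would show whether the 9 seconds is typical for every message or only the tail, which would help tell steady per-hop delay apart from batching effects.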