Most likely your upstream spouts/bolts are outpacing your downstream bolts. Try throttling your spouts with the topology.max.spout.pending setting, and try increasing the topology’s parallelism. Also, based on your JVM profiling, find out whether you are GC’ing too much, which can cause tuple failures. Try increasing the JVM heap size (-Xmx) allocated to each worker. You may have to explicitly select the G1 garbage collector. Keep in mind that a bigger heap can have a negative impact, as GC pauses may take longer for larger heaps.
I do see a lot of conflicting GC arguments. Can you please try with -Xmx2048m only and remove all the explicit GC arguments first? Then tune the topology.max.spout.pending parameter by trial and error to find the right value.
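
To make the settings above concrete, here is a minimal sketch of what they look like as topology config. It uses a plain Map so it runs without Storm on the classpath; Storm's own Config class is a HashMap<String, Object>, so the same keys can be put there. The class name TopologyTuning, the helper buildConf, and the values 500 and 4 are only illustrative assumptions, not recommendations; the only value taken from this thread is -Xmx2048m.

```java
import java.util.HashMap;
import java.util.Map;

public class TopologyTuning {

    // Hypothetical helper: collects the settings discussed above in one map.
    // The keys are Storm config names; the values passed in are placeholders to tune.
    public static Map<String, Object> buildConf(int maxSpoutPending, int workers) {
        Map<String, Object> conf = new HashMap<>();

        // Caps the number of un-acked tuples per spout task (throttles the spouts).
        conf.put("topology.max.spout.pending", maxSpoutPending);

        // Number of worker JVMs, one of the knobs for parallelism.
        conf.put("topology.workers", workers);

        // Heap size only, with no explicit GC flags, as a clean baseline.
        conf.put("topology.worker.childopts", "-Xmx2048m");

        return conf;
    }

    public static void main(String[] args) {
        // 500 and 4 are assumed starting points, to be adjusted by trial and error.
        Map<String, Object> conf = buildConf(500, 4);
        conf.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If you build the topology in code, Config.setMaxSpoutPending(...) and Config.setNumWorkers(...) set the first two keys for you. Note that topology.max.spout.pending only has an effect when the spout emits tuples with message ids (i.e. acking is in use), and that it applies per spout task. A reasonable way to run the trial and error is to change one value at a time and watch the failed count and complete latency in the Storm UI between runs.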