The datanode Java heap is set to 1.5 GB, and we have 59,686 blocks on the cluster. The Spark script performs an ETL operation on a constant ingestion stream from Flume, so the input is always varying, say 4 to 6 GB per hour, and our block size is 256 MB. Thanks for your advice on the heap size; we will try that out. However, we have another finding on which a comment would be appreciated: out of the 3 datanodes, only one shows a high CPU load, and all the slow performance is caused by this one slow node. How would you suggest we tackle this behaviour?
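As a back-of-the-envelope check of the numbers above, here is a small sketch. It assumes the commonly cited Cloudera/Hortonworks sizing guidance of roughly 1 GB of datanode heap per 1 million blocks (that rule of thumb is an assumption, not something stated in this thread):

```python
# Heap check: ~1 GB of datanode heap per 1M blocks (assumed rule of thumb).
blocks = 59_686
heap_gb = 1.5
heap_needed_gb = blocks / 1_000_000  # about 0.06 GB, so 1.5 GB is ample

# New blocks created per hour by the 4-6 GB/h Flume ingest at 256 MB blocks,
# before accounting for the replication factor (3x by default).
block_mb = 256
blocks_per_hour_low = 4 * 1024 / block_mb   # 16 blocks/hour
blocks_per_hour_high = 6 * 1024 / block_mb  # 24 blocks/hour
print(heap_needed_gb, blocks_per_hour_low, blocks_per_hour_high)
```

By this estimate the block count is nowhere near stressing a 1.5 GB heap, which suggests the single hot node is more likely a data- or task-skew issue than a datanode memory problem.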