What are the best parameters to tune Spark when the number of unique keys generated after the map function is in the billions? I have a 5-node cluster where each node has an 8-core i7 processor and 8 GB RAM. My input data size is 10.2 GB. After the map function is done, about 40 GB of intermediate data is generated, containing roughly 45 million unique keys. I basically have to count the number of occurrences of every unique key.
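For context, the counting step described above is a classic reduce-by-key aggregation (in Spark, `rdd.map(lambda x: (x, 1)).reduceByKey(operator.add)`). A minimal pure-Python sketch of the same aggregation, using a hypothetical toy dataset since the real job runs on the cluster:

```python
# Hypothetical sample of (key, 1) pairs, standing in for the map output.
mapped = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]

# Equivalent of Spark's reduceByKey(add): merge the counts per key.
counts = {}
for key, value in mapped:
    counts[key] = counts.get(key, 0) + value

print(counts)  # → {'a': 3, 'b': 1, 'c': 1}
```

Because `reduceByKey` combines values map-side before the shuffle, only one partial count per key per partition crosses the network, which matters a lot when the intermediate data (40 GB here) is much larger than the input.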