After installing HDP on my server, I ran a test query against a Hive table from Zeppelin. In Spark standalone mode the query took 7 minutes. I then added two nodes to the cluster and ran the same query on YARN (cluster mode), which took around 30 minutes.
Is this normal behavior? How can I tweak the settings to get better running times on the cluster?
@Partha Deb The question is rather broad; the answer varies from cluster to cluster depending on size, hardware, network and many other settings.
Could you please give us some details about your cluster size and the environment it is set up in? Hardware details would also help us judge.
Thanks for the reply.
My master has 8 cores and 16 GB RAM. My two slave nodes have 4 cores and 16 GB RAM each.
When I run the query in standalone mode (Spark2 standalone) without the slave nodes, it takes 7 minutes, whereas in cluster mode (YARN cluster) it takes 30 minutes.
I guess I am missing something here, as there is a noticeable performance degradation on the cluster.
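A common first step when a job runs slower on YARN than in standalone mode is to check how many executors, cores and how much memory YARN actually grants, rather than relying on defaults. The sketch below turns the worker specs from this thread (two 4-core / 16 GB slave nodes) into candidate `spark.executor.*` values. It is a rule of thumb, not an official formula: reserving 1 core and 1 GB per node for the OS and Hadoop daemons, capping executors at 5 cores, and giving up one executor slot for the YARN ApplicationMaster are all assumptions, and `plan_executors` is a hypothetical helper. The overhead rule `max(384 MB, 10% of executor memory)` matches Spark's default for executors on YARN. It also ignores the 8-core master, since it is not stated whether that node runs a NodeManager.

```python
"""Rough Spark-on-YARN executor sizing sketch (rule of thumb, not an official formula)."""

MIN_OVERHEAD_MB = 384   # Spark's minimum executor memory overhead on YARN
OVERHEAD_FACTOR = 0.10  # Spark's default overhead fraction of executor memory


def overhead_mb(heap_mb: int) -> int:
    """Off-heap overhead requested on top of the executor heap: max(384 MB, 10%)."""
    return max(int(OVERHEAD_FACTOR * heap_mb), MIN_OVERHEAD_MB)


def plan_executors(node_cores: int, node_mem_gb: int, num_nodes: int,
                   max_cores_per_executor: int = 5,
                   reserved_cores: int = 1, reserved_mem_gb: int = 1) -> dict:
    """Hypothetical helper: suggest executor settings for identical worker nodes.

    Assumes 1 core and 1 GB per node stay free for the OS and Hadoop daemons,
    and that one executor slot is given up for the YARN ApplicationMaster.
    """
    usable_cores = node_cores - reserved_cores
    usable_mem_mb = (node_mem_gb - reserved_mem_gb) * 1024
    executor_cores = min(max_cores_per_executor, usable_cores)
    executors_per_node = usable_cores // executor_cores
    container_mb = usable_mem_mb // executors_per_node

    # Largest heap whose heap + overhead still fits in the per-executor container.
    heap_mb = int(container_mb / (1 + OVERHEAD_FACTOR))
    while heap_mb + overhead_mb(heap_mb) > container_mb:
        heap_mb -= 1

    return {
        "spark.executor.instances": max(num_nodes * executors_per_node - 1, 1),
        "spark.executor.cores": executor_cores,
        "spark.executor.memory": f"{heap_mb}m",
    }


if __name__ == "__main__":
    # The two 4-core / 16 GB slave nodes described in the question.
    for key, value in plan_executors(node_cores=4, node_mem_gb=16, num_nodes=2).items():
        print(f"--conf {key}={value}")
```

For these nodes the sketch suggests a single 3-core executor with about 13.6 GB of heap, which would pass to `spark-submit` (or the Zeppelin Spark interpreter settings) as `--conf` values. Whatever numbers you start from, they also have to fit within the NodeManager limits configured in YARN, so it is worth comparing them with what the YARN ResourceManager UI shows for the running application.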