Query running on cluster takes more time than running on standalone?


New Contributor

After installing HDP on my server, I ran a test query from Zeppelin against a Hive table. On Spark standalone the query took 7 minutes. I then added two nodes to the cluster and ran the same query on the YARN cluster, where it took around 30 minutes.

Is this normal behavior? How can I tweak the settings to improve the running time on the cluster?

2 REPLIES

Re: Query running on cluster takes more time than running on standalone?

Contributor

@Partha Deb The question is fairly broad, and the answer varies from cluster to cluster depending on size, hardware, network, and many other settings.

Could you please give us some details about your cluster size and the environment in which it is set up? Hardware details would also help in judging this.

Re: Query running on cluster takes more time than running on standalone?

New Contributor

Thanks for the reply.

My master has 8 cores and 16 GB RAM. My two slave nodes have 4 cores and 16 GB RAM each.

In standalone mode (Spark2 standalone), without the slave nodes, the query takes 7 minutes, whereas in cluster mode (YARN cluster) it takes 30 minutes.

I guess I am missing something here, as there is a noticeable performance degradation on the cluster.
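
To make the comparison concrete, the hardware figures above can be plugged into a rough capacity estimate. This is only a sketch: the `Node` class, the `usable_resources` helper, and the per-node reservation of 1 core and 2 GB are hypothetical values chosen for illustration, not settings read from this cluster, and it assumes YARN containers run only on the two slave nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cores: int
    ram_gb: int


def usable_resources(nodes, reserved_cores=1, reserved_ram_gb=2):
    """Sum the cores and RAM left for executors on the given nodes.

    The per-node reservation (for the OS and Hadoop daemons) is an
    assumed figure for illustration, not a measured one.
    """
    cores = sum(max(n.cores - reserved_cores, 0) for n in nodes)
    ram_gb = sum(max(n.ram_gb - reserved_ram_gb, 0) for n in nodes)
    return cores, ram_gb


# Hardware as described in the post above.
master = Node("master", cores=8, ram_gb=16)
slaves = [Node("slave1", cores=4, ram_gb=16), Node("slave2", cores=4, ram_gb=16)]

standalone = usable_resources([master])
yarn_slaves_only = usable_resources(slaves)

print("standalone on master (cores, GB):", standalone)
print("YARN on slaves only (cores, GB):", yarn_slaves_only)
```

Under these assumptions the two slave nodes together offer no more usable cores than the master alone, so a useful first check may be how many executors, cores, and how much memory the YARN job actually received (for example via `spark.executor.instances`, `spark.executor.cores`, and `spark.executor.memory`).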
