Run Spark Thrift server on multiple nodes

Rising Star

Currently I am running only one Thrift server on my cluster.

I see that when there are many client connections, Spark runs very slowly.

I looked for a solution and found this doc: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/install-sp...

It says: Deploying the Thrift server on multiple hosts increases scalability of the Thrift server; the number of hosts should take into consideration the cluster capacity allocated to Spark.

So if I deploy more Thrift servers on my cluster, can I handle multiple concurrent client connections? If not, how can I handle this issue?

Another question: can I run two Spark apps, each with a different configuration?

1 ACCEPTED SOLUTION

Explorer

Hoang,

I think what that post is saying is that, while there is not yet a load-balancing solution for STS on a secure cluster, in a non-Kerberos environment any external load balancer, such as haproxy or httpd + mod_jk, can be used to distribute requests. The Thrift servers will be independent (no ZooKeeper coordination) and the load balancer will simply round-robin client requests.
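
To make the round-robin idea concrete, here is a minimal Python sketch of how connections would be spread across independent Thrift servers. It only illustrates the selection logic and is not a replacement for haproxy. The host names and port 10015 are assumptions for illustration; substitute the nodes and port where your Thrift servers actually listen.

```python
from itertools import cycle

# Hypothetical hosts and port (assumptions, not taken from the thread);
# replace them with the nodes that actually run a Spark Thrift server.
THRIFT_SERVERS = [
    ("sts-node1.example.com", 10015),
    ("sts-node2.example.com", 10015),
    ("sts-node3.example.com", 10015),
]


def jdbc_url(host, port, database="default"):
    """Build the HiveServer2-style JDBC URL used to reach a Thrift server."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def assign(servers, n_clients):
    """Hand each new client the next server in turn, wrapping around at the end."""
    picker = cycle(servers)  # endless iterator over the server list
    return [jdbc_url(*next(picker)) for _ in range(n_clients)]


if __name__ == "__main__":
    for i, url in enumerate(assign(THRIFT_SERVERS, 5), start=1):
        print(f"client {i} -> {url}")
```

Because the servers do not coordinate with each other, each one keeps its own sessions and temporary tables, so a client should stay on the server it was assigned for the life of its connection.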


5 REPLIES

Rising Star

I think you need load balancing for Spark Thrift servers.

Why don't you refer to this link?

https://community.hortonworks.com/questions/29687/how-to-do-load-balancing-spark-thrift-servers-on-h.html

Rising Star

@youngick kim

I followed the link you suggested, but I could not find any guide that solves my issue.

The best answer said:

So for now... no load balancing for STS if the cluster is kerberized, otherwise haproxy, httpd +mod_jk or any other load balancer will probably do the work.

P.S. Currently, my cluster is not kerberized.

Rising Star

So I need to use a third-party service like haproxy to load-balance the Thrift servers on HDP?

Explorer

Yes, that is my understanding.