We have an Ambari cluster with the following details:
3 master machines
5 Kafka machines
160 worker machines (data node machines)
On one of the master machines we installed the Presto coordinator,
and on all 160 worker machines we installed the Presto worker, giving a Presto cluster with 160 worker nodes.
The Presto coordinator is installed on a VM (32 GB of memory, 16 CPUs).
How do we size the Presto coordinator (memory, CPU)?
What is the best-practice sizing formula for the coordinator machine?
Can a single Presto coordinator handle and manage 160 worker machines?
Coordinator
The Presto coordinator is the server that is responsible for parsing statements, planning queries, and managing Presto worker nodes. It is the “brain” of a Presto installation and is also the node to which a client connects to submit statements for execution. Every Presto installation must have a Presto coordinator alongside one or more Presto workers. For development or testing purposes, a single instance of Presto can be configured to perform both roles.
The coordinator keeps track of the activity on each worker and coordinates the execution of a query. The coordinator creates a logical model of a query involving a series of stages which is then translated into a series of connected tasks running on a cluster of Presto workers.
Coordinators communicate with workers and clients using a REST API.
Worker
A Presto worker is a server in a Presto installation which is responsible for executing tasks and processing data. Worker nodes fetch data from connectors and exchange intermediate data with each other. The coordinator is responsible for fetching results from the workers and returning the final results to the client.
When a Presto worker process starts up, it advertises itself to the discovery server in the coordinator, which makes it available to the Presto coordinator for task execution.
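To make the discovery step concrete, here is a minimal Python sketch that renders the handful of worker-side settings involved. The property names (coordinator, http-server.http.port, discovery.uri) are the ones used in a Presto etc/config.properties file; the hostname master01.example.com, the function name and the default ports are assumptions for illustration only, and memory-related settings are deliberately left out.

```python
def render_worker_config(coordinator_host: str,
                         coordinator_port: int = 8080,
                         worker_port: int = 8080) -> str:
    """Return the text of a minimal worker etc/config.properties.

    Only the discovery-related settings are shown; a real worker config
    would also carry memory limits and other site-specific properties.
    """
    lines = [
        # This node runs as a worker only, not as a coordinator.
        "coordinator=false",
        # Port on which this worker serves its own REST API.
        f"http-server.http.port={worker_port}",
        # Discovery server embedded in the coordinator; the worker
        # announces itself here when the process starts.
        f"discovery.uri=http://{coordinator_host}:{coordinator_port}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical hostname, for illustration only.
print(render_worker_config("master01.example.com"))
```

With 160 workers, every worker's discovery.uri would point at the same coordinator address, which is why the coordinator is the single place where cluster membership is tracked.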
Workers communicate with other workers and Presto coordinators using a REST API.
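Because everything goes over REST, one rough way to check whether the coordinator is keeping up with all 160 workers is to ask it which nodes it currently tracks. The sketch below assumes the coordinator exposes a /v1/node endpoint returning a JSON array with one object per node, and that each object carries uri and recentFailureRatio fields; both the endpoint shape and the field names are assumptions that may differ between Presto versions, and a secured cluster would also need authentication, which is not shown.

```python
import json
from urllib.request import urlopen


def fetch_nodes(coordinator_url: str, timeout: float = 10.0) -> list:
    """GET <coordinator>/v1/node and return the parsed JSON array.

    The endpoint path is an assumption about the coordinator's REST API.
    """
    url = f"{coordinator_url.rstrip('/')}/v1/node"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_nodes(nodes: list, max_failure_ratio: float = 0.0) -> dict:
    """Count tracked nodes and list those with recent heartbeat failures.

    'uri' and 'recentFailureRatio' are assumed field names; nodes that
    lack the ratio field are treated as healthy.
    """
    flaky = [
        n.get("uri")
        for n in nodes
        if n.get("recentFailureRatio", 0.0) > max_failure_ratio
    ]
    return {"tracked": len(nodes), "flaky": flaky}


# Example call against a hypothetical coordinator address:
#   summary = summarize_nodes(fetch_nodes("http://master01.example.com:8080"))
#   print(summary["tracked"], "nodes tracked;", len(summary["flaky"]), "flaky")
```

If the tracked count stays well below the number of workers you started, or many nodes show up as flaky while queries are running, that is a hint that the coordinator (or the network between it and the workers) is a bottleneck worth investigating before changing its memory or CPU allocation.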