While installing HDP 2.5.3, in the 'install services' section, I was asked to assign the services below to nodes in the cluster, but I am not sure which type of node (master, data, edge) each one should be installed on.
1. Phoenix Query Server
4. Accumulo TServer
5. Livy Server
6. Spark Thrift Server
Could anyone clarify which type of node each service should be installed on, and on how many nodes each service needs to run?
The answer here depends heavily on what services you need, what hardware is available, and how frequently you will use them.
Flume Agents are minimal and mostly collect logs.
Livy is just a web API for Spark, but it does maintain SparkContexts and starts with 2GB heap by default.
Supervisor is a Storm process. (I don't know much about Storm)
The Spark Thrift Server, Phoenix Query Server, and Accumulo TServer should ideally be kept separate from one another so that each has resources for its own query processing. Install more than one instance of each to provide failover.
If you are limited by the number of servers, use your best judgement about which piece of your architecture is most critical, and dedicate hardware explicitly to it. For the rest, as long as you have the CPU, memory, and disk available to run additional processing with little overhead, you can combine services on the same node with minimal impact.
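
As a rough way to reason about combining services, here is a minimal sketch that checks whether a set of services fits in one node's memory. Only the 2GB Livy heap comes from the answer above; every other heap figure, the reserved amount, and the function name fits_on_node are hypothetical placeholders to replace with numbers from your own cluster.

```python
# Estimated heap per service, in MB.
# Only the Livy value (2GB default) is taken from the answer above;
# the remaining values are ASSUMED placeholders, not HDP defaults.
SERVICE_HEAP_MB = {
    "livy_server": 2048,
    "spark_thrift_server": 4096,   # assumption
    "phoenix_query_server": 2048,  # assumption
    "accumulo_tserver": 4096,      # assumption
}


def fits_on_node(services, node_memory_mb, reserved_mb=4096):
    """Return True if the combined heap of `services` fits on the node.

    `reserved_mb` is memory held back for the OS and other daemons;
    the 4096 MB default is an assumption, not a recommendation.
    """
    needed = sum(SERVICE_HEAP_MB[name] for name in services)
    return needed <= node_memory_mb - reserved_mb


if __name__ == "__main__":
    edge_node = ["livy_server", "phoenix_query_server"]
    print(fits_on_node(edge_node, node_memory_mb=16384))
```

This only compares heap sizes against memory. It says nothing about CPU or disk contention, so treat a True result as a starting point and not as proof that the services can share a node.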