I am evaluating HDC and am looking to spin up EDW-Analytics: Apache Hive 2 LLAP, Apache Zeppelin 0.7.0 in HDP 2.6 (Cloud).
I would like to know the difference in configuration between worker and compute nodes. The reason I ask is that I want to take advantage of spot pricing, and I am not that concerned if I lose the nodes during my testing phase. However, I would like to understand whether these nodes are configured differently.
Worker nodes and compute nodes contain the same services. The basic advantage of compute nodes is that, if you want to use spot-priced instances, you don't have to be afraid of losing any data, because those nodes are used only for compute purposes. You can also shrink your compute group down to 0 instances after the cluster has been created.
@Sharon Kirkham Compute nodes give you the ability to include nodes in the cluster that are just for compute work (i.e. no DataNode). This is similar to a “Task” type node in the EMR world.
Here are screenshots from Ambari for EDW-Analytics: Apache Hive 2 LLAP, Apache Zeppelin 0.7.0, illustrating the components installed on worker and compute nodes:
The worker node has DataNode, while the compute node does not.
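If it helps to check this programmatically, here is a minimal Python sketch of the rule described above: a host group is a reasonable candidate for spot instances when it runs no DataNode, since losing such a node loses no HDFS blocks. The component sets in `HOST_GROUPS` are illustrative assumptions, not the actual lists from the screenshots, and `spot_safe_groups` is a hypothetical helper name.

```python
# Illustrative component sets only (assumed, not copied from the Ambari screenshots).
# The one fact taken from this thread: worker has a DataNode, compute does not.
HOST_GROUPS = {
    "worker": {"DATANODE", "NODEMANAGER"},
    "compute": {"NODEMANAGER"},
}


def spot_safe_groups(host_groups):
    """Return the names of host groups that run no DataNode.

    Nodes in these groups hold no HDFS blocks, so losing one to a
    spot-price interruption costs compute capacity but no stored data.
    """
    return sorted(
        name
        for name, components in host_groups.items()
        if "DATANODE" not in components
    )


print(spot_safe_groups(HOST_GROUPS))  # prints ['compute']
```

On a real cluster you would fill in the component sets from what Ambari reports for each host group rather than hard-coding them.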