I'm a Hadoop newbie and an infrastructure architect. After reviewing several reference architectures and speaking to some consultants, we've built a Hadoop "PoC" cluster comprising 2 Master nodes, 1 Edge node and 4 Worker nodes, all bare-metal RHEL 7 deployments. The Master and Worker nodes connect to a private (non-routable) cluster network via 20 Gbps bonded interfaces and to a Management network (1 Gbps). The Edge node connects to the Corporate Network (applications and users) via a 20 Gbps bonded interface tagged with the Corporate Network VLAN, and to the private cluster network via another 20 Gbps bonded interface. The rationale behind this deployment (as per the various reference architectures) is to isolate the Hadoop cluster nodes so they can do what they do best (store and process data) and to allow any client access (application and/or user) only via the Edge node.
However, we're now being advised that this architecture will not work for all components (e.g. Spark, Druid, Sqoop, Hive), as these clients must access the workers directly and cannot go through an Edge node. Can you please advise? Is there any client (application or user) access that must go directly to the workers and will not work via the Edge node?
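To make the layout concrete, here is a rough sketch of the topology described above as plain Python data, with a check of which nodes a client on the Corporate Network could reach directly. The node names (`master1`, `worker1`, `edge1`, ...) and the network labels are hypothetical, and the model assumes no routing between networks, which is how I understand our setup.

```python
# Hypothetical model of the PoC topology; node names and labels are made up.
CLUSTER_NET = "cluster-private"  # non-routable, 20 Gbps bonded
MGMT_NET = "management"          # 1 Gbps
CORP_NET = "corporate"           # VLAN-tagged, 20 Gbps bonded

# Each node maps to the set of networks it is attached to.
nodes = {
    "master1": {CLUSTER_NET, MGMT_NET},
    "master2": {CLUSTER_NET, MGMT_NET},
    "worker1": {CLUSTER_NET, MGMT_NET},
    "worker2": {CLUSTER_NET, MGMT_NET},
    "worker3": {CLUSTER_NET, MGMT_NET},
    "worker4": {CLUSTER_NET, MGMT_NET},
    "edge1": {CORP_NET, CLUSTER_NET},
}


def can_reach(src_networks, dst):
    """Return True if the source shares a network with dst (no routing assumed)."""
    return bool(src_networks & nodes[dst])


# A user or application that sits only on the Corporate Network.
corporate_client = {CORP_NET}
reachable = sorted(name for name in nodes if can_reach(corporate_client, name))
print(reachable)  # ['edge1']
```

Under these assumptions a corporate client can open a connection only to the Edge node, while the Edge node itself can reach every Master and Worker over the private network. So the question boils down to whether any of the components listed need the client itself, rather than something running on the Edge node, to open connections to the workers.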