Is it recommended to install all clients on all DataNodes?
Clients: HCat Client, HDFS Client, Hive Client, MapReduce2 Client, Oozie Client, Pig Client, Slider Client, Spark2 Client, Tez Client, YARN Client, ZooKeeper Client
Clients are very lightweight components and can be installed on all DataNodes without any issue.
However, not every client is needed on every DataNode; you can choose which node gets which client based on your requirements. As a best practice for a production environment, clients need not be installed on DataNodes or NameNodes. Ideally, you should install the client packages on edge nodes. This is better from a security and workload-distribution perspective, because we don't want various users logging in to DataNodes or NameNodes directly or running scripts on them.
You can add your edge node to the cluster via Ambari and install only the required client packages. If you install the HDP clients and Kerberos manually, it is difficult to keep them up to date with the latest configuration, so it is better to manage them via Ambari.
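If you prefer to script this instead of clicking through the Ambari UI, the following is a minimal sketch of the REST calls involved. It assumes the edge node is already registered with the cluster, and that the usual Ambari pattern applies: POST to create the host component, then PUT to set its state to INSTALLED. The base URL, cluster name, host name, and component names (for example HDFS_CLIENT) are placeholders and assumptions; check them against your own Ambari version. Authentication is left out and must be added before sending.

```python
import json
from urllib.request import Request


def build_client_install_requests(base_url, cluster, host, components):
    """Return (method, url, body) tuples for adding client components to a host.

    For each component: a POST registers it on the host, then a PUT asks
    Ambari to move it to the INSTALLED state. Nothing is sent here.
    """
    requests = []
    for component in components:
        url = (
            f"{base_url}/api/v1/clusters/{cluster}"
            f"/hosts/{host}/host_components/{component}"
        )
        requests.append(("POST", url, None))
        body = json.dumps({"HostRoles": {"state": "INSTALLED"}})
        requests.append(("PUT", url, body))
    return requests


def to_urllib_request(method, url, body):
    """Wrap one tuple in a urllib Request (credentials must be added separately)."""
    data = body.encode("utf-8") if body is not None else None
    # Ambari rejects modifying calls that lack the X-Requested-By header.
    return Request(url, data=data, method=method,
                   headers={"X-Requested-By": "ambari"})


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for method, url, body in build_client_install_requests(
        "http://ambari.example.com:8080", "mycluster",
        "edge1.example.com", ["HDFS_CLIENT", "YARN_CLIENT"],
    ):
        print(method, url, body or "")
```

Running the script only prints the planned calls, so you can review them before passing each tuple to to_urllib_request and sending it with your Ambari credentials.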
These packages are not all required on the DataNodes themselves, but they are required on the hosts running NodeManagers, because the containers launched on the NodeManagers need the client services to be available.
Make sure the Sqoop client is added to all NodeManager and Oozie Client hosts; this is a basic requirement for Sqoop jobs.
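To verify the Sqoop rule above, a small check like this can help. It is only a sketch: the host-to-component mapping is assumed to come from wherever you keep your cluster layout (for example, exported from Ambari), and the component names NODEMANAGER, OOZIE_CLIENT, and SQOOP are assumed to match what your Ambari reports.

```python
def hosts_missing_sqoop(host_components):
    """Return the sorted hosts that run a NodeManager or Oozie client but lack Sqoop.

    host_components maps a host name to the collection of component names
    installed on that host.
    """
    needs_sqoop = {"NODEMANAGER", "OOZIE_CLIENT"}
    return sorted(
        host
        for host, components in host_components.items()
        if needs_sqoop & set(components) and "SQOOP" not in components
    )


if __name__ == "__main__":
    # Hypothetical layout, for illustration only.
    layout = {
        "worker1": {"DATANODE", "NODEMANAGER", "SQOOP"},
        "worker2": {"DATANODE", "NODEMANAGER"},
        "edge1": {"OOZIE_CLIENT", "HDFS_CLIENT"},
        "master1": {"NAMENODE"},
    }
    print(hosts_missing_sqoop(layout))  # ['edge1', 'worker2']
```

Any host the function returns should get the Sqoop client added via Ambari before you run Sqoop jobs through Oozie.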