Why do we need to stop the firewall services while installing or using CDH? Can't we use it without stopping the firewall service?
Stopping the firewall means opening server access to the world. So is there any workaround to use CDH with the firewall service running?
There could be many reasons, but I believe this is one of the important ones:
As we know, HDFS is a distributed file system that spans multiple nodes and runs daemons such as the NameNode, the DataNodes (which hold the replicated blocks), the Secondary NameNode, etc. across those nodes. All these daemons have to interact with each other frequently and at very short intervals (e.g. each DataNode sends a heartbeat to the NameNode every 3 seconds). A long enough delay in the response is treated as an issue/failure, and HDFS will then look for an alternative to keep the connection consistent.
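To give a feel for the timescales involved, here is a small sketch of how long the NameNode waits before it declares a silent DataNode dead. It assumes the stock Apache Hadoop defaults for dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval; your cluster may override them in hdfs-site.xml, so treat the numbers as illustrative only. The function name is mine, not part of any Hadoop API.

```python
# Assumed stock Apache Hadoop defaults; verify against your own hdfs-site.xml.
HEARTBEAT_INTERVAL_S = 3        # dfs.heartbeat.interval (seconds)
RECHECK_INTERVAL_MS = 300_000   # dfs.namenode.heartbeat.recheck-interval (milliseconds)


def datanode_dead_timeout_s(heartbeat_s=HEARTBEAT_INTERVAL_S,
                            recheck_ms=RECHECK_INTERVAL_MS):
    """Seconds without a heartbeat before a DataNode is marked dead.

    Uses the usual rule: 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_ms / 1000 + 10 * heartbeat_s


if __name__ == "__main__":
    timeout = datanode_dead_timeout_s()
    print(f"{timeout:.0f} s ({timeout / 60:.1f} min)")
```

So a single late heartbeat is not fatal by itself, but a firewall that silently drops or stalls this traffic long enough will eventually make healthy nodes look dead.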
With all these conditions, having the daemons talk to each other through a firewall is an extra burden on HDFS and may create unnecessary confusion, so switching the firewall off avoids this.
Your next point is that stopping the firewall will open the server to the public. That is true, but many other security mechanisms are available (AD, LDAP, Kerberos, Sentry, ACLs, encryption at rest, etc.), so the environment can still be kept safe.
Cloudera recommends not using a firewall, but this doesn't mean it is technically impossible to do.
For example, my customer is using iptables in production.
Also, depending on your network setup/architecture, you can handle firewalling against the "external" world in the router/switch configuration, without depending on a software firewall that could put extra load on the nodes.
If you understand all the drawbacks of using a firewall, then yes (it might be worth discussing this with Cloudera?). But I would advise you, for example, to allow all network traffic inside the cluster and to filter only the traffic with the external world.
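As a rough illustration of that advice, the sketch below only builds the iptables command lines as strings; it does not run anything. The subnet 10.0.0.0/24 and the exposed ports (22 for SSH, 7180 for the Cloudera Manager console) are assumptions for the example, and the helper name is hypothetical. Adapt both to your own network and review the result before applying it on a node.

```python
import ipaddress


def build_input_rules(cluster_cidr, external_tcp_ports):
    """Return iptables commands: trust the cluster subnet, filter the rest.

    cluster_cidr and external_tcp_ports are placeholders for your own values.
    """
    # Validates the CIDR; raises ValueError if it is malformed.
    net = ipaddress.ip_network(cluster_cidr)
    rules = [
        # Always allow loopback traffic.
        "iptables -A INPUT -i lo -j ACCEPT",
        # Allow replies to connections this node opened itself.
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        # Allow everything coming from inside the cluster.
        f"iptables -A INPUT -s {net} -j ACCEPT",
    ]
    # Open only the selected ports to the outside world.
    for port in sorted(external_tcp_ports):
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
    # Drop any other inbound traffic by default.
    rules.append("iptables -P INPUT DROP")
    return rules


if __name__ == "__main__":
    for rule in build_input_rules("10.0.0.0/24", [22, 7180]):
        print(rule)
```

The default DROP policy is emitted last on purpose, so that the ACCEPT rules are already in place when it takes effect and you do not lock yourself out of an SSH session.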
As for Kerberos: yes, its purpose is to secure access to your cluster (the authentication part).