Is it mandatory to use an HSM to implement data-at-rest encryption in a production environment?
No. The Cloudera KeyTrustee key store doesn't require an HSM. Ranger KMS can do the same on the HDP side.
When we write files into an encryption zone, the KMS generates an EDEK for each file. Is the KMS able to handle that load?
See this doc for the EDEK creation flow. The default is two KMS instances for HA, but you can add more if performance is a concern. Note that wide Impala tables can place more of a burden on the KMS.
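To make the per-file key traffic concrete, here is a rough toy model of the EDEK flow in Python. It is only a sketch: `ToyKMS`, `toy_cipher`, `write_file` and `read_file` are made-up names, and the XOR keystream is a stand-in for the real cipher, so do not treat it as actual crypto. It also ignores any caching of pre-generated EDEKs that a real cluster may do, so it likely overstates the number of generate calls.

```python
import hashlib
import os


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in for a real cipher: XOR with a SHA-256-derived keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR is symmetric, so the same call both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Hypothetical KMS model: zone keys never leave this object."""

    def __init__(self):
        self._zone_keys = {}
        self.calls = 0  # requests served, to make the load visible

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_edek(self, key_name: str) -> bytes:
        """Create a fresh DEK and return it encrypted under the zone key."""
        self.calls += 1
        dek = os.urandom(32)
        return toy_cipher(self._zone_keys[key_name], dek)

    def decrypt_edek(self, key_name: str, edek: bytes) -> bytes:
        """Unwrap an EDEK and hand the plaintext DEK back to the caller."""
        self.calls += 1
        return toy_cipher(self._zone_keys[key_name], edek)


def write_file(kms: ToyKMS, key_name: str, plaintext: bytes):
    edek = kms.generate_edek(key_name)      # kept alongside the file metadata
    dek = kms.decrypt_edek(key_name, edek)  # writer unwraps it to encrypt the data
    return edek, toy_cipher(dek, plaintext)


def read_file(kms: ToyKMS, key_name: str, edek: bytes, ciphertext: bytes) -> bytes:
    dek = kms.decrypt_edek(key_name, edek)  # every reader needs its own unwrap
    return toy_cipher(dek, ciphertext)
```

In this model one write plus one read costs three KMS requests, and the count grows with the number of files touched, which is the reason jobs that open many files put more pressure on the KMS.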
Is scalability the only issue with KMS, or are there other problems we might run into when using it for data-at-rest encryption?
Make sure the KMS ACL blacklist/whitelist is configured properly. Best practice is to deny the HDFS admin access to the keys, which enforces separation of duties between the Key Manager role and the HDFS admin.
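As a rough illustration of how the whitelist and blacklist interact, here is a simplified Python model. The property names follow the `hadoop.kms.acl.<OP>` and `hadoop.kms.blacklist.<OP>` pattern used in kms-acls.xml; the user names `etl_user` and `keyadmin` and the helper functions are hypothetical, and group entries are left out entirely. The assumption is that a blacklist entry wins over a whitelist entry and that a missing ACL defaults to `*`.

```python
# Simplified mirror of a few kms-acls.xml properties (users only, no groups).
# The values are example settings, not a recommended production config.
kms_acls = {
    "hadoop.kms.acl.DECRYPT_EEK": "hive,etl_user",
    "hadoop.kms.blacklist.DECRYPT_EEK": "hdfs",
    "hadoop.kms.acl.CREATE": "keyadmin",
    "hadoop.kms.blacklist.CREATE": "hdfs",
}


def users(value: str) -> set:
    """Split a comma-separated user list into a set, ignoring blanks."""
    return {u.strip() for u in value.split(",") if u.strip()}


def is_allowed(acls: dict, user: str, op: str) -> bool:
    """Return True if `user` may perform `op` under this simplified model."""
    allowed = acls.get(f"hadoop.kms.acl.{op}", "*")
    blocked = users(acls.get(f"hadoop.kms.blacklist.{op}", ""))
    if user in blocked:
        return False  # blacklist takes precedence over the whitelist
    return allowed.strip() == "*" or user in users(allowed)
```

With these settings `is_allowed(kms_acls, "hive", "DECRYPT_EEK")` is true while the same check for `hdfs` is false, so the HDFS superuser can manage files in the zone but cannot unwrap the keys needed to read them.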
Can Hive read and write data in an encryption zone without any configuration changes? What about other AD users and Zeppelin, which have access to reports built on Hive?
Assuming the hive user is whitelisted, which the wizard does automagically, and Sentry is enabled with proper ABAC, it should be transparent.
Encryption zones need to be created with an HDFS admin account, not the hdfs service account that is created during cluster setup. What if the service user has dfsadmin access?
Probably not a good idea from a separation-of-duties standpoint.