I was wondering why we need encryption in transit when TDE is already in place.
As far as I understand, TDE is an end-to-end encryption mechanism: HDFS data is encrypted and decrypted on the client side. Therefore, data is already encrypted while it travels over the wire from client to server and from server to client. In that case, what is the added value of enabling encryption in transit separately?
Your understanding is correct insofar as HDFS TDE is designed to provide end-to-end encryption, and it does provide confidentiality of data in transit for some access paths. However, additional wire encryption may be needed to protect data in transit for clients that access it from outside the secured zone in which the cluster network resides.
For example, let's say we connect from a BI application installed on a workstation to HiveServer2 via Knox (using HTTP transport mode for Hive). In this case, which client is actually the HDFS client? It is the principal associated with HiveServer2, since HiveServer2 is what performs the reads from HDFS (let's assume impersonation is disabled for Hive, which is the best practice in deployments where Ranger is used for authorization).
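To make the "who is the HDFS client" point concrete, here is a toy model of the access path. It is not a Hadoop or Knox API: `Hop` and `plaintext_hops` are hypothetical names, and the model assumes only that TDE ciphertext exists until the HDFS client decrypts it, after which each remaining hop is protected only if it has its own wire encryption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    src: str
    dst: str
    tls: bool  # True if this hop has its own wire encryption (e.g. HTTPS)


def plaintext_hops(path: List[Hop], hdfs_client: str) -> List[Hop]:
    """Return the hops on which data travels unencrypted.

    Hypothetical model: `path` lists hops in the direction the data flows,
    starting at the DataNode. TDE keeps the data encrypted until it reaches
    the HDFS client, which decrypts it; every hop after that point needs
    its own wire encryption.
    """
    exposed = []
    decrypted = False
    for hop in path:
        if hop.src == hdfs_client:
            decrypted = True  # the HDFS client has already decrypted the data
        if decrypted and not hop.tls:
            exposed.append(hop)
    return exposed


# Path from the example: HiveServer2 is the HDFS client, not the workstation.
via_knox = [
    Hop("DataNode", "HiveServer2", tls=False),
    Hop("HiveServer2", "Knox", tls=False),
    Hop("Knox", "workstation", tls=False),
]

# The scenario from the question: the workstation itself is the HDFS client.
direct = [Hop("DataNode", "workstation", tls=False)]

print([(h.src, h.dst) for h in plaintext_hops(via_knox, "HiveServer2")])
print(plaintext_hops(direct, "workstation"))
```

With the workstation as the HDFS client, no hop is exposed, which matches the reasoning in the question. Once HiveServer2 becomes the HDFS client, every hop downstream of it carries plaintext unless that hop is encrypted separately.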
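Because HiveServer2 decrypts the data as the HDFS client, TDE protects it only as far as HiveServer2, inside the cluster network; the query results that travel onward are plaintext. We would therefore still want to secure the connection between Knox and the workstation with TLS by communicating over HTTPS.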
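As a sketch of what "HTTPS between the workstation and Knox" can look like on the client side, the function below builds a Hive JDBC URL in the form the Knox documentation describes for Hive over HTTP transport with SSL. The gateway path `gateway`, the topology name, the host, and the truststore details are assumptions that depend on your deployment. The second helper shows, for any Python HTTPS client talking to Knox, a TLS context that verifies the gateway certificate. Both helper names are hypothetical.

```python
import ssl
from typing import Optional


def knox_hive_jdbc_url(gateway_host: str, gateway_port: int, topology: str,
                       truststore_path: str, truststore_password: str) -> str:
    """Build a Hive JDBC URL that reaches HiveServer2 through Knox over TLS.

    ssl=true makes the driver use TLS to Knox; the truststore must contain
    the certificate (or CA) of the Knox gateway. transportMode=http and
    httpPath route the call to Knox's Hive service. The "gateway" path
    prefix and the topology name are deployment-specific assumptions.
    """
    return (
        f"jdbc:hive2://{gateway_host}:{gateway_port}/;"
        f"ssl=true;sslTrustStore={truststore_path};"
        f"trustStorePassword={truststore_password};"
        f"transportMode=http;httpPath=gateway/{topology}/hive"
    )


def knox_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context for an HTTPS client talking to Knox.

    create_default_context() turns on certificate and hostname verification;
    pass the CA that signed the Knox certificate if it is a private CA.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


# Hypothetical values; 8443 is Knox's usual default gateway port.
print(knox_hive_jdbc_url("knox.example.com", 8443, "default",
                         "/etc/pki/knox-truststore.jks", "changeit"))
```

Verifying the Knox certificate matters as much as enabling TLS: an HTTPS connection that accepts any certificate is still encrypted but can be intercepted by anyone who can present one.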