SSL Configurations for HSFTP Sources

Hi,

I am trying to transfer HDFS files securely between two clusters using:

hadoop distcp hsftp://<host1>:50470/srcPath hdfs://<host2>:8020/destPath

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Sys_Admin_Guides/content/ref-bbf68907-3b...

"HSFTP, uses HTTPS by default. This means that data will be encrypted in transit"

The source cluster is secured with SSL set up on all nodes, and dfs.http.policy is set to HTTP_AND_HTTPS. The destination cluster has the truststore of the source cluster.

I understand that when we run the distcp hsftp command on the destination cluster, it talks to the source NameNode on port 50470, which is secure. Does that mean the actual data transfer between DataNodes is also secure? If so, can someone explain how it works?

1 ACCEPTED SOLUTION

@subacini balakrishnan, HTTP calls to both the NameNode and the DataNode will utilize SSL. Since it utilizes SSL for the data transfer performed with the DataNode, the bytes in transit are encrypted and cannot be read by a man-in-the-middle attacker.

The way this works is that the HTTP client first initiates a call to the NameNode using either the "http" or "https" scheme. For a file read or write operation, the NameNode selects an appropriate DataNode and sends an HTTP 302 redirect response back to the client, telling it to reconnect to that DataNode to complete its request. When the NameNode performs this redirect, it detects the scheme of the incoming call and preserves that scheme in the Location header of the HTTP 302 redirect response. Thus, for a request that reaches the NameNode via "http", the redirect points to an "http" URL on a DataNode, and for a request that reaches the NameNode via "https", the redirect points to an "https" URL on a DataNode.
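
To make the redirect step concrete, here is a minimal Python sketch of the scheme-preserving logic described above. It is an illustration only, not the actual NameNode code: the function name build_redirect_location and the example hostnames are made up, the ports are the Hadoop 2.x default DataNode web ports (50075 for http, 50475 for https), and the real NameNode may also rewrite the path and query, which this sketch leaves unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed Hadoop 2.x default DataNode web ports; a real cluster may override them.
DATANODE_PORTS = {"http": 50075, "https": 50475}


def build_redirect_location(incoming_url: str, datanode_host: str) -> str:
    """Build the Location value for a 302 redirect to a DataNode.

    Hypothetical helper: it keeps the scheme of the request that the
    NameNode received, so an https request is redirected to an https URL.
    """
    parts = urlsplit(incoming_url)
    if parts.scheme not in DATANODE_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Swap the NameNode host:port for the chosen DataNode, same scheme.
    netloc = f"{datanode_host}:{DATANODE_PORTS[parts.scheme]}"
    # Simplification: path and query are copied through unchanged.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))


if __name__ == "__main__":
    # Example hostnames are placeholders, not real cluster nodes.
    print(build_redirect_location(
        "https://nn.example.com:50470/data/srcPath/file", "dn1.example.com"))
    print(build_redirect_location(
        "http://nn.example.com:50070/data/srcPath/file", "dn1.example.com"))
```

Because the client follows the Location header as given, a transfer that starts on the NameNode's HTTPS port continues on the DataNode's HTTPS port, which is why the file bytes are encrypted as well.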

3 REPLIES

When data is transferred from a secure cluster to an unsecured cluster via distcp, the user must set ipc.client.fallback-to-simple-auth-allowed=true on the secure machine; otherwise the distcp operation will fail with a permission error.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Sys_Admin_Guides/content/ref-c8ffaa14-ea...

When ipc.client.fallback-to-simple-auth-allowed is set to true, the HDFS client switches to SASL SIMPLE (unsecured) authentication.

Hi Yvora,

I didn't set this property and didn't face any permission issues. We are using hsftp and captured packets in transit. The data is encrypted, and communication happens over the secure ports (50470 and 50475). Please confirm.
