SSL in Hadoop

Community,

I am implementing security as part of a POC on our cluster (HDP 2.3.4). I have a few questions on the points below; please feel free to share your thoughts.

1. I would like to enable SSL encryption for all the services (HDFS/YARN/Hive/Sqoop/Oozie) that I use in my cluster. Would it be possible to use a single cert for all the services?

2. What are the performance benefits of using third-party signed certs over self-signed certs?

3. If third party certs are preferred,

3.1 Which certs are preferred among internal and external third party certs?

3.2 How do certs from different vendors vary? Are there any specific standards that need to be met when purchasing the certs?

Regards,

Balaji

1 ACCEPTED SOLUTION

@Balaji M It is possible to use a single certificate for the entire cluster. To make this work, the certificate must be valid for a range of hosts (for example, a wildcard certificate, or one that lists every cluster host as a subject alternative name) rather than tied to one specific host.
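
As a rough illustration of what "valid for a range of hosts" means, here is a minimal Python sketch that checks a list of cluster hostnames against a certificate's DNS names. The function name, the hostnames, and the wildcard entry are all hypothetical, and the matching rule is simplified: it assumes a wildcard may only stand in for the entire leftmost label, which is how most TLS clients treat it.

```python
from typing import Sequence


def cert_covers_host(hostname: str, san_entries: Sequence[str]) -> bool:
    """Return True if hostname matches any DNS name on the certificate.

    Simplified rule (an assumption of this sketch): "*" is accepted only
    as the whole leftmost label, and it matches exactly one label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for entry in san_entries:
        pattern_labels = entry.lower().rstrip(".").split(".")
        if len(pattern_labels) != len(host_labels):
            continue  # a wildcard never spans more than one label
        if pattern_labels[0] == "*":
            if pattern_labels[1:] == host_labels[1:]:
                return True
        elif pattern_labels == host_labels:
            return True
    return False


# Hypothetical cluster hosts and a hypothetical wildcard certificate.
cluster_hosts = [
    "nn1.cluster.example.com",
    "dn1.cluster.example.com",
    "edge.dmz.example.com",
]
san = ["*.cluster.example.com"]

uncovered = [h for h in cluster_hosts if not cert_covers_host(h, san)]
print(uncovered)  # ['edge.dmz.example.com']
```

Any host that shows up in `uncovered` would fail hostname verification with that certificate, so it would need either its own certificate or an extra name added to the shared one.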

The preference for 3rd party vs. self-signed certs really depends on the security policies within your organization. Third party Certificate Authorities (CAs, e.g. Verisign) exist so that you can trust the public key contained in a certificate. They act as a "clearing house" that tells you the public key has been validated and verified to belong to its owner. The difference between 3rd party certs is the level of trust placed in the CA. For internal use, where you trust the CA signing the cert and don't need to ask outside users to trust the certificate, there is really no difference between the certificates. Likewise, if you are using the certificate strictly for internal purposes, there is no difference between a 3rd party signed cert and a self-signed cert. Many organizations maintain an internal CA to generate their own certificates so they don't have to pay a 3rd party to sign them. When you install a self-signed cert, or a cert signed by a CA that is not widely trusted, you will need to set up trust for the certificate and/or the CA on each system that has to verify it.
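
To make "setting up trust" concrete, here is a small sketch from the point of view of a Python client, using only the standard `ssl` module. It is not how the Hadoop daemons themselves are configured (they are Java services and read trust from a Java truststore); the function name and the CA bundle path are hypothetical.

```python
import ssl
from typing import Optional


def make_client_context(ca_bundle_path: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that verifies the server certificate.

    ca_bundle_path is a hypothetical PEM file holding your internal CA
    certificate (or the self-signed certificate itself). Without it, only
    the CAs already trusted by the operating system are accepted.
    """
    ctx = ssl.create_default_context()  # verification and hostname check on
    if ca_bundle_path is not None:
        # Add the internal CA / self-signed cert to the trusted set.
        ctx.load_verify_locations(cafile=ca_bundle_path)
    return ctx


ctx = make_client_context()  # pass e.g. "/path/to/internal-ca.pem" in practice
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A cert from a well-known CA works with the default context as is; a self-signed or internal-CA cert only works once its PEM file has been loaded, which is the extra step described above.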

Because the process for verifying and trusting a certificate is the same for 3rd party signed and self-signed certs, the type of certificate makes no real performance difference. Enabling SSL for all of the inter-process communication, however, carries a fairly significant performance penalty because of the overhead of verifying certificates and encrypting/decrypting the traffic. The penalty will depend on your workload, but I've seen reports of up to 15-20% when enabling wire encryption.
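
For capacity planning in a POC, the reported range can be turned into a back-of-the-envelope estimate. The sketch below assumes the penalty shows up as a proportional increase in job runtime; that is an assumption of this example, since the reports mentioned above do not say how the penalty was measured. The function name and the 60-minute baseline are hypothetical.

```python
from typing import Tuple


def runtime_range(baseline_minutes: float,
                  low_penalty: float = 0.15,
                  high_penalty: float = 0.20) -> Tuple[float, float]:
    """Estimate runtime with wire encryption enabled.

    Assumes (hypothetically) that a penalty of p lengthens the job by a
    factor of (1 + p). The defaults are the 15-20% figures quoted above.
    """
    return (baseline_minutes * (1.0 + low_penalty),
            baseline_minutes * (1.0 + high_penalty))


low, high = runtime_range(60.0)  # hypothetical 60-minute job
print(f"Estimated runtime with SSL: {low:.0f} to {high:.0f} minutes")
```

Measuring a representative job with and without wire encryption on your own cluster will give a far better number than this estimate.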

Because of the complexities of certificate management, the performance impact, and so on, you should design the security of your system with all aspects in mind. For example, at-rest encryption for sensitive data also keeps that data encrypted on the wire, because it is only encrypted and decrypted by the client. Security is a complex topic, so be aware that there may be more than one way to meet your needs.

2 REPLIES 2

@emaxwell Thank you for sharing some useful information.