Confusion in documentation: Configuring Spark for Wire Encryption

Expert Contributor

Hi all,

I was going through the latest documentation on the Hortonworks website: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-encry...

I am unable to understand the following passage:

Configuring Spark for Wire Encryption

"You can configure Spark to protect sensitive data in transit by enabling wire encryption. Spark supports SSL for broadcast and file server protocols, and it uses SASL encryption for the block transfer service. Note, however, that wire encryption is not yet supported for shuffle files, cached data, and other application files."

- "Protect sensitive data in transit by enabling wire encryption": this seems to be about the data ingestion part, but how? From Kafka?

- "Spark supports SSL for broadcast and file server protocols, ..." (OK)

- "however, that wire encryption is not yet supported for shuffle files, cached data, and other application files."

So where is the data encrypted, and where is it secured or unsecured, from the start of the job until execution finishes?

Can someone please shed some light on this?

BTW: where does wire encryption come into the picture in Spark's context?

Many thanks,

Expert Contributor

Thanks @bikas, @lgeorge

Does this mean that by following "Configuring Spark for Wire Encryption" in the documentation http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-encry... we get encryption for the data that moves across the network within a single job (between executors) as part of its tasks, i.e. during the internal transit of data among the nodes inside the cluster?

Does "wire encryption for Spark" cover any other areas or bring other benefits?

Many thanks,

SS

Super Collaborator

Yes. It means encrypting all network transfers within the Spark job. There are no other avenues for wire encryption within Spark. Starting with Spark 2.0, enabling wire encryption also enables HTTPS on the history server UI for browsing historical job data.
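
To make this concrete, here is a rough sketch of the settings involved, written as a small Python helper that prints spark-defaults.conf style lines. The property names are the ones listed in the Apache Spark 1.6 security documentation; the helper names (`wire_encryption_properties`, `render_spark_defaults`) and every path and password are hypothetical placeholders, not values from the HDP guide, so please check them against the documentation for your version before using them.

```python
# Sketch only: property names come from the Apache Spark 1.6 security docs.
# All paths and passwords used below are hypothetical placeholders.


def wire_encryption_properties(keystore, keystore_password,
                               truststore, truststore_password):
    """Return Spark properties that switch on SASL and SSL wire encryption."""
    return {
        # SASL encryption for the block transfer service needs authentication on.
        "spark.authenticate": "true",
        "spark.authenticate.enableSaslEncryption": "true",
        # SSL for the broadcast and file server protocols.
        "spark.ssl.enabled": "true",
        "spark.ssl.keyStore": keystore,
        "spark.ssl.keyStorePassword": keystore_password,
        "spark.ssl.trustStore": truststore,
        "spark.ssl.trustStorePassword": truststore_password,
    }


def render_spark_defaults(props):
    """Format properties as spark-defaults.conf lines: key, a space, value."""
    return "\n".join(f"{key} {value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    props = wire_encryption_properties(
        "/path/to/keystore.jks", "changeit",      # placeholder values
        "/path/to/truststore.jks", "changeit",    # placeholder values
    )
    print(render_spark_defaults(props))
```

The same key/value pairs could also be passed with `--conf` on `spark-submit` instead of being written to spark-defaults.conf.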

Super Collaborator

@Smart Solutions, there is also some related information for Apache Spark version 1.6.2 (shipped with HDP 2.5) at https://spark.apache.org/docs/1.6.2/security.html#encryption.

Super Collaborator

The HDP Spark Component Guide (versions 2.5.0+) has been updated per Bikas's clarification:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-encry...