Created 05-13-2016 02:05 PM
Is there a library or call that already defines either AES or similar strong encryption methods?
I was looking for a way to decrypt a Hive table using Spark and then load only certain tables through a protected notebook in Zeppelin or similar.
Created 05-13-2016 06:26 PM
If you mean a way to decrypt a file that has been encrypted with HDFS encryption, then no. Encryption and decryption with HDFS at-rest encryption are more complex: the EEK (encrypted encryption key) is stored with the file, and you have to talk to the KMS to get the decrypted key, etc. Instead, you can use HDFS encryption with Hive and Spark and let it take care of this for you.
If you want to generate a key pair and use that for both Hive and Spark to encrypt/decrypt data, that can be done, but it would be part of loading and working with the data. You'd need to define a UDF for Hive to use for decryption so you could reference it in a SELECT statement, and you'd need to use libraries in Scala or Python for Spark to decrypt the data. Both would need access to the keys for decryption, though, and that may be difficult to architect in a secure fashion.
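To make the second option a bit more concrete, here is a minimal sketch of the kind of helper a Hive decryption UDF or a Spark job could call. It uses only the JDK's `javax.crypto` classes. The class name `AesGcmSketch`, the choice of AES-GCM, and the Base64 "IV followed by ciphertext" layout are all assumptions made for illustration, and it assumes a single shared AES key rather than a key pair. How the key reaches Hive and Spark is deliberately left out, since that is the hard part mentioned above.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper, not part of Hive or Spark.
public class AesGcmSketch {
    private static final int IV_BYTES = 12;   // common IV length for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Encrypts a string value; output is Base64(IV || ciphertext+tag).
    public static String encrypt(byte[] key, String plaintext)
            throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);  // fresh random IV for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    // Reverses encrypt(); throws if the key is wrong or the data was altered.
    public static String decrypt(byte[] key, String encoded)
            throws GeneralSecurityException {
        byte[] data = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

A Hive UDF would wrap `decrypt` in its evaluate method so it can be called from a select statement, and Spark code in Scala could call the same class directly or register it as a UDF. The key passed in must be 16, 24, or 32 bytes long.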