Created 12-08-2016 02:11 PM
Hello,
I have been trying to pull some data from our SQL Server into HDFS via Sqoop. The destination is an encryption zone (/secure/). The files are written, but when I pull them with hdfs dfs -get /secure/[folder imported] and open them, I get gibberish. My first thought was that the files weren't being decrypted, but when I look at the audit logs in Ranger, I see the access type decrypteek for my user on both the read and the write. Below is the Sqoop command. Any insights would be great.
sqoop import \
-D sqoop.test.import.rootDir=hdfs://popul/secure/ \
--target-dir hdfs://popul/secure/intest/ \
--connect "jdbc:sqlserver://[serverip]:1433;database=[database]" \
--username [sqoopuser] \
--password [password] \
--table S_Elg \
--fields-terminated-by "|" \
--columns "col1, col2, col3" \
--split-by ElgKey \
-- --schema ACC
P.S. When I run this import into a non-encrypted zone, everything works as expected.
Nick
Created 12-21-2016 02:47 PM
After staring at this with a Hortonworks engineer (who was onsite for an unrelated reason), we figured out the problem. Ranger KMS was doing its job the whole time, but I had enabled compression on my MapReduce outputs with these settings:
mapreduce.map.output.compress = true
mapreduce.output.fileoutputformat.compress = true
When I pulled the outputs of my Sqoop job they looked like binary, but in reality they were just compressed. After decompressing them, everything works correctly.
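For anyone hitting the same symptom, here is a minimal sketch of how the fetched part files could be checked and read locally. The read_part function is a hypothetical helper, not part of Sqoop or Hadoop, and the .deflate branch assumes the files were written by Hadoop's default codec as zlib streams and that python3 is available. On the cluster itself, hdfs dfs -text [path] should decode most codec-compressed files directly.

```shell
# Hypothetical helper (not part of Sqoop or Hadoop): print a part file that
# was copied out of HDFS, choosing a decompressor from its file extension.
read_part() {
    case "$1" in
        *.gz)
            gzip -dc "$1" ;;
        *.bz2)
            bzip2 -dc "$1" ;;
        *.deflate)
            # Assumption: the file is a zlib stream written by Hadoop's
            # default codec, and python3 is installed on this machine.
            python3 -c 'import sys, zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))' < "$1" ;;
        *)
            # No known compression extension: treat the file as plain text.
            cat "$1" ;;
    esac
}

# Example usage on a cluster node (paths are placeholders):
#   hdfs dfs -get /secure/intest/ ./intest
#   for f in ./intest/part-*; do read_part "$f" | head -n 5; done
```

If compressed output is not wanted for a particular import, adding -D mapreduce.output.fileoutputformat.compress=false right after sqoop import should override the cluster-wide setting for that job, assuming the property is not marked final in the cluster configuration.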
Nick
Created 12-17-2016 08:14 PM
Can you verify the user trying to read the file has the Decrypt EEK permission in Ranger KMS? You can use my article here as a reference
Is your cluster Kerberized?