We are considering implementing security progressively on our HDP data lake:
1- In the first stage, basic authorization would rely on HDFS users, groups and ACLs.
2- After that, we would integrate AD/LDAP.
3- The final step would be to add Kerberos + Knox.
Between steps, the data lake will continue to grow steadily, incorporating new feeders.
The question is: what are the caveats of proceeding in these three separate, progressive steps? Does this approach pose any complications for existing components (Hive, Spark, Kafka, Atlas, Ranger) and data stores?
Thanks in advance!
I would recommend starting with the most complicated piece first, which is Kerberos. We use Kerberos, HDFS user/group permissions, AD, Ranger and Knox.
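
For the HDFS user/group permissions piece, here is a rough sketch of how ACL commands could be generated. It only builds the `hdfs dfs -setfacl` command line and does not run it. The group name `analysts`, the path `/data/raw` and the helper names `acl_spec` and `setfacl_command` are made up for illustration; adjust them to your cluster. Keep in mind that without Kerberos, HDFS trusts whatever username the client reports, so these ACLs only become real protection once authentication is in place.

```python
import shlex


def acl_spec(kind, name, perms, default=False):
    """Build one ACL entry such as 'group:analysts:r-x' (hypothetical helper)."""
    if kind not in ("user", "group"):
        raise ValueError(f"unsupported ACL entry type: {kind}")
    # perms must look like 'rwx', 'r-x', 'r--', and so on.
    if len(perms) != 3 or any(c not in ok for c, ok in zip(perms, ("r-", "w-", "x-"))):
        raise ValueError(f"invalid permission string: {perms}")
    entry = f"{kind}:{name}:{perms}"
    # A 'default:' entry on a directory is inherited by newly created children.
    return f"default:{entry}" if default else entry


def setfacl_command(path, entries, recursive=False):
    """Return the argument list for 'hdfs dfs -setfacl -m' (not executed here)."""
    cmd = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        cmd.append("-R")
    cmd += ["-m", ",".join(entries), path]
    return cmd


if __name__ == "__main__":
    # Hypothetical group and path, for illustration only.
    entries = [
        acl_spec("group", "analysts", "r-x"),
        acl_spec("group", "analysts", "r-x", default=True),
    ]
    print(shlex.join(setfacl_command("/data/raw", entries, recursive=True)))
```

You can check the result afterwards with `hdfs dfs -getfacl /data/raw`. Depending on the Hadoop version, ACL support may need to be enabled through `dfs.namenode.acls.enabled` in `hdfs-site.xml` first.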