I'd like to understand how space allocation works. Here's the scenario.
Assume I have 3 clusters with 1 TB of space, and we use the default replication factor of 3.
Assume 3 clients will share the space in the clusters. How does space allocation work in this case? I know it depends on how much storage each client needs, but with the current setup, what steps, procedure, or calculation should be applied?
@Bruce Perez, HDFS does not allocate capacity separately per user. However, you can use HDFS Quotas to limit the metadata and space consumed by specific directories. A common setup is to create sub-directories dedicated to different users, apply HDFS Permissions on each directory so that only that user can write to it, and then set an appropriate quota on each directory. The permissions confine each user to their own directory, and the quotas limit each user's metadata and space consumption. Note that the space quota counts every replica, so with a replication factor of 3, writing 1 GB of data consumes 3 GB of quota. The overall effect is that, in a multi-tenant cluster, no single user can consume all the space in the cluster and harm the processes of other users.
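As a rough illustration of the calculation, here is a minimal Python sketch. It assumes 3 TB of raw capacity in total (reading the question as three 1 TB nodes), an even split between tenants, and the hypothetical user name `alice`; adjust all of these to your own cluster. The helper names are made up for this example, and the strings it builds are ordinary `hdfs dfs` and `hdfs dfsadmin` commands that you would run yourself as the HDFS superuser.

```python
TB = 1024 ** 4  # bytes in one tebibyte


def per_tenant_space_quota(raw_capacity_bytes, tenants):
    """Split raw capacity evenly. HDFS space quotas are counted in raw
    (replicated) bytes, so the raw capacity is what gets divided."""
    return raw_capacity_bytes // tenants


def logical_capacity(space_quota_bytes, replication=3):
    """Amount of actual data a tenant can store under a given space quota."""
    return space_quota_bytes // replication


def quota_commands(user, space_quota_bytes, name_quota, base="/user"):
    """Build the shell commands for one tenant directory (hypothetical layout)."""
    path = f"{base}/{user}"
    return [
        f"hdfs dfs -mkdir -p {path}",
        f"hdfs dfs -chown {user} {path}",
        f"hdfs dfs -chmod 700 {path}",                               # only the owner can write
        f"hdfs dfsadmin -setSpaceQuota {space_quota_bytes} {path}",  # bytes, replicas included
        f"hdfs dfsadmin -setQuota {name_quota} {path}",              # max files + directories
    ]


if __name__ == "__main__":
    raw = 3 * TB                             # assumption: 3 nodes x 1 TB each
    quota = per_tenant_space_quota(raw, 3)   # 1 TB of raw space per tenant
    print("data each tenant can store:", logical_capacity(quota), "bytes")
    for cmd in quota_commands("alice", quota, 100000):
        print(cmd)
```

Under these assumptions each tenant gets a 1 TB space quota, which holds roughly a third of a terabyte of actual data at replication factor 3. In practice you would probably leave some headroom rather than hand out the full raw capacity.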
You can enable Kerberos authentication even if your users are present in AD/LDAP; there won't be any changes in Kerberos security.
AD/LDAP integration is NOT a substitute for internal enforcement of authentication.
Basically, without Kerberos you cannot prevent users from impersonating any other Hadoop user and doing anything they want.
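One quick way to see which mode a cluster is in is to look at the `hadoop.security.authentication` property in `core-site.xml`. Its default value is `simple`, in which Hadoop trusts the user name reported by the client; `kerberos` turns on real authentication. The sketch below is only an illustration: the `auth_mode` helper and the sample XML are made up for this example, and on a real cluster you would read the actual file from your Hadoop configuration directory.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content, for illustration only.
SAMPLE_CORE_SITE = """
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
"""


def auth_mode(core_site_xml):
    """Return the configured authentication mode, or 'simple' if unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hadoop.security.authentication":
            return (prop.findtext("value") or "simple").strip()
    return "simple"  # Hadoop's default when the property is absent


if __name__ == "__main__":
    mode = auth_mode(SAMPLE_CORE_SITE)
    if mode != "kerberos":
        print("Warning: user identities are not verified (mode: %s)" % mode)
    else:
        print("Kerberos authentication is enabled")
```

If this reports `simple`, directory permissions and quotas still apply, but they are only as trustworthy as the user names the clients claim.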