DATA INGESTION AUTHORIZATION

Explorer

Can we restrict data ingestion to specific data nodes?

I want to have one subgroup of data nodes where I can place data from source 1 and another group where I can place data from source 2. It doesn't matter how close or far apart the nodes are. So a bunch of nodes, say 1, 2, 3 and 4, must not store data from source 2. Is there any way to define such a policy in Hadoop? Failing that, can we have any kind of access control on the data ingestion part?

Any reply is much appreciated.

@Scott Shaw @Vipin Rathor @Jitendra Yadav @Timothy Spann @Artem Ervits

3 REPLIES

Expert Contributor

Is there a particular reason why you want to do this?

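The algorithm for replica block placement is implemented in the NameNode. Given a replication factor of 3, the first replica is placed on the local node (the node the client is writing from), and the second and third replicas are placed on two different, randomly chosen nodes in a different rack. The application or ingestion tool does not decide where the blocks go.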

https://community.hortonworks.com/questions/31144/in-hdfs-block-placement-how-is-closest-defined.htm...
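To make the placement rule above concrete, here is a rough sketch in plain Python. It is only a toy model of the behaviour described in this reply, not HDFS code: the `topology` dictionary, the node and rack names, and the `place_replicas` function are all made up for illustration, and it assumes the cluster has enough nodes to pick from.

```python
import random

def place_replicas(topology, writer=None, replication=3, rng=random):
    """Toy model of rack-aware placement for up to three replicas.

    topology: dict mapping rack name -> list of node names (hypothetical).
    writer:   node the client writes from, if it is a DataNode, else None.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = list(rack_of)

    # Replica 1: the writer's own node when it is a DataNode, else any node.
    first = writer if writer in rack_of else rng.choice(all_nodes)
    chosen = [first]

    if replication >= 2:
        # Replica 2: a node on a different rack than replica 1.
        remote = [n for n in all_nodes if rack_of[n] != rack_of[first]]
        # Fall back to any other node when the cluster has a single rack.
        pool = remote or [n for n in all_nodes if n not in chosen]
        chosen.append(rng.choice(pool))

    if replication >= 3:
        # Replica 3: a different node on the same rack as replica 2.
        second = chosen[1]
        same_rack = [n for n in topology[rack_of[second]] if n not in chosen]
        pool = same_rack or [n for n in all_nodes if n not in chosen]
        chosen.append(rng.choice(pool))

    return chosen


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(place_replicas(topology, writer="n1"))
```

Note that nothing in this logic looks at where the data came from, which is why the ingestion side cannot steer blocks to particular nodes.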

If you really want to change replica block placement, you can do that by writing your own implementation of the pluggable block placement interface in HDFS:

https://issues.apache.org/jira/browse/HDFS-3601

http://stackoverflow.com/questions/14494179/modifying-the-block-placement-strategy-of-hdfs
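For what it's worth, the decision a custom policy would have to make for the use case in the question could look roughly like the sketch below. This is plain Python to show the idea only; a real policy would be a Java class running inside the NameNode. The directory prefixes, node names, `ALLOWED_NODES` and `choose_targets` are all hypothetical, and it assumes each source writes under its own directory.

```python
# Hypothetical mapping: HDFS directory prefix -> DataNodes allowed to hold its blocks.
ALLOWED_NODES = {
    "/data/source1/": {"node1", "node2", "node3", "node4"},
    "/data/source2/": {"node5", "node6", "node7", "node8"},
}

def choose_targets(path, live_nodes, replication=3):
    """Pick `replication` nodes for a file, honoring ALLOWED_NODES."""
    for prefix, allowed in ALLOWED_NODES.items():
        if path.startswith(prefix):
            # Only live nodes that belong to this source's group are eligible.
            candidates = sorted(allowed & set(live_nodes))
            break
    else:
        # Paths outside the mapping are unrestricted.
        candidates = sorted(live_nodes)

    if len(candidates) < replication:
        raise ValueError(f"only {len(candidates)} eligible nodes for {path}")
    # Deterministic choice keeps the sketch simple; a real policy would
    # also weigh racks, free space and load.
    return candidates[:replication]


live = [f"node{i}" for i in range(1, 9)]
print(choose_targets("/data/source1/events.csv", live))
```

The sketch also shows the cost of this approach: if too few nodes in a group are alive, the write has nowhere to go.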

Explorer

I want to have one subgroup of data nodes where I can place data from source 1 and another group where I can place data from source 2. It doesn't matter how close or far apart the nodes are. So a bunch of nodes, say 1, 2, 3 and 4, must not store data from source 2. Is there any way to define such a policy in Hadoop?

Failing that, can we have any kind of access control on the data ingestion part?

Expert Contributor

There is no built-in way to define a policy in HDFS that specifies which nodes which blocks go to.

Can you tell us more about the use case you are trying to handle?

If you are concerned about data security, then access policies can be managed in Ranger on specific directories, regardless of which nodes your blocks reside on.

If you are trying to manage quotas per application or source, you can apply HDFS quotas at the directory level.

It all goes back to what your real-world use case is and why you need designated nodes to store blocks.
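As a rough illustration of the quota option, the small helper below builds the `hdfs dfsadmin` commands for one directory per source. The directory `/data/source1` and the quota values are placeholders, and `quota_commands` is just a hypothetical helper; it only prints the commands so you can review them before running them as an HDFS superuser.

```python
import shlex

def quota_commands(directory, space_quota=None, name_quota=None):
    """Build the HDFS CLI commands that set and then show quotas on a directory."""
    cmds = []
    if space_quota is not None:
        # Space quota: limit on bytes used under the directory (replicas count).
        cmds.append(["hdfs", "dfsadmin", "-setSpaceQuota", str(space_quota), directory])
    if name_quota is not None:
        # Name quota: limit on the number of files and directories.
        cmds.append(["hdfs", "dfsadmin", "-setQuota", str(name_quota), directory])
    # Show the quotas and current usage for the directory.
    cmds.append(["hdfs", "dfs", "-count", "-q", directory])
    return cmds


for cmd in quota_commands("/data/source1", space_quota="10t", name_quota=1000000):
    print(shlex.join(cmd))
```

Quotas cap how much each source can write, but they do not control which nodes hold the blocks.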