Support Questions

Find answers, ask questions, and share your expertise

Hadoop data linking from multiple sources

avatar
Expert Contributor

Data comes from multiple sources and these are exposed in the hive table for the users. A specific column is sensitive and needs to be given restricted access. If a user who wants to join 2 such tables on the column that he does not have access to, then whats the best approach to make it work? One option is to link the sensitive column with a generated key so that the user can join on the generated key. Is this a good idea or any better idea?

1 ACCEPTED SOLUTION

avatar
Master Guru

You could create a hashed column for that key and choose the hash algorithm in a way that a hash collision is very unlikely. However I don't get the usecase completely. Couldn't you just create a new View that already joins the two tables together and only give access to the resulting columns?

Table A ( customers web site a )

Name, Address, CreditCard

Table B ( customers web site b )

Name, Address, CreditCard

Create View C as

SELECT NameA,AddressA, NameB, AddressB from A,B where a.CreditCard = B.CreditCard;

And only give access to that view. I know this doesn't give the same flexibility but you do not need to do the whole hash thing.

If more flexibility is desired then your proposed approach of adding a masked column to both tables would be the way to go. I would think something like sha2 or the aes_encrypt function should provide a way to be very secure in avoiding hash collisions.

View solution in original post

2 REPLIES 2

avatar
Master Guru

You could create a hashed column for that key and choose the hash algorithm in a way that a hash collision is very unlikely. However I don't get the usecase completely. Couldn't you just create a new View that already joins the two tables together and only give access to the resulting columns?

Table A ( customers web site a )

Name, Address, CreditCard

Table B ( customers web site b )

Name, Address, CreditCard

Create View C as

SELECT NameA,AddressA, NameB, AddressB from A,B where a.CreditCard = B.CreditCard;

And only give access to that view. I know this doesn't give the same flexibility but you do not need to do the whole hash thing.

If more flexibility is desired then your proposed approach of adding a masked column to both tables would be the way to go. I would think something like sha2 or the aes_encrypt function should provide a way to be very secure in avoiding hash collisions.

avatar
Expert Contributor

@Benjamin Leonhardi Thanks, makes sense