Created on 08-12-2019 08:13 AM - edited 08-12-2019 09:00 AM
Hi,
Can the forum please address/answer:
Is it possible to have two Hive services installed in one Hadoop cluster such that one of the Hive services (HS2) is encrypted and the other is not?
Also, is it possible to have both services pointing to the same HMS?
Regards
Created 08-12-2019 11:45 AM
Yes, it is possible.
Regards,
André
Created 08-12-2019 12:23 PM
Thanks, can you share details of the process?
Some of the questions/problems that I have:
If a new Hive instance is pointed at the same backend database as an existing Hive service, will it overwrite any tables that already exist there? (When a new instance is spun up, it creates tables under the selected DB.)
When a new instance is spun up, it requires a default Hive location in HDFS, and it can't use the existing Hive's default location /user/hive/warehouse; it has to be something else.
A new Hive instance running in parallel with the existing Hive service can't use Sentry; Sentry supports only a single Hive service.
The above is what I noticed, and it defeats the purpose. The intention is to use the other Hive service to cater to the same customers that the original service supports, but the different default HDFS location, the absence of Sentry, and maybe more scenarios that I haven't yet found or come across are blocking me from going ahead with this setup. Do you have a way to bypass all of this such that both Hive services are identical (except for encryption status)?
Regards
Created 08-12-2019 04:10 PM
Created 08-12-2019 04:37 PM
I'm afraid it doesn't. TLS is a service-wide property in Hive; it isn't bound to particular roles, so it can't be enabled per role to achieve the above.
Created 08-14-2019 03:03 PM
Sorry for the confusion. Let me backtrack a bit.
You can have multiple Hive services in the same cluster and you can enable TLS for one of them. You're right, though, that they are not allowed to share the same Hive warehouse directory in HDFS, but they can use the same metastore database. You'd also have to make sure you don't collocate them on the same hosts, or you'd have to be very careful with port numbers, log directories, etc.
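To illustrate the collocation point, here is a minimal sketch of a check for settings that would collide if both services ran on the same host. The three `hive.*` property names are standard Hive settings; the `log_dir` key and all the values are made up for the example, and your cluster manager may expose these under different names.

```python
# Settings that two Hive services on the same host must not share.
# "log_dir" is a hypothetical key used only for this illustration.
HOST_EXCLUSIVE_KEYS = (
    "hive.server2.thrift.port",
    "hive.server2.webui.port",
    "hive.metastore.port",
    "log_dir",
)


def find_conflicts(service_a, service_b, keys=HOST_EXCLUSIVE_KEYS):
    """Return the keys that both services set to the same value."""
    return [
        key
        for key in keys
        if key in service_a
        and key in service_b
        and service_a[key] == service_b[key]
    ]


# Example values only; not taken from any real cluster.
hive_tls = {
    "hive.server2.thrift.port": 10000,
    "hive.server2.webui.port": 10002,
    "log_dir": "/var/log/hive",
}
hive_plain = {
    "hive.server2.thrift.port": 10000,
    "hive.server2.webui.port": 10012,
    "log_dir": "/var/log/hive2",
}

print(find_conflicts(hive_tls, hive_plain))
# prints ['hive.server2.thrift.port']
```

An empty list only means these particular keys differ; it does not prove the two services can coexist safely on one host.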
This is not a usual configuration we find in the field or have tested extensively. So, it's not something we usually recommend and you may end up running into edge cases we don't know about. Also, when you're securing a cluster you usually don't want to leave unsecured "backdoors" that might compromise your security.
Is there any specific reason you need the insecure HS2 endpoint?
A new hive instance if pointed to the same backend database (of an existing Hive service), will it overwrite any tables that already exist there? (Since when a new instance is spun it creates tables under selected DB.)
It will not overwrite the database. It only creates the schema if it doesn't already exist.
A new instance when spun requires to have default hive location mentioned in hdfs, and can't use existing Hive's default location /user/hive/warehouse and has to be something else.
You're right. Both services can see and access tables created anywhere but the tables created through a specific service will be located within the warehouse dir of that service, *if* you don't specify an explicit location (external table).
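As a rough sketch of that rule, the function below models where a new table would land. It assumes the common layout of `<warehouse>/<table>` for the `default` database and `<warehouse>/<database>.db/<table>` otherwise, and it deliberately ignores that a database also records its own location in the metastore, so treat it as a simplification. The function name and the second warehouse path are hypothetical.

```python
import posixpath


def resolve_table_location(warehouse_dir, database, table, explicit_location=None):
    """Simplified model of where a new table's data would be stored."""
    # An explicit LOCATION (typically an external table) wins outright.
    if explicit_location:
        return explicit_location
    # Assumed layout: the default database sits directly under the warehouse dir.
    if database == "default":
        return posixpath.join(warehouse_dir, table)
    return posixpath.join(warehouse_dir, database + ".db", table)


# Same table name created through two services with different warehouse dirs.
print(resolve_table_location("/user/hive/warehouse", "sales", "orders"))
print(resolve_table_location("/user/hive2/warehouse", "sales", "orders"))
# An explicit location is the same no matter which service creates the table.
print(resolve_table_location("/user/hive2/warehouse", "sales", "orders",
                             explicit_location="/data/shared/orders"))
```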
A new instance of Hive in parallel with the existing Hive service can't use Sentry; Sentry supports use of only a single Hive service.
Mmm... this is where things start to get fuzzy. But I believe this should be possible (I haven't tested). The dependency between those services is set at the Hive service side; you configure Hive to point to a Sentry instance. So, in theory, I believe that it should be possible to configure two Hive services to point to the same Sentry service: HiveA -> Sentry <- HiveB.
Again, this is not a well tested path. Tread carefully 😉
(...) maybe more scenarios that I haven't yet found or come across are blocking me from going ahead with this setup
This is a real concern, as I mentioned above. This type of setup opens a security hole and is not something we usually recommend or see in the field.
Do you have a way to bypass all of this such that both Hive services are identical (except for encryption status)?
I'm not recommending this, but you could put a proxy in front of your HS2 service and terminate the SSL connection at the proxy, so that the clients could connect to the proxy without using TLS. Again, this would defeat the purpose of your security implementation.
HTH,
André
Created 08-14-2019 06:12 PM
Thanks for the descriptive response.
-- Is there any specific reason you need the insecure HS2 endpoint?
The reason for doing this is that we are moving to enable TLS on HS2. As soon as we do, our larger user base will be impacted immediately (since they'll have to update their beeline connection strings, for example). To avoid putting them through a hard cutover, we wanted to try a soft cutover. By soft cutover I mean spinning up another replica of the Hive service, encrypting the current/original Hive service, and letting users keep working on the new unencrypted one while they slowly move to the encrypted one.
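For context, this is roughly the connection-string change users would face. The sketch builds a HiveServer2 JDBC URL with and without TLS using the `ssl`, `sslTrustStore` and `trustStorePassword` URL parameters; the host name, truststore path and password are placeholders, and other settings such as a Kerberos principal are left out.

```python
def hs2_jdbc_url(host, port=10000, database="default", tls=False,
                 truststore=None, truststore_password=None):
    """Build a HiveServer2 JDBC URL, optionally with TLS parameters."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if tls:
        url += ";ssl=true"
        # Truststore settings are only needed when the server certificate
        # is not already trusted by the client's JVM.
        if truststore:
            url += f";sslTrustStore={truststore}"
        if truststore_password:
            url += f";trustStorePassword={truststore_password}"
    return url


# Placeholder host and truststore values, for illustration only.
print(hs2_jdbc_url("hs2.example.com"))
print(hs2_jdbc_url("hs2.example.com", tls=True,
                   truststore="/path/to/truststore.jks",
                   truststore_password="changeit"))
```

The resulting string is what would be passed to beeline with `-u`.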
But based on your description, among the other blockers to this approach, the one that really stops me is the HDFS location point. I guess it's not possible for them to be exact replicas, i.e. for Hive1 and Hive2 to have exactly the same database/table data?
Regards
Created 08-14-2019 06:14 PM
On a side note not related:
Is there no delete post option in this comment section?
Also it allows an empty post 😄
Peace
Created on 08-15-2019 12:52 PM - edited 08-15-2019 12:54 PM
I removed the empty reply for you.
There is an option to edit a reply for a period of time but removal requires a moderator.
🙂