Google Storage and Kerberos integration
Labels: Apache Hadoop, Apache Hive, Kerberos, Security
Created on 04-21-2016 12:05 PM - edited 09-16-2022 03:15 AM
I am able to access GS without having a Kerberos ticket. I am guessing that this is normal, but it would be nice to have a way to enforce Kerberos authentication for gs:// access from Hadoop.
bash-4.1$ id
uid=1023418093(hive) gid=1614812195(hadoop)

bash-4.1$ kdestroy
kdestroy: No credentials cache found while destroying cache

bash-4.1$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1023418093)

bash-4.1$ hadoop fs -ls gs://dev/
16/04/20 14:31:48 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop2
Found 1 items
drwxrwxr-x   - hive hive          0 2016-04-11 00:26 gs://dev/apps

bash-4.1$ hadoop fs -ls /
16/04/20 14:30:56 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
        at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
Created 04-21-2016 12:22 PM
The Google Cloud Storage Connector for Hadoop is configured at the cluster level without any knowledge of Kerberos.
So the output you showed is what I would expect.
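For context on where those credentials come from: the connector typically reads a cluster-wide service-account key from core-site.xml, entirely outside Kerberos. A rough sketch of how you might confirm that on your cluster (property names as documented for the 1.4.x GCS connector; values and paths below are illustrative, so verify against your version and deployment):

# Show the connector's auth-related properties in the cluster config
bash-4.1$ grep -A1 'google.cloud.auth' /etc/hadoop/conf/core-site.xml

# Typical cluster-wide entries (illustrative values only):
#   google.cloud.auth.service.account.enable  = true
#   google.cloud.auth.service.account.email   = svc-gcs@my-project.iam.gserviceaccount.com
#   google.cloud.auth.service.account.keyfile = /etc/hadoop/conf/gcs-key.p12
#   fs.gs.project.id                          = my-project

Any shell user who can trigger that configuration effectively borrows the cluster's GCS identity, ticket or no ticket, which matches the behaviour you observed.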
But some thoughts:
- In secure environments, ideally a user can never even reach Hadoop without authenticating against Kerberos or the directory service.
- With that assumed, you would never get the chance to run 'hadoop fs -ls ...' anyway.
- So lock down all access to the environment & network so only authorized users can even run the commands.
- It couldn't hurt to submit a feature request for a configuration option that disables 'gs' unless the user is authenticated to Hadoop.
- Personally I see this as a bug report, but technically it's a feature request.
- You would have to raise it with Google since the Connector is not currently a part of Apache Hadoop. Google maintains it separately.
- Why it's not a bug: Kerberos governs communication between services, not the execution of commands. Since the GS connector doesn't use Kerberos, it works as intended; its authentication is handled separately.
- I've not done it, but you could check whether individual users/applications can pass the GCS credentials themselves (see the sketch after this list). If that is possible, you would remove the key from the cluster-wide configuration and require users to supply it on their own. It would still not be Kerberos, but it would add another layer of security.
- s3a://, swift://, and wasb:// support this method.
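A hypothetical sketch of what that per-user approach could look like, assuming the connector honors per-invocation overrides through Hadoop's generic -D options (untested; the exact property for a user-supplied key and the keyfile path below are illustrative and depend on the connector release):

bash-4.1$ hadoop fs \
    -D google.cloud.auth.service.account.enable=true \
    -D google.cloud.auth.service.account.json.keyfile=/home/alice/my-gcs-key.json \
    -ls gs://dev/

# With the key removed from core-site.xml, a user who supplies no key should
# simply fail to reach gs:// at all, so bucket access is at least tied to
# possession of a personal credential rather than being open cluster-wide.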
Created 04-21-2016 01:37 PM
I concur with Sean. Any user who has access to the cluster and the Google personal key can explore the GHFS bucket. I would say Google has to enhance the connector by allowing Kerberos to intervene before the personal key is validated.
