
Google Storage and Kerberos integration

Master Mentor

I am able to access gs:// without having a Kerberos ticket. I am guessing that this is normal, but it would be nice to have a way to enforce Kerberos authentication when accessing GCS from Hadoop.

bash-4.1$ id 
uid=1023418093(hive) gid=1614812195(hadoop) 
----------------------------------------------------------- 
bash-4.1$ kdestroy 
kdestroy: No credentials cache found while destroying cache 
----------------------------------------------------------- 
bash-4.1$ klist 
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1023418093) 
----------------------------------------------------------- 
bash-4.1$ hadoop fs -ls gs://dev/ 
16/04/20 14:31:48 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop2 
Found 1 items 
drwxrwxr-x - hive hive 0 2016-04-11 00:26 gs://dev/apps 
----------------------------------------------------------- 
bash-4.1$ hadoop fs -ls / 
16/04/20 14:30:56 WARN ipc.Client: Exception encountered while connecting to the server : 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413) 
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558) 
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
1 ACCEPTED SOLUTION


The Google Cloud Storage Connector for Hadoop is configured at the cluster level without any knowledge of Kerberos.

So the output you showed is what I would expect.
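For context, "configured at the cluster level" usually means the service-account key is referenced from core-site.xml and is usable by anyone who can run a Hadoop client on the cluster. A quick way to see what your cluster is handing out (the property names and path below are assumptions based on the 1.4.x connector; adjust to your install) is:

# Show the GCS connector auth settings applied cluster-wide.
# The config path and property key are placeholders; check your own core-site.xml.
bash-4.1$ grep -A1 'google.cloud.auth' /etc/hadoop/conf/core-site.xml
bash-4.1$ hdfs getconf -confKey google.cloud.auth.service.account.json.keyfile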

But some thoughts:

  1. In secure environments, ideally a user can never even reach Hadoop without first authenticating against Kerberos or the directory service.
    1. With that assumed, you would never get the chance to run 'hadoop fs -ls ...' anyway.
    2. So lock down all access to the environment and network so that only authorized users can even run the commands.
  2. It couldn't hurt to submit a feature request for a configuration option that disables 'gs' unless the user is authenticated to Hadoop.
    1. Personally I see this as a bug report, but technically it's a feature request.
      1. You would have to raise it with Google, since the Connector is not currently part of Apache Hadoop; Google maintains it separately.
      2. Why it's not a bug: Kerberos governs communication between services, not the execution of commands. Since the GCS connector doesn't use Kerberos, it works as intended because its authentication is handled separately.
  3. I've not done it, but you could check whether individual users/applications can supply the GCS credentials themselves. If that works, you would remove them from the cluster-wide configuration and users would be required to provide them on their own (see the sketch after this list). It would still not be Kerberos, but it would add another layer of security.
    1. s3a://, swift://, and wasb:// support this method.
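As a rough sketch of what item 3 could look like with the GCS connector (the property names come from the 1.4.x connector's service-account options; the keyfile path and bucket are placeholders, so verify against your version), a user who keeps their own service-account key could pass it per invocation instead of relying on core-site.xml:

# Hypothetical per-user invocation: credentials are supplied on the command
# line instead of living in the cluster-wide core-site.xml.
bash-4.1$ hadoop fs \
    -D google.cloud.auth.service.account.enable=true \
    -D google.cloud.auth.service.account.json.keyfile=/home/alice/gcs-key.json \
    -ls gs://dev/

This is the same pattern people use with s3a:// (per-job fs.s3a.access.key / fs.s3a.secret.key) and wasb:// account keys: the cluster carries no storage credentials, so only users who hold a key can reach the bucket.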



Cloudera Employee

I concur with Sean. Any user who has access to the cluster and to the Google service-account key can explore the GCS bucket. I would say Google has to enhance the connector by allowing Kerberos to intervene before the key is validated.
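Until the connector grows such a hook, one coarse mitigation (a sketch only; the keyfile path and group name are placeholders, not settings from this thread) is to restrict which local users can read the service-account key, since the connector loads it in the client process:

# Limit which local users can read the GCS service-account key.
# /etc/hadoop/conf/gcs-key.json and the 'gcs-users' group are placeholders.
bash-4.1$ sudo chown root:gcs-users /etc/hadoop/conf/gcs-key.json
bash-4.1$ sudo chmod 640 /etc/hadoop/conf/gcs-key.json
# Users outside 'gcs-users' will now fail to authenticate to gs:// even
# though the connector is still configured cluster-wide.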