
Hive with Google Cloud Storage

Contributor

I have installed a Hadoop 2.6.5 cluster in GCP using VM instances, set up the GCS connector, and pointed HDFS at a gs bucket. Added the below two entries in core-site.xml:

google.cloud.auth.service.account.json.keyfile=<Path-to-the-JSON-file> 
fs.gs.working.dir=/
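
(In XML form, these two entries would look roughly like the following in core-site.xml; the keyfile path below is only a placeholder:)

<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/path/to/service-account-keyfile.json</value>
</property>
<property>
  <name>fs.gs.working.dir</name>
  <value>/</value>
</property>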

Running hadoop fs -ls against the gs bucket works fine, but when I create a Hive table

CREATE EXTERNAL TABLE test1256(name string, id int) LOCATION 'gs://bucket/';

I get the following error:

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------) (state=08S01,code=1)

Apart from the changes to core-site.xml, are there any changes to be made in hive-site.xml as well?

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/authentication...

1 ACCEPTED SOLUTION

Master Mentor

@sudi ts

Do you have access to the GCP IAM console? When treating a service account as a resource, you can grant a user permission to access that service account by assigning them the Owner, Editor, Viewer, or Service Account User role.


13 REPLIES

Master Mentor

@sudi ts

You need to copy the connector JAR into the hadoop-client and hive-client lib directories, otherwise you will hit an error:

cp gcs-connector-latest-hadoop2.jar /usr/hdp/current/hadoop-client/lib/ 
cp gcs-connector-latest-hadoop2.jar /usr/hdp/current/hive-client/lib/ 
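
To sanity-check that the jar landed in both locations (same paths as above; the exact jar file name can differ by connector version):

ls /usr/hdp/current/hadoop-client/lib/gcs-connector*.jar /usr/hdp/current/hive-client/lib/gcs-connector*.jar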

The below command should run successfully

$ hdfs dfs -ls gs://bucket/ 

This should run fine, but the issue you are having is with permissions for hdpuser1, which you will need to correct by running

$ hdfs dfs -chown hdpuser1 gs://bucket/ 

Now your CREATE TABLE should work while logged in as hdpuser1

CREATE EXTERNAL TABLE test1256(name string,id int) LOCATION 'gs://bucket/'; 

Please let me know. If this answer addressed your question, please take a moment to log in and click the "Accept" link on the answer.

Contributor

Hi,

Thanks a lot for the info, but I am still facing the same issue.

I did create the user in AD and have a valid ticket; the hdfs command does work for accessing GCS, but I cannot create an external Hive table.

Master Mentor

@sudi ts

Can you share the latest error?

Contributor

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------) (state=08S01,code=1)

hdpuser1 is an AD user. Using the same user I can execute

$ hdfs dfs -ls gs://bucket/

but when I try to create an external table using Beeline, it fails.

Master Mentor

@sudi ts

This is clearly a permission issue: "Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------"

Have you tried using ACLs?

gsutil acl ch -u hdpuser1:WRITE gs://bucket/

And retry

Contributor

@Geoffrey Shelton Okot

I did try, but it still fails:

CommandException: hdpuser1:WRITE is not a valid ACL change
hdpuser1 is not a valid scope type

The GCS bucket has Storage Admin rights granted to the service account.

hadoop fs -ls gs://bucket/ works fine
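
(For context on the CommandException above: gsutil acl ch expects a Google identity scope, for example a user email, group, or project role, rather than a local Hadoop/AD username, so a working invocation would presumably look more like the following, with a hypothetical email:)

gsutil acl ch -u hdpuser1@example.com:WRITE gs://bucket/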

Master Mentor

@sudi ts

Do you have access to the GCP IAM console? When treating a service account as a resource, you can grant a user permission to access that service account by assigning them the Owner, Editor, Viewer, or Service Account User role.
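
As a rough sketch (placeholder service account and user names, not verified in this thread), granting a user the Service Account User role on a service account with gcloud could look like:

gcloud iam service-accounts add-iam-policy-binding my-sa@my-project.iam.gserviceaccount.com --member="user:hdpuser1@example.com" --role="roles/iam.serviceAccountUser"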

Contributor

@Geoffrey Shelton Okot

I was able to create a Hive external table pointing to GCS as the storage, but it only works as the hive superuser, not as a normal Hive user. Meaning, hdpuser1 cannot create the Hive table (it fails with the above error), but if I execute su - hive it works.

I am not sure how to rectify this.

Cloudera Employee

Hi @sudi ts

Can you share some more information about this deployment?

- Is doAs enabled (hive.server2.enable.doAs)? (See the snippet after this list.)

- What is the authorization mechanism? Is the Ranger Authorizer being used?
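
(For the doAs question above, this is the hive-site.xml property to check; a minimal sketch, where the appropriate value depends on your authorization setup:)

<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>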

If you can pull a stack trace from the HiveServer2 logs, that'll be very useful.

HDP-2.6.5 ships with the Google connector, so there's no need to replace any jars. The GCS connectivity is working, given that you can create this table when logged in as the hive user and can list files via hadoop fs -ls.

Cloud storage access control is generally handled via cloud provider constructs, such as IAM roles. Hadoop's notion of file owners and permissions doesn't capture this. The user returned by hadoop fs -ls will typically be the logged-in user, and the permissions don't indicate much.
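
As an illustration of that IAM-based approach, granting a user object-level access on the bucket could look roughly like this with gsutil (hypothetical user email; choose whichever role matches the access you actually need):

gsutil iam ch user:hdpuser1@example.com:objectAdmin gs://bucket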