Hive with Google Cloud Storage
Labels: Apache Hadoop
Created ‎05-25-2018 01:09 PM
I have installed a Hadoop 2.6.5 cluster in GCP using VM instances, used the GCS connector, and pointed HDFS to use a gs bucket. I added the below 2 entries in core-site.xml:
google.cloud.auth.service.account.json.keyfile=<Path-to-the-JSON-file>
fs.gs.working.dir=/
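(In XML form, these two entries would typically be written in core-site.xml roughly as below; this is a minimal sketch using only the keys mentioned above, with the same placeholder for the keyfile path:)
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value><Path-to-the-JSON-file></value> <!-- same placeholder path as above -->
</property>
<property>
  <name>fs.gs.working.dir</name>
  <value>/</value>
</property>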
Running hadoop fs -ls against the gs bucket works fine, but when I create a Hive table:
CREATE EXTERNAL TABLE test1256(name string,id int) LOCATION 'gs://bucket/';
I get the following error:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------) (state=08S01,code=1)
Apart from the changes to core-site.xml, are there any changes to be made in hive-site.xml as well?
Created ‎05-26-2018 07:59 AM
You need to copy the connector jar into the hadoop-client and hive-client locations, otherwise you will hit an error:
cp gcs-connector-latest-hadoop2.jar /usr/hdp/current/hadoop-client/lib/
cp gcs-connector-latest-hadoop2.jar /usr/hdp/current/hive-client/lib/
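(One hypothetical way to confirm the connector jar is then visible on the Hadoop classpath, assuming the --glob option is available in this Hadoop version:)
hadoop classpath --glob | tr ':' '\n' | grep gcs   # lists expanded classpath entries and filters for the gcs connector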
The below command should run successfully
$ hdfs dfs -ls gs://bucket/
This should run fine; the issue you are having is with permissions for hdpuser1, which you will need to correct by running:
$ hdfs dfs -chown hdpuser1 gs://bucket/
Now your CREATE TABLE should work while logged in as hdpuser1:
CREATE EXTERNAL TABLE test1256(name string,id int) LOCATION 'gs://bucket/';
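(If the table creates successfully, one way to double-check the location and owner it resolved to is, for example:)
DESCRIBE FORMATTED test1256;   -- the output includes the table's Owner and Location fields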
Please let me know. If you found this answer addressed your question, please take a moment to log in and click the "Accept" link on the answer.
Created ‎05-28-2018 02:27 PM
Hi,
Thanks a lot for the info, but I am still facing the same issue.
I did create the user in AD and have a valid ticket; the hdfs command does work for accessing GCS, but I cannot create an external Hive table.
Created ‎05-28-2018 09:25 PM
Can you share the latest error?
Created ‎05-29-2018 01:05 PM
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------) (state=08S01,code=1)
hdpuser1 is an AD user; using the same user I can execute
$ hdfs dfs -ls gs://bucket/
but using beeline, when I try to create an external table, it fails.
Created 05-29-2018 06:43 PM
This is clearly a permission issue: "Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------"
Have you tried using ACLs, e.g.
gsutil acl ch -u hdpuser1:WRITE gs://bucket/
and then retrying?
Created ‎05-29-2018 08:19 PM
I did try, but it still fails:
CommandException: hdpuser1:WRITE is not a valid ACL change
hdpuser1 is not a valid scope type
The GCS bucket has Storage Admin rights granted to the service account.
hadoop fs -ls gs://bucket/ works fine
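(The scope error above is because gsutil ACL grantees are Google identities, such as account emails, rather than local Hadoop/AD usernames; a hypothetical form of the same grant, with a placeholder email, would be:)
gsutil acl ch -u <user-email>:WRITE gs://bucket/   # <user-email> is a placeholder Google account email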
Created ‎05-29-2018 08:33 PM
Do you have access to the GCP IAM console? When treating a service account as a resource, you can grant a user permission to access that service account: the Owner, Editor, Viewer, or Service Account User role can be granted to the user on the service account.
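(A hypothetical example of such a grant from the command line, assuming the gcloud CLI is available and using placeholder identities:)
# placeholder service account and user emails
gcloud iam service-accounts add-iam-policy-binding <service-account-email> \
    --member="user:<user-email>" --role="roles/iam.serviceAccountUser"
(Bucket-level access can similarly be granted via IAM, for example:)
gsutil iam ch user:<user-email>:objectAdmin gs://bucket/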
Created ‎06-08-2018 02:15 AM
Hi @sudi ts
Can you share some more information about this deployment?
- Is doAs enabled (hive.server2.enable.doAs)? (see the snippet after this list)
- What is the authorization mechanism? Is the Ranger authorizer being used?
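(For reference, doAs is controlled by the following hive-site.xml property; a minimal sketch, with the value shown purely as an illustration:)
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value> <!-- illustrative value only -->
</property>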
If you can pull a stack trace from the HiveServer2 logs, that'll be very useful.
HDP-2.6.5 ships with the Google Cloud Storage connector, so there's no need to replace any jars. The GS connectivity is working, given that you can list files via hadoop fs -ls and can create this table when logged in as the hive user.
Cloud storage access control is generally handled via cloud-provider constructs such as IAM roles; Hadoop's notion of file owners and permissions doesn't capture this. The owner returned by hadoop fs -ls will typically be the logged-in user, and the permissions don't indicate much.
