
Issue creating/accessing Hive external table with S3 location from Spark Thrift Server


I have configured the S3 keys (access key and secret key) in a jceks file using the hadoop credential API. The commands used are below:

hadoop credential create fs.s3a.access.key -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks

hadoop credential create fs.s3a.secret.key -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks
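
For reference, you can confirm that both aliases were stored by listing the contents of the same provider (this is the standard list subcommand of the hadoop credential tool):

hadoop credential list -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks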

Then I open a connection to the Spark Thrift Server using beeline, passing the jceks file path in the connection string as below:

beeline -u "jdbc:hive2://hostname:10001/;principal=hive/_HOST@?hadoop.security.credential.provider.path=jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks"
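
For reference, the general HiveServer2 JDBC URL layout is as follows; session variables such as principal are separated by semicolons, while Hive/Hadoop configuration properties such as the credential provider path follow the question mark:

jdbc:hive2://<host>:<port>/<dbName>;<sessionVars>?<hiveConfs>#<hiveVars>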

Now, when I try to create an external table with its location in S3, it fails with the exception below:

CREATE EXTERNAL TABLE IF NOT EXISTS test_table_on_s3 (col1 String, col2 String) row format delimited fields terminated by ',' LOCATION 's3a://bucket_name/kalmesh/';

Exception: Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://bucket_name/kalmesh: getFileStatus on s3a://bucket_name/kalmesh: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: request_id), S3 Extended Request ID: extended_request_id=) (state=,code=0)

However, the same statement works fine through the Hive Thrift Server (HiveServer2).
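
(One way to isolate whether the jceks file itself resolves correctly, independent of any Thrift Server, is to point a plain Hadoop filesystem command at the same bucket with the same provider path; the -D generic option below is standard, and the path matches the CREATE TABLE statement above:)

hadoop fs -D hadoop.security.credential.provider.path=jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks -ls s3a://bucket_name/kalmesh/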

HDP version: HDP 2.5

Spark version: 1.6

2 Replies

(Just so that answers don't get duplicated, the question is also posted at http://stackoverflow.com/questions/44062349/issue-creating-accessing-hive-external-table-with-s3-loc... )


Not entirely sure if this was the issue (you seem to have run into it a few years ago), but it is important to understand the following: Hive data is stored on HDFS, yet the security policies for HDFS and Hive may be different. In fact, it is recommended that you do NOT give anyone HDFS-level permissions on the warehouse directory, and instead use Ranger to grant permissions only at the SQL level, on the databases and tables that live there.

As a result, you might have been comparing apples to pears: validating access by doing an HDFS read, and then using beeline to do a table read, which goes through different security policies.
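
To make that concrete, the two checks look roughly like this (paths and table names are illustrative only):

# Checked against HDFS permissions and Ranger HDFS policies:
hdfs dfs -cat /apps/hive/warehouse/some_db.db/some_table/000000_0

# Checked against Ranger Hive policies:
beeline -u "jdbc:hive2://hostname:10001/" -e "SELECT * FROM some_db.some_table LIMIT 5"

A read that succeeds (or fails) at the HDFS level therefore tells you little about what the same user can do through SQL.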


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved'; if it is valuable for future readers, please apply 'kudos'. Also check out my technical portfolio at https://portfolio.jaheruddin.nl