How to pass hadoop.security.credential.provider.path in hiveserver2 JDBC connection string?

New Contributor

Hello,

I am looking to provide the path to a credential provider jceks file in a hiveserver2 JDBC connection string (the S3 credentials are in this jceks file). I have set up the environment and am experiencing rather exotic symptoms.

I have set S3 credentials in a jceks file on HDFS, like so:

$ hadoop credential create fs.s3a.access.key -value 'xyz' -provider jceks://hdfs@hostname/path/to/jceks
$ hadoop credential create fs.s3a.secret.key -value 'abc' -provider jceks://hdfs@hostname/path/to/jceks
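
To double-check that both aliases landed in the provider, hadoop credential list can be pointed at the same path (it prints only the alias names, not the secret values):

$ hadoop credential list -provider jceks://hdfs@hostname/path/to/jceks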

I have whitelisted the property hadoop.security.credential.provider.path in such a way that it can be set at connection time in the JDBC connection string between beeline and hiveserver2. In hive-site:

hive.security.authorization.sqlstd.confwhitelist.append = hadoop.security.credential.provider.path
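
For reference, if hive-site.xml is edited directly rather than through a management UI, the same whitelist entry would look like this (Hive interprets the value as a Java regex; the literal property name works since '.' matches itself):

<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>hadoop.security.credential.provider.path</value>
</property>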

I am trying to pass the credential provider path to hiveserver2 (and to the map-reduce child JVMs) via the JDBC connection string, like so:

beeline -u "jdbc:hive2://hs2_hostname:port/default;principal=my/principal@REALM?hadoop.security.credential.provider.path=jceks://hdfs@hostname/path/to/jceks"

Then something quite weird happens:

  1. I can create an external table on S3 and select from it.
  2. If I close the JDBC connection and reconnect, I CANNOT select first and then create another table on S3 (both statements fail).
  3. If I close the JDBC connection and reconnect, I CAN create another table on S3 first and then select from both tables.

I have looked at the DEBUG logs of HS2 and the metastore: nothing in the latter (not even an error) and no useful information in the former.

==> Has anyone seen anything like this before?

Note, I cannot use the beeline --hiveconf flag to specify the value of hadoop.security.credential.provider.path, as I will need to use this JDBC connection string in a Java client which is not beeline.
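
For context, below is a minimal sketch of the kind of Java client I have in mind (the class name and query are illustrative; it assumes the Hive JDBC driver is on the classpath and a valid Kerberos ticket is available). Everything after the '?' in the URL is applied as a per-session Hive conf override, which is why the whitelist entry above is needed:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveS3Client {
    public static void main(String[] args) throws Exception {
        // Older Hive JDBC drivers are not always auto-registered, so load explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Session variables (e.g. the Kerberos principal) go before the '?';
        // Hive conf overrides for the session go after it.
        String url = "jdbc:hive2://hs2_hostname:port/default"
                + ";principal=my/principal@REALM"
                + "?hadoop.security.credential.provider.path="
                + "jceks://hdfs@hostname/path/to/jceks";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM test_table_on_s3_1")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
            }
        }
    }
}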

See details below:

Initial table creation and select > OK

$ beeline -u "jdbc:hive2://hs2_hostname:port/default;principal=my/principal@REALM?hadoop.security.credential.provider.path=jceks://hdfs@hostname/path/to/jceks"
...

> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
+-----------+--+
No rows selected (0.102 seconds)

> CREATE EXTERNAL TABLE IF NOT EXISTS test_table_on_s3_1 ( field1 string, field2 int ) row format delimited fields terminated by ',' LOCATION 's3a://mybucket/path/to/files';
No rows affected (0.798 seconds)

> show tables;
+---------------------+--+
|      tab_name       |
+---------------------+--+
| test_table_on_s3_1  |
+---------------------+--+
1 row selected (0.034 seconds)

> select * from test_table_on_s3_1;
+----------------------------+----------------------------+--+
| test_table_on_s3_1.field1  | test_table_on_s3_1.field2  |
+----------------------------+----------------------------+--+
| foo                        | 1                          |
| foo                        | 1                          |
| raph                       | 10                         |
| raph_again                 | 11                         |
| foo                        | 12                         |
| foo                        | 1                          |
| bar                        | 2                          |
| baz                        | 3                          |
+----------------------------+----------------------------+--+
8 rows selected (0.813 seconds)

> Closing: 0: jdbc:hive2://hs2_hostname:port/default;principal=my/principal@REALM?hadoop.security.credential.provider.path=jceks://hdfs@hostname/path/to/jceks

Reconnect and try to select on the previously created table, THEN create a new table > Failed

$ beeline -u "jdbc:hive2://hs2_hostname:port/default;principal=my/principal@REALM?hadoop.security.credential.provider.path=jceks://hdfs@hostname/path/to/jceks"
...

> show tables;
+---------------------+--+
|      tab_name       |
+---------------------+--+
| test_table_on_s3_1  |
+---------------------+--+
1 row selected (0.111 seconds)

> select * from test_table_on_s3_1;
Error: Error while compiling statement: FAILED: SemanticException Unable to fetch table test_table_on_s3_1. Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 1F21069CD62BBCE9, AWS Error Code: null, AWS Error Message: Forbidden (state=42000,code=40000)

> CREATE EXTERNAL TABLE IF NOT EXISTS test_table_on_s3_2 ( field1 string, field2 int ) row format delimited fields terminated by ',' LOCATION 's3a://mybucket/path/to/files';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: C3FAED01BDF9710A, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: szPrt15cbM4lRBzVIlxgCqGtzM4mmhasjyLxeZGgnKIXXEgPb8jyr5dig1jahJqT3CWqC0ShdHo=) (state=08S01,code=1)

> Closing: 0: jdbc:hive2://hs2_hostname:port/default;principal=my/principal@REALM?hadoop.security.credential.provider.path=jceks://hdfs@hostname/path/to/jceks

Reconnect and try to create a new table, THEN select on the previously created tables > OK

$ beeline -u "jdbc:hive2://hs2_hostname:port/default;principal=my/principal@REALM?hadoop.security.credential.provider.path=jceks://hdfs@hostname/path/to/jceks"

> CREATE EXTERNAL TABLE IF NOT EXISTS test_table_on_s3_2 ( field1 string, field2 int ) row format delimited fields terminated by ',' LOCATION 's3a://mybucket/path/to/files';
No rows affected (0.489 seconds)

> show tables;
+---------------------+--+
|      tab_name       |
+---------------------+--+
| test_table_on_s3_1  |
| test_table_on_s3_2  |
+---------------------+--+
2 rows selected (0.062 seconds)

> select * from test_table_on_s3_1;
+----------------------------+----------------------------+--+
| test_table_on_s3_1.field1  | test_table_on_s3_1.field2  |
+----------------------------+----------------------------+--+
| foo                        | 1                          |
| foo                        | 1                          |
| raph                       | 10                         |
| raph_again                 | 11                         |
| foo                        | 12                         |
| foo                        | 1                          |
| bar                        | 2                          |
| baz                        | 3                          |
+----------------------------+----------------------------+--+
8 rows selected (0.871 seconds)

> select * from test_table_on_s3_2;
+----------------------------+----------------------------+--+
| test_table_on_s3_2.field1  | test_table_on_s3_2.field2  |
+----------------------------+----------------------------+--+
| foo                        | 1                          |
| foo                        | 1                          |
| raph                       | 10                         |
| raph_again                 | 11                         |
| foo                        | 12                         |
| foo                        | 1                          |
| bar                        | 2                          |
| baz                        | 3                          |
+----------------------------+----------------------------+--+
8 rows selected (0.893 seconds)

Thanks!

3 REPLIES

Re: How to pass hadoop.security.credential.provider.path in hiveserver2 JDBC connection string?

Super Guru
@Raphael Vannson

While I don't know exactly what your issue is, it is coming from AWS, not Hive. You are getting a 403, which is a blanket "no" from the web resource you are trying to access. See if this link gives you any hint:

https://forums.aws.amazon.com/thread.jspa?messageID=210268


Re: How to pass hadoop.security.credential.provider.path in hiveserver2 JDBC connection string?

New Contributor

Yes, AWS is refusing to give access to hiveserver2. If we assume AWS is working as expected (which is very likely), then it refuses access because HS2 does not present the credentials correctly. AWS behaves as designed and denies the request, hence the 403.

As a result, we can infer the issue is in Hiveserver2 or the metastore (though there are no errors in the metastore logs, so my initial guess is that the issue is in one of the artifacts used by the HS2 process).

Also, the sequence needed to reproduce the issue is quite peculiar (a CREATE TABLE is required before HS2 can process other queries such as SELECT).
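
One way to narrow it down further (a suggestion, using the same placeholder paths as above) would be to exercise the jceks provider with the plain Hadoop CLI from the HS2 host, bypassing Hive entirely; if this listing succeeds, the provider and the S3 credentials themselves are fine and the fault is isolated to HS2:

$ hadoop fs -D hadoop.security.credential.provider.path=jceks://hdfs@hostname/path/to/jceks -ls s3a://mybucket/path/to/files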

Thanks!

Raphael


Re: How to pass hadoop.security.credential.provider.path in hiveserver2 JDBC connection string?

@mqureshi

I am also trying to pass "hadoop.security.credential.provider.path" from beeline, but I am still getting the errors below. I also tried passing it as part of the JDBC string; both approaches throw the same error. I need to pass hadoop.security.credential.provider.path on the command line one way or another. Please help.

bash-4.2$ beeline --hiveconf hadoop.security.credential.provider.path=jceks://hdfs@clustername/pedev/user/myuser/myuser.jceks -u "jdbc:hive2://zookeeperlist:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;"

Beeline version 1.2.1000.2.6.3.0-235 by Apache Hive

0: jdbc:hive2://guedlpa15nf08.devfg.rbc.com:2> CREATE EXTERNAL TABLE `HDFSaudit_data8`(
    `access` string, `action` string, `agenthost` string, `cliip` string, `clitype` string,
    `enforcer` string, `event_count` bigint, `event_dur_ms` bigint, `evttime` timestamp,
    `id` string, `logtype` string, `policy` bigint, `reason` string, `repo` string,
    `repotype` bigint, `reqdata` string, `requser` string, `restype` string,
    `resource` string, `result` bigint, `seq_num` bigint, `sess` string)
  ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  LOCATION 's3a://bucketname/hivetest2';

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.io.InterruptedIOException: doesBucketExist on bucketname: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.AmazonClientException: Unable to load credentials from Amazon EC2 metadata service) (state=08S01,code=1)
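
For reference, the JDBC-string variant I tried looks like this (the conf part goes after the '?' in the URL, and from the discussion above the property presumably also needs to be whitelisted in HS2's hive-site):

$ beeline -u "jdbc:hive2://zookeeperlist:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2?hadoop.security.credential.provider.path=jceks://hdfs@clustername/pedev/user/myuser/myuser.jceks"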