hdp-3.0.1 --> Hive doesn't honor an s3a endpoint if it's not AWS.

I am using a non-AWS endpoint for s3a and have a basic issue: Hive does not honor the s3a endpoint if it is not AWS. distcp, hadoop fs, Spark and MapReduce jobs all find my s3a endpoint and complete successfully without any issues, but Hive ignores it and expects AWS S3 credentials, as seen in the example below.

I tried three options, and the error was the same with all three, as shown below:

ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint

INFO : Completed executing command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6); Time taken: 116.608 seconds

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint (state=08S01,code=1)

Option 1: Ran the CREATE DATABASE command shown below, passing my S3 credentials through a JCEKS file configured in the HDFS core-site.xml.

Running a hive query

0: jdbc:hive2://> CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1'; 

INFO : Compiling command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6): CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1' 

INFO : Semantic Analysis Completed (retrial = false)

INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)

INFO : Completed compiling command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6); Time taken: 230.907 seconds 

INFO : Executing command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6): CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1'

INFO : Starting task [Stage-0:DDL] in serial mode

Option 2: Passing user:secret-key in the URL while creating the database. I even tried CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3-user:s3-secret-key@s3aTestBucket/user/table1'; but that didn't work either.
Option 3: I also added the property below to hive-site\.mapred\.supports\.subdirectories|fs\.s3a\.access\.key|fs\.s3a\.secret\.key

From the Hive shell in Ambari, I ran the following:
set fs.s3a.access.key= s3-access-key;
set fs.s3a.secret.key= s3-secret-key;
CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1';

I saw a similar post from the past, but I'm not sure whether that issue was ever resolved.




The above issue is observed because of a bug. Refer to the bug for more details.

As a workaround, you can try the methods below:

Method 1: Set the config below in core-site.xml

fs.s3a.bucket.<bucket_name>.security.credential.provider.path = <jceks_file_path>

# Replace <bucket_name> and <jceks_file_path> accordingly.

Method 2: Set the configs below in core-site.xml

fs.s3a.bucket.<bucket_name>.access.key = <s3a access key>
fs.s3a.bucket.<bucket_name>.secret.key = <s3a secret key>

# Replace <bucket_name> accordingly.
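
For reference, both methods end up as property entries in core-site.xml. The following is a minimal shell sketch, not an official tool, that prints those entries for one bucket. The helper name s3a_bucket_property, the host namenode.example.com and the key placeholders are illustrative assumptions; the bucket name and the /tmp/s3a.jceks path are reused from this thread.

```shell
# Hypothetical helper: print one core-site.xml <property> element for a
# per-bucket s3a option. Values are written as-is, without XML escaping.
# $1 = bucket name, $2 = option suffix, $3 = value
s3a_bucket_property() {
  printf '<property>\n  <name>fs.s3a.bucket.%s.%s</name>\n  <value>%s</value>\n</property>\n' \
    "$1" "$2" "$3"
}

# Method 1: point the bucket at a JCEKS credential store (assumed URI).
s3a_bucket_property "s3aTestBucket" "security.credential.provider.path" \
  "jceks://hdfs@namenode.example.com/tmp/s3a.jceks"

# Method 2: inline keys (placeholders; readable by anyone who can see the config).
s3a_bucket_property "s3aTestBucket" "access.key" "ACCESS_KEY_PLACEHOLDER"
s3a_bucket_property "s3aTestBucket" "secret.key" "SECRET_KEY_PLACEHOLDER"
```

In Ambari the same name/value pairs would be entered as custom core-site properties rather than pasted as XML.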

Let us know if the resolution works.

@Soumitra Sulav I tried Method 1, i.e. added the configuration and restarted HDFS from Ambari, but it doesn't seem to have worked. Any suggestions? Please find the logs below.

I didn't try Method 2, as it could expose my credentials in the Ambari UI.

0: jdbc:hive2://> CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3';

INFO : Compiling command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173): CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3'

INFO : Semantic Analysis Completed (retrial = false)

INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)

INFO : Completed compiling command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173); Time taken: 230.585 seconds

INFO : Executing command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173): CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3'

INFO : Starting task [Stage-0:DDL] in serial mode

ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint

INFO : Completed executing command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173); Time taken: 115.487 seconds

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint (state=08S01,code=1)


I assume you are using the proper commands to create the JCEKS file:

hadoop credential create fs.s3a.access.key -value <ACCESS_KEY> -provider jceks://hdfs@<namenode>/tmp/s3a.jceks
hadoop credential create fs.s3a.secret.key -value <SECRET_KEY> -provider jceks://hdfs@<namenode>/tmp/s3a.jceks
# Verify by running the command below
hadoop credential list -provider jceks://hdfs@<namenode>/tmp/s3a.jceks

Make sure the hive user can access the jceks file. [Check permissions and owners]
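
The keystore checks above can be sketched as a small dry-run script. It only prints the commands, so nothing is executed against the cluster. The function names and the host namenode.example.com are assumptions for illustration; the keystore path is the one used in the commands above.

```shell
# Dry-run sketch: prints the check commands instead of running them,
# so it can be tried on a machine without Hadoop installed.

# Build the provider URI passed to "hadoop credential" via -provider.
# $1 = namenode host (placeholder), $2 = absolute HDFS path of the keystore
jceks_provider_uri() {
  printf 'jceks://hdfs@%s%s\n' "$1" "$2"
}

# Print the commands that check the keystore contents and its ownership.
print_jceks_checks() {
  provider="$(jceks_provider_uri "$1" "$2")"
  # Aliases stored in the keystore (expect fs.s3a.access.key and fs.s3a.secret.key)
  printf 'hadoop credential list -provider %s\n' "$provider"
  # Owner, group and permission bits of the keystore file in HDFS
  printf 'hdfs dfs -ls %s\n' "$2"
}

print_jceks_checks "namenode.example.com" "/tmp/s3a.jceks"
```

Run the printed commands as the hive user (on a kerberized cluster, with a valid ticket for that user) and compare the owner and mode shown in the listing.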

Then add the configuration mentioned above under Ambari UI > HDFS > Configs > Custom core-site.

I was able to run Hive jobs in the same scenario as yours [underlying storage was not AWS].

If it still doesn't work, can you try Method 2 once, just to make sure there isn't some other issue?


@Soumitra Sulav I tried both Method 1 and Method 2 today and have attached the logs below. The logs are the same for both methods. It no longer seems to complain about AWS credentials, but it is still failing.

I verified the JCEKS setup by running a simple -ls command as that user, and also by running what you suggested in the comment above; both worked. I should add that the cluster is kerberized.


0: jdbc:hive2://> CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1';
INFO  : Compiling command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f): CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1'
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f); Time taken: 0.055 seconds
INFO  : Executing command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f): CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1'
INFO  : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.reflect.UndeclaredThrowableException)
INFO  : Completed executing command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f); Time taken: 0.318 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.reflect.UndeclaredThrowableException) (state=08S01,code=1)


Can you please provide the complete stack trace?

If you aren't sure where to find the logs, refer to the link.

The exception you encountered has been reported before on a secure cluster like yours.

Refer to the solution provided -

@Soumitra Sulav

While fetching logs from the YARN ResourceManager web UI on port 8088 in a kerberized cluster, it fails with an authentication error (HTTP error code 401, unauthorized access). I am using Chrome and am not sure how to make the web UI validate my Kerberos ticket. Any suggestions?


@Sahil Kaw You can follow these steps to get the logs from the UI:

Or, an easier way is to get them directly from the node/server.

Just go to /var/log/hadoop-yarn/yarn/yarn*resourcemanager*log

The above file is log rotated, so look for the rotated file that contains the error stack trace.
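
To find which rotated file holds the stack trace, something like the sketch below may help. It assumes rotated files keep the same name with a suffix (hence the trailing * added to the glob) and that they are not compressed; the function name find_logs_with is made up for this example.

```shell
# List ResourceManager log files in a directory that mention a given string.
# $1 = log directory, $2 = fixed string to search for
find_logs_with() {
  # -l prints only the names of matching files, -F treats $2 as a literal string
  grep -l -F -- "$2" "$1"/yarn*resourcemanager*log* 2>/dev/null
}

# Search for the exception from this thread in the default log directory.
find_logs_with /var/log/hadoop-yarn/yarn "UndeclaredThrowableException" \
  || echo "no matching log file found"
```

Once the file is known, open it and look around the timestamp of the failed query for the full trace.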


@Soumitra Sulav Today I redeployed my HDP cluster, and it seems to be working with both of the methods you shared above. I am not sure why it wasn't working with the previous setup; it seems to have been an intermittent issue. I will keep you posted in case I run into it again. Thanks for all your help with this.


