
HDP 3.0.1 --> Hive doesn't honor an S3A endpoint if it's not AWS

Explorer
I am using a non-AWS endpoint for S3A and have a basic issue: Hive does not honor the s3a endpoint when it is not AWS. DistCp, hadoop fs, Spark, and MapReduce jobs all find my s3a endpoint and complete successfully, but Hive ignores it and expects AWS S3 credentials, as seen in the example below.

I tried three options and the error was the same with all three, as shown below:

ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
INFO : Completed executing command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6); Time taken: 116.608 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint (state=08S01,code=1)
Option 1: Ran the CREATE DATABASE command shown below, passing my S3 credentials via JCEKS in the HDFS core-site.xml:
hadoop.security.credential.provider.path=jceks://hdfs@nile3-vm6.centera.lab.emc.com:8020/user/test/s3a.jceks
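
For reference, in core-site.xml XML form that entry looks roughly like this (only a sketch; the fs.s3a.endpoint value below is just a placeholder for my actual non-AWS endpoint, which distcp/Spark already pick up):

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@nile3-vm6.centera.lab.emc.com:8020/user/test/s3a.jceks</value>
</property>
<!-- placeholder value; the real non-AWS endpoint is already configured and works for the other jobs -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://object-store.example.com:9020</value>
</property>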

Running a Hive query:

0: jdbc:hive2://nile3-vm7.centera.lab.emc.com> CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1'; 

INFO : Compiling command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6): CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1' 

INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)

INFO : Completed compiling command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6); Time taken: 230.907 seconds 

INFO : Executing command(queryId=hive_20181007232623_f38e7fac-5aed-4d4a-b08a-9cbfc950d7a6): CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1'
INFO : Starting task [Stage-0:DDL] in serial mode

Option 2: Passing user:s3-key in the URL while creating a database. I also tried CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3-user:s3-secret-key@s3aTestBucket/user/table1'; but it didn't work.
Option 3: Added the below property in hive-site:
hive.security.authorization.sqlstd.confwhitelist.append=hive\.mapred\.supports\.subdirectories|fs\.s3a\.access\.key|fs\.s3a\.secret\.key

In the Hive shell from Ambari, ran the following:

set fs.s3a.access.key=s3-access-key;
set fs.s3a.secret.key=s3-secret-key;
CREATE DATABASE IF NOT EXISTS table1 LOCATION 's3a://s3aTestBucket/user/table1';

I saw a similar post from the past but am not sure whether the issue was resolved:

Link
https://community.hortonworks.com/questions/71891/hdp-250-hive-doesnt-seem-to-honor-an-s3a-endpoint....

1 ACCEPTED SOLUTION

Contributor

The above issue is caused by https://issues.apache.org/jira/browse/HIVE-20386. Refer to the bug for more details.

As a workaround, you can try the methods below:

Method 1: Set the below config in core-site.xml

fs.s3a.bucket.<bucket_name>.security.credential.provider.path = <jceks_file_path>

#Replace <bucket_name> and <jceks_file_path> accordingly.
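
For example, with the bucket name from your query and the keystore path from your core-site, the core-site.xml entry would look roughly like this (only a sketch; substitute your own values):

<property>
  <name>fs.s3a.bucket.s3aTestBucket.security.credential.provider.path</name>
  <value>jceks://hdfs@nile3-vm6.centera.lab.emc.com:8020/user/test/s3a.jceks</value>
</property>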

Method 2: Set the below configs in core-site.xml

fs.s3a.bucket.<bucket_name>.access.key = <s3a access key>
fs.s3a.bucket.<bucket_name>.secret.key = <s3a secret key>

#Replace <bucket_name> accordingly.
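
In XML form that is something like the following (a sketch; the key values are placeholders):

<property>
  <name>fs.s3a.bucket.s3aTestBucket.access.key</name>
  <value>YOUR_S3A_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.bucket.s3aTestBucket.secret.key</name>
  <value>YOUR_S3A_SECRET_KEY</value>
</property>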

Let us know if the resolution works.


9 REPLIES


Explorer

@Soumitra Sulav I tried Method 1, i.e. added
fs.s3a.bucket.s3aTestBucket.security.credential.provider.path=jceks://hdfs@nile3-vm6.centra.lab.test.com:8020/user/test/s3a.jceks
and restarted HDFS from Ambari, but it doesn't seem to have worked. Any suggestions? Please find the logs below.

I didn't try Method 2 as it would expose my credentials in the Ambari UI.

Logs:

0: jdbc:hive2://nile3-vm7.centra.lab.test.com> CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3';

INFO : Compiling command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173): CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3'

INFO : Semantic Analysis Completed (retrial = false)

INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)

INFO : Completed compiling command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173); Time taken: 230.585 seconds

INFO : Executing command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173): CREATE DATABASE IF NOT EXISTS table3 LOCATION 's3a://s3aTestBucket/user/table3'

INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
INFO : Completed executing command(queryId=hive_20181008105923_0324b26a-64b7-4c8f-91e3-635c62442173); Time taken: 115.487 seconds

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.net.SocketTimeoutException: doesBucketExist on s3aTestBucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint (state=08S01,code=1)

Contributor

I believe you are following the proper commands to create the jceks file:

hadoop credential create fs.s3a.access.key -value <ACCESS_KEY> -provider jceks://hdfs@<namenode>/tmp/s3a.jceks
hadoop credential create fs.s3a.secret.key -value <SECRET_KEY> -provider jceks://hdfs@<namenode>/tmp/s3a.jceks
#Verify by running below command
hadoop credential list -provider jceks://hdfs@<namenode>/tmp/s3a.jceks

Make sure the hive user can access the jceks file. [Check permissions and owners]
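
A quick way to check that (a sketch, using the /tmp/s3a.jceks path from the commands above; on a kerberized cluster the hive user also needs a valid ticket):

# impersonate the hive user and confirm it can read the keystore in HDFS
sudo -u hive hadoop fs -ls /tmp/s3a.jceks
# and that it can list the credentials stored inside it
sudo -u hive hadoop credential list -provider jceks://hdfs@<namenode>/tmp/s3a.jceks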

And then add the mentioned configuration in Ambari UI > HDFS > Configs > Custom core-site.

I was able to run Hive jobs in the same scenario as yours [underlying storage was not AWS].

If it still doesn't work, can you try Method 2 once, just to make sure there isn't any other issue.

Explorer

@Soumitra Sulav I tried both Method 1 and Method 2 today and have attached the logs below. The logs are the same for both methods. Now it seems it is no longer complaining about AWS, but it is still failing.

I verified the JCEKS setup by simply running an -ls command as that user, and also by running what you suggested in the comment above, and both worked. Just to add, the cluster is Kerberized.

Logs:

0: jdbc:hive2://nile3-vm7.centera.lab.test.com> CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1';
INFO  : Compiling command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f): CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1'
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f); Time taken: 0.055 seconds
INFO  : Executing command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f): CREATE DATABASE IF NOT EXISTS datab LOCATION 's3a://s3aTestBucket/db1'
INFO  : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.reflect.UndeclaredThrowableException)
INFO  : Completed executing command(queryId=hive_20181009185046_b19ccbf4-1cfd-4148-96cd-e20a6fe45b1f); Time taken: 0.318 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.reflect.UndeclaredThrowableException) (state=08S01,code=1)

Contributor

Can you please provide the complete stack trace?

If you aren't sure where to find the logs, refer to the link.

The exception you encountered has been reported on a secure cluster like yours.

Refer to the solution provided: https://community.hortonworks.com/content/supportkb/151796/error-orgapachehadoopsecurityauthenticati...

Explorer
@Soumitra Sulav

While getting logs from the YARN ResourceManager web UI on port 8088 in a Kerberized cluster, it fails with an authentication error (HTTP Error Code 401, Unauthorized access). I am using Chrome and am not sure how to make the web UI validate my Kerberos ticket. Any suggestions?

Contributor

@Sahil Kaw You can follow these steps to get logs from the UI:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/enabling_browser_access...

Or, the easier way is to get them from the nodes/servers.

Just go to /var/log/hadoop-yarn/yarn/yarn*resourcemanager*log

The above file will be log-rotated; you can find the relevant file that contains the error stack trace.
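
For example, a quick way to narrow it down (just a sketch; adjust the search string to the error you are chasing):

# list which rotated ResourceManager log files mention the exception
grep -l "UndeclaredThrowableException" /var/log/hadoop-yarn/yarn/yarn*resourcemanager*log*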

Explorer

@Soumitra Sulav Today I redeployed my HDP cluster, and it seems to be working with both of the methods you shared above. I am not sure why it wasn't working with the previous setup; it seems like an intermittent issue. I will keep you posted in case I hit it again. Thanks for all your help with this.

Contributor

Good to know. If the answer helped you, please upvote so that it can help others.