Trouble Connecting to Isilon S3 Bucket with Impala and HMS

New Contributor

Background: I'm a new admin for our existing CDP environment. The current prod setup uses NiFi, Ranger, Impala, HMS, Hive on Tez, and Isilon/HDFS on Dell PowerScale. The new directive is to create databases and Iceberg tables on S3, using buckets on the same PowerScale in a different access zone, and to redirect the NiFi processes that currently write to HDFS so that they write to the S3 buckets instead.

Steps: I updated the core-site.xml and hive-site.xml files with the following properties: "fs.s3a.secret.key", "fs.s3a.access.key", and "fs.s3a.endpoint".
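
For readers following along, here is a minimal sketch of how those three properties could look as Hadoop-style `<property>` entries. It is illustrative only: the endpoint URL and both key values are placeholders rather than values from this thread, and `render_hadoop_properties` is a hypothetical helper, not part of any Cloudera or Hadoop tooling.

```python
import xml.etree.ElementTree as ET


def render_hadoop_properties(props):
    """Render a dict as <property> entries inside a <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent is only available on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder values (assumptions), not the poster's real endpoint or credentials.
s3a_props = {
    "fs.s3a.endpoint": "https://s3.example.internal",
    "fs.s3a.access.key": "EXAMPLE_ACCESS_KEY",
    "fs.s3a.secret.key": "EXAMPLE_SECRET_KEY",
}

print(render_hadoop_properties(s3a_props))
```

In a managed cluster these entries would normally be entered through the cluster manager's configuration screens rather than by editing the files by hand.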

Success: I validated the connection to the S3 bucket with the AWS CLI, and I can run NiFi PutS3Object; the file lands in the expected S3 bucket directory.

Challenge: I'm getting an error in Impala when I attempt to create an Iceberg table in the same bucket for use with the NiFi PutIceberg processor.

"ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: Got exception: org.apache.hadoop.fs.s3a.AWSClientIOException getFileStatus on s3a://ce-dev-bucket-3/s3_test_iceberg: com.amazonaws.SdkClientException: Unable to execute HTTP request: ce-dev-bucket-3.<s3endpoint>.domain.org: Name or service not known: Unable to execute HTTP request: ce-dev-bucket-3.<s3endpoint>.domain.org: Name or service not known"
 

Can anyone suggest any permissions or properties that may be missing?



1 ACCEPTED SOLUTION

Expert Contributor

It might help if you add this property where you set S3 configurations:

fs.s3a.path.style.access=true

https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/connecting.html

 

With this setting, requests are sent to s3domain/bucket (<s3endpoint>.domain.org/ce-dev-bucket-3) instead of bucket.s3domain (ce-dev-bucket-3.<s3endpoint>.domain.org). The "Name or service not known" in your error is a DNS lookup failure for that bucket-prefixed hostname. I looked up the same error internally and found other customers with a similar error for whom this setting resolved it.
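
To make the difference concrete, here is a small sketch of the two addressing styles. It only illustrates how the hostname changes; it is not how the S3A connector builds requests internally. `s3_request_url` is a hypothetical helper and `https://s3.example.internal` is a placeholder endpoint.

```python
from urllib.parse import urlsplit


def s3_request_url(endpoint, bucket, key, path_style):
    """Build the URL a client would request for bucket/key under each addressing style."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path-style: the bucket is the first path segment, the hostname is unchanged.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket is prepended to the hostname,
    # so DNS must be able to resolve <bucket>.<endpoint host>.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


endpoint = "https://s3.example.internal"  # placeholder endpoint (assumption)

print(s3_request_url(endpoint, "ce-dev-bucket-3", "s3_test_iceberg", path_style=False))
print(s3_request_url(endpoint, "ce-dev-bucket-3", "s3_test_iceberg", path_style=True))
```

With path-style access only the endpoint hostname itself has to resolve, which is why the setting helps when no DNS entry exists for the bucket-prefixed name.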

5 REPLIES 5

Community Manager

@cvonschrott Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Impala experts @Boris G @Chella @ezerihun, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

New Contributor

@ezerihun 
I updated this property in the Cloudera Manager configs for Hive and the Isilon gateway, and also updated the query to point to the bucket instead of endpoint/bucketname, as you suggested. I was able to get a CREATE TABLE statement to run successfully (table_type=ICEBERG, stored in the S3 bucket).
The problem now is that I cannot insert anything into the table because a specific Tez-related tar.gz file doesn't exist where it is expected. Because we have multiple environments and namespaces on the same Isilon, I've engaged our internal storage team for assistance. However, as far as creating an Iceberg table in an S3 bucket, I think we're good to go.
Thank you so much for your suggestion!

*I'm hesitant to close this post until we can successfully read from and write to these Iceberg tables on S3, but we're further along now than we ever have been.

Expert Contributor

I'm not sure about this new point. It may need a new troubleshooting path once your team isolates the details of the missing Hive Tez tar.gz file. It is good news, though, that the original "service not known" error is resolved and CREATE TABLE works. Feel free to keep this thread open or create a new one with further details, and our team can continue to review.

Community Manager

@cvonschrott Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information @ezerihun has requested? Thanks.


Regards,

Diana Torres,
Community Moderator

