Created 11-19-2024 08:19 AM
Background: I'm a new admin to our existing CDP environment. Our current prod setup uses Nifi, Ranger, Impala, HMS, Hive on Tez, and Isilon/HDFS on Dell Powerscale. The new directive is to create databases and Iceberg tables via S3, using buckets on the same Powerscale in a different access zone, and to redirect the Nifi processes that currently write to HDFS so they write to the S3 buckets instead.
Steps: I updated the core-site.xml and hive-site.xml files with the following properties: "fs.s3a.access.key", "fs.s3a.secret.key", and "fs.s3a.endpoint".
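For reference, the new entries look roughly like this (the endpoint host and credentials below are placeholders, not our real values):
<property>
  <name>fs.s3a.access.key</name>
  <value>PLACEHOLDER_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>PLACEHOLDER_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value><s3endpoint>.domain.org</value>
</property>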
Success: I validated the connection to the S3 bucket with the AWS CLI, and I can run the Nifi PutS3Object processor and see the file in the expected S3 bucket directory.
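For example, a listing along these lines works against the bucket (endpoint is a placeholder):
aws s3 ls s3://ce-dev-bucket-3/ --endpoint-url https://<s3endpoint>.domain.org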
Challenge: I'm getting an error in Impala when I attempt to create an Iceberg table in the same bucket, intended for use with the Nifi PutIceberg processor.
"ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: Got exception: org.apache.hadoop.fs.s3a.AWSClientIOException getFileStatus on s3a://ce-dev-bucket-3/s3_test_iceberg: com.amazonaws.SdkClientException: Unable to execute HTTP request: ce-dev-bucket-3.<s3endpoint>.domain.org: Name or service not known: Unable to execute HTTP request: ce-dev-bucket-3.<s3endpoint>.domain.org: Name or service not known"
Can anyone suggest any permissions or properties that may be missing?
Created 11-19-2024 09:13 AM
@cvonschrott Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Impala experts @Boris G @Chella @ezerihun who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,
Created 11-19-2024 10:45 AM
It might help if you add this property where you set S3 configurations:
fs.s3a.path.style.access=true
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/connecting.html
With this configuration, requests are sent to s3domain/bucket (<s3endpoint>.domain.org/ce-dev-bucket-3) instead of bucket.s3domain (ce-dev-bucket-3.<s3endpoint>.domain.org). I checked your error internally and found other customers with a similar error for whom this property resolved it.
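In core-site.xml that would look something like this; hadoop-aws also supports per-bucket overrides if you only want path-style access for this one bucket (bucket name taken from your error message):
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3a.bucket.ce-dev-bucket-3.path.style.access</name>
  <value>true</value>
</property>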
Created 11-19-2024 01:09 PM
@ezerihun
I updated this property in the configs in Cloudera Manager for Hive and the Isilon gateway, and also updated the query to point to the bucket instead of endpoint/bucketname, as you suggested. I was able to get a CREATE TABLE statement to run successfully (table_type=ICEBERG, stored in the S3 bucket).
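For anyone finding this later, the statement that now succeeds is along these lines (table and column names are illustrative, the path matches the bucket from the earlier error, and the exact syntax may vary by CDP/Impala version):
CREATE TABLE s3_test_iceberg (id INT, payload STRING)
STORED AS ICEBERG
LOCATION 's3a://ce-dev-bucket-3/s3_test_iceberg';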
The problem now is that I cannot insert anything into the table because a specific Tez-related tar.gz file doesn't exist where it's expected to be. Because we have multiple environments and namespaces on the same Isilon, I've engaged our internal storage team for assistance. As far as creating an Iceberg table in an S3 bucket goes, though, I think we're good to go.
Thank you so much for your suggestion!
*I'm hesitant to close this post until we can successfully read from and write to these Iceberg tables in/on S3, but we're further along now than we ever have been.
Created 11-19-2024 01:38 PM
For this new point, I'm not too sure. It may require a new troubleshooting path once your team isolates the details about the missing Hive Tez tar.gz file. But it is good news that the original "service not known" error is resolved and CREATE TABLE works. Feel free to keep this open or create a new thread with further details, and our team can continue to review.
Created 11-22-2024 03:22 PM
@cvonschrott Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information @ezerihun has requested? Thanks.
Regards,
Diana Torres,