About seth

hsahay · ‎05-14-2020

Hi Seth,

As described in the document you linked, I added the following entries to my core-site.xml:

    <property>
      <name>fs.s3a.access.key</name>
      <value>your_access_key</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>your_secret_key</value>
    </property>

I then restarted the Impala and Hive services. But when I issue an impala-shell command to create a table whose files are stored on S3, I still get an error about S3 credentials not being available.

This is the command:

    impala-shell -i serverName -d schemaName -q "CREATE TABLE s3_test_tbl( \
      yr_mnth STRING , \
      p_id DOUBLE , \
      p_full_na STRING ) \
      STORED AS PARQUET \
      LOCATION 's3a://bucketname/path/'"

And this is the error I get:

    No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
    CAUSED BY: InterruptedIOException: doesBucketExist on biapps-snowflake-sbx-ascap: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
    CAUSED BY: AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
    CAUSED BY: SdkClientException: Unable to load credentials from service endpoint
    CAUSED BY: SocketTimeoutException: connect timed out
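One quick sanity check for a setup like the one above is to confirm that the core-site.xml actually read by the service contains both S3A properties. The following is only a minimal sketch: the function name missing_s3a_keys and the SAMPLE string are hypothetical, and the check inspects the XML text alone. It says nothing about which configuration file the Impala daemons really load.

```python
import xml.etree.ElementTree as ET

# The two property names from the post above.
REQUIRED_KEYS = ("fs.s3a.access.key", "fs.s3a.secret.key")


def missing_s3a_keys(core_site_xml: str) -> list:
    """Return the required S3A properties that are absent or empty.

    Hypothetical helper: it parses the XML text only and does not
    talk to Hadoop, Impala, or AWS.
    """
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Illustrative input with the secret key deliberately left out.
SAMPLE = """<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>your_access_key</value>
  </property>
</configuration>"""

print(missing_s3a_keys(SAMPLE))  # ['fs.s3a.secret.key']
```

To run it against a real file, pass the file's contents as the argument, for example open(path).read(), where path is wherever your deployment keeps the core-site.xml used by Impala.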

seth · ‎03-13-2020

Be careful with setting ParallelGCThreads to 8: on a system with more than 8 hardware threads, that can be fewer GC threads than the JVM would choose by default.

From https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/parallel.html:

On a machine with N hardware threads where N is greater than 8, the parallel collector uses a fixed fraction of N as the number of garbage collector threads. The fraction is approximately 5/8 for large values of N. At values of N below 8, the number used is N. On selected platforms, the fraction drops to 5/16. The specific number of garbage collector threads can be adjusted with a command-line option (which is described later).

On a host with one processor, the parallel collector will likely not perform as well as the serial collector because of the overhead required for parallel execution (for example, synchronization). However, when running applications with medium-sized to large-sized heaps, it generally outperforms the serial collector by a modest amount on machines with two processors, and usually performs significantly better than the serial collector when more than two processors are available.
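To get a feel for the numbers, here is a rough sketch of the default described in that passage. The exact formula is an assumption: it uses the commonly cited heuristic of N threads up to 8, then 8 plus 5/8 of the remainder, which is consistent with "approximately 5/8 for large values of N" but is not spelled out in the quoted text. The function name is hypothetical, and the 5/16 platforms are not modelled.

```python
def default_parallel_gc_threads(hardware_threads: int) -> int:
    """Approximate the default number of parallel GC threads.

    Assumed heuristic (not stated verbatim in the quoted docs):
    use N for N <= 8, otherwise 8 + (N - 8) * 5/8, rounded down.
    """
    if hardware_threads <= 8:
        return hardware_threads
    return 8 + (hardware_threads - 8) * 5 // 8


for n in (4, 8, 16, 32, 64):
    print(n, default_parallel_gc_threads(n))
```

Under this assumption a 32-thread host would default to 23 GC threads, so pinning ParallelGCThreads to 8 there would cut the count to roughly a third.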

seth · ‎12-13-2018

Ok, you can mark as resolved if this is no longer an issue. I've responded in the other new threads as well.

wkrafft · ‎11-28-2017

The error originated from Hue, which I failed to mention in my original post. We have tested that patch and it has solved our issue. Thanks!


Member Since	‎09-10-2015 05:25 AM
Last Visited	‎01-13-2025 10:05 AM
Posts	58
Kudos received	2

Cloudera Community

Re: S3 connectivity using impala shell

Re: Error moving data into an encryption zone

Re: S3 connectivity using impala shell

Re: All the jobs are failing with exception`org.ap...

Re: Metadata and lineage collection for S3

Re: Error moving data into an encryption zone