Setting max S3 connections


Hi all,

 

Best Practices for Using Impala with S3 states "Set the safety valve fs.s3a.connection.maximum to 1500 for impalad."

 

Can anyone clarify which safety valve field should be used, and with what syntax? I have read that this setting belongs in core-site.xml, but the Impala configuration in Cloudera Manager does not seem to have a safety valve for core-site.xml. The instructions mention a safety valve for impalad, but that safety valve seems to be for command-line arguments to impalad.

 

The problem we are trying to address is

 

hdfsSeek(desiredPos=503890631): FSDataInputStream#seek error:
com.cloudera.com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

 

that we keep getting when using Impala for querying data stored in S3.

 

We are using CDH 5.8.3.

 

Thanks,

Petter

1 ACCEPTED SOLUTION


Hi Pettax,

 

You should be able to find the safety valve in Cloudera Manager under the HDFS service, because the S3AConnector used by Impala is managed by the HDFS service. The field is titled "Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml".
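
On the syntax question, here is a minimal sketch, assuming the safety valve field accepts standard Hadoop property XML, as core-site.xml itself does. The property name and the value of 1500 come from the Impala S3 best practices quoted in the question. Nothing here has been verified against CDH 5.8.3 specifically.

```xml
<!-- Entry for the HDFS service's core-site.xml safety valve. -->
<!-- Raises the S3A connection pool limit to the value the Impala docs recommend. -->
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>1500</value>
</property>
```

After saving, Cloudera Manager will most likely flag the affected services as having stale configuration. Impala would presumably need a restart before the impalad processes pick up the new pool size.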

 

Let me know if you have any other issues.

 

- Sailesh

Thank you Sailesh!

 

This solved my problem.

 

Br,

Petter