11-29-2016 07:25 AM
Best Practices for Using Impala with S3 states "Set the safety valve fs.s3a.connection.maximum to 1500 for impalad."
Can anyone clarify which safety valve field should be used, and with what syntax? I have read that this setting belongs in core-site.xml, but the Impala configuration in Cloudera Manager does not seem to have a safety valve for core-site.xml. The instructions mention a safety valve for impalad, but that safety valve seems to be for command-line arguments to impalad.
The problem we are trying to address is
hdfsSeek(desiredPos=503890631): FSDataInputStream#seek error:
com.cloudera.com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
that we keep getting when using Impala for querying data stored in S3.
We are using CDH 5.8.3.
11-30-2016 08:32 AM
You should be able to find the safety valve in Cloudera Manager under the HDFS service, not the Impala service. The S3AConnector used by Impala is managed by the HDFS service. The field is titled "Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml".
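As for syntax, that field takes standard Hadoop XML property blocks. A minimal sketch of the entry, assuming the value of 1500 from the best-practices guide quoted above (tune it for your own workload):

```xml
<!-- Paste into the core-site.xml safety valve under the HDFS service. -->
<!-- 1500 is the value quoted from the Impala S3 best-practices guide, not a measured optimum. -->
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>1500</value>
</property>
```

After saving, Cloudera Manager will most likely flag the configuration as stale, so expect to redeploy the client configuration and restart Impala before the new pool size takes effect.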
Let me know if you have any other issues.