Impala Error: Couldn't open transport

When we try to run more complex Impala queries, we often run into the following error:

Couldn't open transport for failed: Connection timed out)


Sometimes only one node reports that error, sometimes 2-5 do.

There doesn't seem to be a network-related problem: ping works, telnet to that port works, and the Impala debug UI works.
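A one-off telnet can miss failures that only appear while a heavy query is running, so the same connect test could be repeated in a loop during a query. Below is a minimal Python sketch of that idea. The host names are hypothetical, and port 22000 is only an assumption (the usual default for impalad backend traffic); the host and port from the actual error message should be used instead.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refused connections, DNS errors
        return False, time.monotonic() - start, str(exc)


def probe_repeatedly(targets, rounds=3, pause=0.0, timeout=5.0):
    """Probe every (host, port) pair several times and collect the failures."""
    failures = []
    for _ in range(rounds):
        for host, port in targets:
            ok, elapsed, err = probe(host, port, timeout)
            if not ok:
                failures.append((host, port, round(elapsed, 3), err))
        time.sleep(pause)
    return failures


if __name__ == "__main__":
    # Hypothetical node names; 22000 is an assumed backend port.
    nodes = [("node01.example.com", 22000), ("node02.example.com", 22000)]
    for failure in probe_repeatedly(nodes, rounds=10, pause=1.0):
        print(failure)
```

Running this from the coordinator node while the failing query executes would show whether plain TCP connects to the affected nodes also time out under load, or whether only Impala's own connections do.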

We tried changing vm.swappiness on the nodes from 60 to 0, with no positive effect. The same goes for switching vm.overcommit_memory from 0 to 1.
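To confirm that these kernel settings are actually in effect on every node, the live values can be read back from /proc/sys. The following Python sketch does that, assuming the overcommit setting in question is the kernel's vm.overcommit_memory; the helper names are made up for illustration.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl such as 'vm.swappiness' as a string."""
    # 'vm.swappiness' maps to the file /proc/sys/vm/swappiness.
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


def check_settings(expected, root="/proc/sys"):
    """Compare expected {name: value} pairs with live values; return mismatches."""
    mismatches = {}
    for name, want in expected.items():
        got = read_sysctl(name, root)
        if got != str(want):
            mismatches[name] = got
    return mismatches


if __name__ == "__main__":
    # Values we intended to set; an empty dict means everything matches.
    print(check_settings({"vm.swappiness": 0, "vm.overcommit_memory": 1}))
```

An empty result on a node means both values match; anything else lists the setting together with the value the kernel currently reports.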

Our setup:

- around 40 nodes, i7 quad core, 2-3TB, 1Gbit NIC, located in 5 different racks

- nodes have around 16-48GB RAM and the same amount of swap, which they almost never use

- OS: Ubuntu Linux 12.04

- CDH 5.1.0

- impalad version 1.4.0-cdh5-INTERNAL RELEASE (build e801bd8c0d134e783c2313c7dd422a5ad06591af)

- ~100TB HDFS storage

- we are using an HA proxy which points to the nodes with >32GB RAM

- the "workerlogs" table is around 6-7TB in size, partitioned by year > month > day, and contains Apache log data

- almost 100% short circuit reads


Maybe you could give us a hint.

