Created on 07-22-2014 08:10 AM - edited 09-16-2022 02:03 AM
When we try to run more complex Impala queries, we often run into the following error:
Couldn't open transport for worker29.ourdomain.com:22000(connect() failed: Connection timed out)
Sometimes there's only one node with that error message, sometimes there are 2-5.
There doesn't seem to be a network-related problem - ping works, telnet to that port works, and the Impala debug UI works.
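Since a single telnet only proves the port was reachable at that moment, a repeated check may be more telling. Below is a minimal sketch of such a probe, assuming the timeouts are intermittent and show up mainly while a heavy query is running (that is a guess, not something we have confirmed). The function names `probe` and `failure_count` are made up for illustration; the host and port are the ones from the error message.

```python
import socket
import time


def probe(host, port, attempts=20, timeout=3.0, pause=0.0):
    """Open a TCP connection `attempts` times.

    Returns a list of (ok, detail) tuples, where detail is the connect
    time in seconds on success or the error text on failure.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the host and applies the timeout
            # to the connect() call itself.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:  # covers timeouts, refusals and DNS errors
            results.append((False, str(exc)))
        time.sleep(pause)
    return results


def failure_count(results):
    """Count the attempts that did not connect."""
    return sum(1 for ok, _ in results if not ok)


# Example call (hypothetical parameters, run from another worker node
# while a complex query is executing):
#   res = probe("worker29.ourdomain.com", 22000, attempts=50, pause=0.1)
#   print(failure_count(res), "of", len(res), "attempts failed")
```

Running this from several nodes at once, both idle and under query load, should show whether the connect timeouts can be reproduced outside of Impala.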
We tried changing vm.swappiness on the nodes from 60 to 0 - no positive effect. Same with switching vm.overcommit_memory from 0 to 1.
Our setup:
- around 40 nodes, i7 quad core, 2-3TB, 1Gbit NIC, located in 5 different racks
- nodes have around 16-48GB RAM and the same amount of swap, which they almost never use
- OS: Ubuntu Linux 12.04
- CDH 5.1.0
- impalad version 1.4.0-cdh5-INTERNAL RELEASE (build e801bd8c0d134e783c2313c7dd422a5ad06591af)
- ~100TB HDFS storage
- we are using an HA proxy that points to the nodes with >32GB RAM
- the "workerlogs" table is around 6-7TB in size, partitioned by year > month > day, and contains Apache log data
- almost 100% short circuit reads
Maybe you could give us a hint.