
Query speed is very slow in Impala, I am using CDH5.5.0


New Contributor

Query speed is slow in Impala. I am using CDH5.5.0. A typical SQL statement looks like this:

select domain, sum(domain_request_count) domain_request_count,sum(domain_response_count) domain_response_count from 
dfdsdb.request_response_domain_sc where cast(CONCAT(year,month,day) as int) 
between cast("20151214" as int) and cast("20151231" as int) group by domain order by domain_request_count desc limit 10

It usually takes about 30 seconds, sometimes more than 50 seconds, and 15 seconds at the fastest.
The table dfdsdb.request_response_domain_sc has three partition columns: (date), (month), (year). The amount of data is around one hundred million.
By rights, this statement should take under 10 seconds. I monitored the Impala log in the background and found abnormal entries whenever a query takes a long time, as follows:

Tuple(id=0 size=40 slots=[Slot(id=0 type=STRING col_path=[4] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=1 type=BIGINT col_path=[5] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=2 type=BIGINT col_path=[6] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1), Slot(id=3 type=STRING col_path=[0] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=4 type=STRING col_path=[1] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=5 type=STRING col_path=[2] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1)] tuple_path=[])
Tuple(id=1 size=40 slots=[Slot(id=6 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=7 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=8 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[])
Tuple(id=2 size=40 slots=[Slot(id=9 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=10 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=11 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[])
I0106 09:46:59.656497 19278 plan-fragment-executor.cc:303] Open(): instance_id=794f58dadaa44cb8:1f24c33dda8d00a2
I0106 09:47:20.070286  6805 RetryInvocationHandler.java:144] Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over namenode1:8020. Trying to fail over immediately.
Java exception follows:
org.apache.hadoop.net.ConnectTimeoutException: Call From datanode to namenode1:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending namenode1:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
	at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
	at org.apache.hadoop.ipc.Client.call(Client.java:1476)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
	at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=namenode2:8020]
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
	at org.apache.hadoop.ipc.Client.call(Client.java:1442)
	... 21 more
I0106 09:47:20.077205  6805 RetryInvocationHandler.java:144] Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over namenode2:8020 after 1 fail over attempts. Trying to fail over after sleeping for 1300ms.
Java exception follows:

When the query runs quickly, these log entries do not appear. I suspect the connection timeout is what makes the Impala query slow, but how do I solve this problem? Thanks.


Re: Query speed is very slow in Impala, I am using CDH5.5.0

Master Collaborator

Hi,

  It looks like there is a problem with HDFS or your network. The error message is from HDFS - the HDFS datanode local to the Impalad isn't able to connect to the HDFS namenode.
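
As a rough check on how much of the slowdown the timeout alone could explain, the timestamps in the posted log can be subtracted. The sketch below is in Python; `glog_time` is a hypothetical helper name, and it assumes the glog-style prefix seen in the log (severity letter plus MMDD, then HH:MM:SS.ffffff). The two lines come from different threads, so the gap is only suggestive, but it works out to a little over 20 seconds, in line with the 20000 ms connect timeout reported in the exception.

```python
from datetime import datetime


def glog_time(line):
    """Parse the time-of-day field of a glog-style line such as
    'I0106 09:46:59.656497 19278 file.cc:303] message'."""
    time_field = line.split()[1]
    return datetime.strptime(time_field, "%H:%M:%S.%f")


# Prefixes copied from the log in the question (messages shortened).
open_line = "I0106 09:46:59.656497 19278 plan-fragment-executor.cc:303] Open()"
error_line = "I0106 09:47:20.070286  6805 RetryInvocationHandler.java:144] Exception"

# Both lines fall on the same day, so comparing time-of-day is enough here.
gap_seconds = (glog_time(error_line) - glog_time(open_line)).total_seconds()
print(f"{gap_seconds:.3f} s between Open() and the first logged timeout")
```

If most slow runs show one or two such timeouts, that alone would account for the difference between the 15 second and the 30 to 50 second runs.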

 

Diagnosing this problem is outside of the scope of this forum, but you probably want to check that the network between your Impala nodes and the namenode is healthy and that your namenode is also healthy.
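
One way to start that check is to time plain TCP connects from an Impala node to the namenode RPC port. This is a minimal sketch and not a full diagnosis: `connect_latency` is a hypothetical helper, and the host names and port 8020 mentioned in the usage note are simply the ones that appear in the log in the question.

```python
import socket
import time


def connect_latency(host, port, timeout=5.0):
    """Try one TCP connect to host:port.

    Returns (ok, seconds): ok is False if the connect failed or timed out,
    seconds is the wall-clock time the attempt took.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        ok = False
    return ok, time.monotonic() - start
```

Calling `connect_latency("namenode1", 8020)` and `connect_latency("namenode2", 8020)` repeatedly from the node whose impalad logged the timeout may show whether connects are occasionally slow or failing. Consistently fast connects would suggest looking at the namenode itself (load, GC pauses, RPC queue) rather than the network.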

 

Performance is going to depend on a lot of things, including your hardware, cluster size, number of columns, data distribution, etc.

Re: Query speed is very slow in Impala, I am using CDH5.5.0

New Contributor

Thank you for your reply. I am sure this is an HDFS problem, but I am not sure what exactly is causing the delay. The communication latency between each datanode and the namenode is very low, so it does not look like a hardware problem.
