Impala query hangs with No lease on error

Contributor

Platform info:

CDH 4.6.0 (without CM).

Server version: impalad version 1.3.1-cdh4 RELEASE (build 907481bf45b248a7bb3bb077d54831a71f484e5f)

 

Query that hangs:

set PARQUET_COMPRESSION_CODEC=gzip;

INSERT INTO TABLE t2 PARTITION(dt) SELECT * FROM t WHERE dt='2014-05-27-00';

 

Info about tables:

t - parquet format, without any compression, ~9.9GB of data.

t2 - schema is copied from t table - parquet format.

 

I have inserted the same data into another table with set PARQUET_COMPRESSION_CODEC=snappy; and it worked well. But with gzip compression the whole query hangs.

 

The query profile log at the point where it hung: http://pastebin.com/MWcpUQiA

 

impala-server.log has this to say:

FSDataOutputStream#close error:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/hive/warehouse/db.db/table/.impala_insert_staging/ad4de1e28a843230_b5aeba0046576e96/.ad4de1e28a843230-b5aeba0046576e97_1592503968_dir/dt=2014-05-27-00/ad4de1e28a843230-b5aeba0046576e97_1536053034_data.2: File does not exist. Holder DFSClient_NONMAPREDUCE_-1006280791_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2543)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2360)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2273)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

at org.apache.hadoop.ipc.Client.call(Client.java:1238)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:291)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1177)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1030)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488)
E0530 15:40:03.365649 19261 impala-beeswax-server.cc:380] unknown query id: ad4de1e28a843230:b5aeba0046576e96

 

P.S. The insert code button opens an empty popup.

1 ACCEPTED SOLUTION

Contributor

How many resulting partitions are you expecting? The query is generating all the output partitions on each DN, which can result in instability.

 

Take a look at these docs, in particular the SHUFFLE and NOSHUFFLE sections.

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...
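As a rough sketch of what those sections describe, the hint goes immediately before the SELECT keyword of the INSERT statement. The example below simply reuses the table names and partition value from the question; it has not been verified on this cluster or Impala version, so treat it as something to try rather than a confirmed fix. [SHUFFLE] asks Impala to redistribute rows so that each output partition is written by a single node, which reduces the number of files being written to HDFS at the same time.

```sql
SET PARQUET_COMPRESSION_CODEC=gzip;

-- Same statement as in the question, with the [SHUFFLE] hint added before SELECT.
INSERT INTO TABLE t2 PARTITION(dt) [SHUFFLE]
SELECT * FROM t WHERE dt='2014-05-27-00';
```

[NOSHUFFLE] goes in the same position and requests the opposite plan, where each node writes whatever partitions it happens to hold data for.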

Contributor

The thing is that this query is only selecting one partition. Since the parquet table is identical (except file format), we are expecting only one partition to be written, too.

It's a shame that so much time has passed since the first answer. I am not able to check whether the SHUFFLE and NOSHUFFLE keywords would help in this situation, but I will accept this answer.