Created on 05-30-2014 06:48 AM - edited 09-16-2022 01:59 AM
Platform info:
CDH 4.6.0 (without CM).
Server version: impalad version 1.3.1-cdh4 RELEASE (build 907481bf45b248a7bb3bb077d54831a71f484e5f)
Query that hangs:
set PARQUET_COMPRESSION_CODEC=gzip;
INSERT INTO TABLE t2 PARTITION(dt) SELECT * FROM t WHERE dt='2014-05-27-00';
Info about the tables:
t: Parquet format, uncompressed, about 9.9 GB of data.
t2: schema copied from table t; Parquet format.
I have inserted the same data into another table with set PARQUET_COMPRESSION_CODEC=snappy; and it worked well. But gzip compression somehow hangs the whole query.
The query profile stalls at this point: http://pastebin.com/MWcpUQiA
impala-server.log shows the following:
FSDataOutputStream#close error:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/hive/warehouse/db.db/table/.impala_insert_staging/ad4de1e28a843230_b5aeba0046576e96/.ad4de1e28a843230-b5aeba0046576e97_1592503968_dir/dt=2014-05-27-00/ad4de1e28a843230-b5aeba0046576e97_1536053034_data.2: File does not exist. Holder DFSClient_NONMAPREDUCE_-1006280791_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2543)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2360)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2273)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
at org.apache.hadoop.ipc.Client.call(Client.java:1238)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:291)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1177)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1030)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488)
E0530 15:40:03.365649 19261 impala-beeswax-server.cc:380] unknown query id: ad4de1e28a843230:b5aeba0046576e96
Created 07-07-2014 11:48 AM
How many resulting partitions are you expecting? The query is generating all the output partitions on each DataNode (DN), which can result in instability.
Take a look at these docs, in particular the sections on the SHUFFLE and NOSHUFFLE hints.
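For illustration, here is a sketch of the original statement with the SHUFFLE hint applied, using the bracketed hint syntax Impala accepts between the PARTITION clause and the SELECT. This assumes the hint is available in the impalad 1.3.1 release in use; check the INSERT documentation for your version before relying on it.

```sql
-- Same codec setting as in the original query.
set PARQUET_COMPRESSION_CODEC=gzip;

-- [SHUFFLE] redistributes rows by the partition key (dt) so that the data
-- files for each partition are built on a single node, rather than every
-- node holding open writers and buffers for every partition it reads.
INSERT INTO TABLE t2 PARTITION(dt) [SHUFFLE]
SELECT * FROM t WHERE dt='2014-05-27-00';
```

[NOSHUFFLE] does the opposite: it skips the data exchange between nodes, so each node writes the partitions it encounters directly. That is faster when there are few partitions, but it increases the number of files and buffers open at the same time.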
Created on 08-25-2014 03:46 AM - edited 08-25-2014 03:58 AM
The thing is, this query selects only one partition. Since the Parquet table is identical (except for the file format), we expect only one partition to be written as well.
It's a shame that so much time passed before the first answer. I am no longer able to check whether the SHUFFLE and NOSHUFFLE hints would help in this situation, but I will accept this answer.