Created 10-24-2016 10:11 AM
The table was created with:
create table syslog_staged (
    id string, facility string, sender string, severity string,
    tstamp string, service string, msg string)
partitioned by (hostname string, year string, month string, day string)
clustered by (id) into 20 buckets
stored as orc
tblproperties ("transactional"="true");
The table is populated by Apache NiFi's PutHiveStreaming processor, and the partition is then compacted manually with:
alter table syslog_staged partition (hostname="cloudserver19", year="2016", month="10", day="24") compact 'major';
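For reference, the request can be watched as it moves through the compaction queue with the standard statement below, run from the Hive shell (the output columns vary by Hive version):

show compactions;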
Now it turns out the compaction fails for some reason. From the job history:
No of maps and reduces are 0  job_1476884195505_0031
Job commit failed: java.io.FileNotFoundException: File hdfs://hadoop1.openstacksetup.com:8020/apps/hive/warehouse/log.db/syslog_staged/hostname=cloudserver19/year=2016/month=10/day=24/_tmp_27c40005-658e-48c1-90f7-2acaa124e2fa does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:113)
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:966)
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:962)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:962)
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:776)
    at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
From the Hive metastore log:
2016-10-24 16:33:35,503 WARN [Thread-14]: compactor.Initiator (Initiator.java:run(132)) - Will not initiate compaction for log.syslog_staged.hostname=cloudserver19/year=2016/month=10/day=24 since last hive.compactor.initiator.failed.compacts.threshold attempts to compact it failed.
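In other words, once hive.compactor.initiator.failed.compacts.threshold consecutive attempts have failed (the default is 2), the Initiator stops auto-scheduling compaction for that partition; a manual ALTER TABLE ... COMPACT should still enqueue a request directly. As a quick sanity check, the effective value can be printed from the Hive shell (note the property is read by the metastore-side Initiator thread, so changing it per-session has no effect):

-- print the effective threshold the Initiator is using
set hive.compactor.initiator.failed.compacts.threshold;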
Created 11-04-2016 08:38 PM
I had the same issue. The root cause in my case was that the hive user account (which the compactor runs under) did not have write permission on the directory holding the Hive database. After adjusting the permissions I was able to run the compactor successfully.
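For anyone checking the same thing, a minimal sketch from the Hive shell, using the warehouse path from the original post (the hive:hadoop ownership is only an assumption; substitute whatever user and group your compactor runs as):

-- list ownership and permissions of the table and partition directories
dfs -ls -R /apps/hive/warehouse/log.db/syslog_staged;
-- hand the tree to the compactor's user (assumed hive:hadoop here;
-- chown to a different owner typically requires HDFS superuser privileges)
dfs -chown -R hive:hadoop /apps/hive/warehouse/log.db/syslog_staged;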
Created 02-16-2017 06:38 PM
@Benjamin Hopp, did your case involve Streaming Ingest?
Created 01-15-2017 09:04 PM
I am also having the same issue. I am getting an error like:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/abc_db.db/abc_aca_present/delta_0001296_0001296/bucket_00000_flush_length
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1860)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:693)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
Created 02-16-2017 04:43 PM
This looks like https://issues.apache.org/jira/browse/HIVE-15309, in which case it can be ignored.
Created 02-16-2017 10:56 AM
Did you manage to fix the compaction problem? How?
I have the same problem on a new partition after upgrading from HDP 2.3.6 to 2.5.3, and it's not a permission problem as in @Benjamin Hopp's case.
Created 02-16-2017 04:35 PM
This seems related to https://issues.apache.org/jira/browse/HIVE-15142
Do you have any idea about our problem?
Created 02-16-2017 04:55 PM
@Davide Ferrari, could you post a "dfs -lsr" of your table directory, please?
Are you able to see the _tmp... file?
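(From the Hive shell that would be something along these lines, with the path left as a placeholder for your table's actual location; -lsr is the older shorthand for -ls -R:)

dfs -ls -R /apps/hive/warehouse/<db>.db/<table>;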
Created 02-16-2017 05:02 PM
No, @Eugene Koifman, just the "_orc_acid_version" file and all the delta subdirectories, each with the 8 bucket files + 8 _flush_length files. The _tmp file never gets created, and the compaction job actually fails in a matter of seconds.
Created 02-16-2017 05:03 PM
This is a Hive Streaming installation upgraded directly from HDP 2.3.6 to HDP 2.5.3. This table partition was created on 2.5.3.