
Hive transactional table compaction fails

New Contributor

The table was created with this statement:

create table syslog_staged (
  id string, facility string, sender string, severity string,
  tstamp string, service string, msg string)
partitioned by (hostname string, year string, month string, day string)
clustered by (id) into 20 buckets
stored as orc
tblproperties("transactional"="true");

The table is populated with Apache NiFi's PutHiveStreaming processor. A major compaction was then requested for one partition:

alter table syslog_staged partition (hostname="cloudserver19", year="2016", month="10", day="24") compact 'major';

The compaction fails, and the reason is not clear. From the job history:

No of maps and reduces are 0 job_1476884195505_0031
Job commit failed: File hdfs:// does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(
at org.apache.hadoop.mapred.OutputCommitter.commitJob(
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

From the Hive metastore log:

2016-10-24 16:33:35,503 WARN  [Thread-14]: compactor.Initiator ( - Will not initiate compaction for log.syslog_staged.hostname=cloudserver19/year=2016/month=10/day=24 since last hive.compactor.initiator.failed.compacts.threshold attempts to compact it failed.
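The warning describes a simple rule: once the most recent attempts to compact a partition have all failed, and their number reaches hive.compactor.initiator.failed.compacts.threshold, the Initiator stops scheduling automatic compactions for that partition. The sketch below models only that rule as stated in the log message. It is not Hive's implementation, and the function name initiator_would_skip and the "failed"/"succeeded" labels are hypothetical.

```python
from typing import Sequence


def initiator_would_skip(states: Sequence[str], threshold: int) -> bool:
    """Model of the rule in the metastore warning (not Hive's actual code).

    `states` lists past compaction attempts for one partition, oldest first,
    using the hypothetical labels "failed" and "succeeded". Returns True when
    the last `threshold` attempts all failed.
    """
    if threshold < 1:
        raise ValueError("threshold must be a positive integer")
    recent = states[-threshold:]  # the most recent `threshold` attempts
    # Fewer attempts than the threshold means the limit has not been reached.
    return len(recent) == threshold and all(s == "failed" for s in recent)
```

Under this reading of the log message, a single successful attempt among the most recent ones is enough for the rule to stop blocking automatic compactions. Since the Initiator runs inside the metastore (the warning above comes from the metastore log), the threshold value that matters is the one in the metastore's configuration.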


As Eugene suggested, could you paste the output of "dfs -lsr" here so that we can see which directories are owned by which user?
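If the listing is long, grouping it by owner makes the ownership question easy to answer. A minimal sketch, assuming the usual "hdfs dfs -ls -R" (or "dfs -lsr") line layout of permissions, replication, owner, group, size, date, time and path; group_by_owner is a hypothetical helper, not part of Hadoop or Hive.

```python
from collections import defaultdict


def group_by_owner(ls_output: str) -> dict[str, list[str]]:
    """Map each owner to the paths it owns, parsed from a recursive listing.

    Each listing line is assumed to look like:
    <perms> <replication> <owner> <group> <size> <date> <time> <path>
    Lines that do not match (blank lines, "Found N items" headers) are skipped.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for line in ls_output.splitlines():
        # maxsplit=7 keeps a path that contains spaces in one piece
        fields = line.split(maxsplit=7)
        # Entries start with "d" (directory) or "-" (file) in the permissions column
        if len(fields) != 8 or fields[0][0] not in "d-":
            continue
        owner, path = fields[2], fields[7]
        owners[owner].append(path)
    return dict(owners)
```

Passing the saved listing to group_by_owner and looking at the keys shows at a glance whether more than one user owns files and directories under the table.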

A few other things we need to confirm:

  1. Is streaming being used before and after the upgrade?
  2. When you say compaction fails, what triggered the compaction? Is that triggered by the system automatically, or is it run by some user manually? If it's a manual compaction, then which user issued the command?
  3. You mentioned the problematic table partition was created on 2.5.3. Which user created it? Do you have issues compacting pre-existing tables created on 2.3.6?

Super Collaborator

This _tmp file should be created by the mapper of the compaction job. Is there anything about it in the job logs?