
Temporary file during hive query cannot be replicated.


Expert Contributor

I have a Hive MERGE query, reading avro files to write ORC files.

The avro files are input data, and the ORC files will be my main database.

The merge query almost completes, but always ends up failing. The relevant log lines (I think) are:

# Just before failing, still good
[Thread-9646]: monitoring.TezJobMonitor$UpdateFunction (TezJobMonitor.java:update(137)) - Map 1: 19/19    Map 5: 80/80    Reducer 2: 1009/1009    Reducer 3: 9(+0)/10     Reducer 4: 1(+8,-19)/10

# a few more log.PerfLogger lines...
Vertex failed, vertexName=Reducer 4, vertexId=vertex_1502360038800_0027_2_03, diagnostics=[Task failed, taskId=task_1502360038800_0027_2_03_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {}
...
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /dwh/vault/contact/.hive-staging_hive_2017-09-20_08-56-51_838_2864382824593930489-1/_task_tmp.-ext-10000/name=hsys/id=46/_tmp.000000_0/delta_0000076_0000076_0000/bucket_00000 could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
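
For anyone triaging similar logs, here is a minimal sketch (not part of the original post) that pulls the numbers out of the "could only be replicated" message and flags the case seen here, where every running datanode was excluded. The function names and the shortened sample string are hypothetical, chosen only for illustration; the regular expression is based solely on the message wording quoted above.

```python
import re

# Pattern built from the wording of the HDFS message quoted in this post.
REPLICATION_ERROR = re.compile(
    r"could only be replicated to (?P<placed>\d+) nodes? "
    r"instead of minReplication \(=(?P<min_replication>\d+)\)\.\s+"
    r"There are (?P<running>\d+) datanode\(s\) running and "
    r"(?P<excluded>\d+) node\(s\) are excluded"
)


def parse_replication_error(line):
    """Return the counts found in the message as ints, or None if absent."""
    match = REPLICATION_ERROR.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def all_datanodes_excluded(info):
    """True when at least one datanode is running and all were excluded."""
    return info["running"] > 0 and info["excluded"] >= info["running"]


# Shortened sample: the file path is omitted, the message text is unchanged.
sample = (
    "bucket_00000 could only be replicated to 0 nodes instead of "
    "minReplication (=1).  There are 3 datanode(s) running and 3 node(s) "
    "are excluded in this operation."
)
info = parse_replication_error(sample)
print(info, all_datanodes_excluded(info))
```

When every running datanode is excluded, the datanodes are up but each was rejected as a write target for this block, which is a different situation from datanodes being down.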

During this query run I could see that YARN memory usage was quite high (91%). I did not notice anything else, except this warning repeated in hadoop-hdfs-namenode.log:

WARN  blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(385)) - Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable:  unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

I could not find anything relevant about this error myself.

An fsck (after the query failure) does not report any errors. Ambari does not find any under-replicated blocks.

Any idea if there is a usual culprit for this error, or where I could look?

Thanks.

Small (1 Ambari, 3 DN) HDP 2.6 cluster, on AWS.

2 REPLIES

Re: Temporary file during hive query cannot be replicated.

Contributor

@Guillaume Roger As seen from the log you provided, the file blocks were not written to HDFS.

"Failed to place enough replicas, still in need of 3 to reach 3" indicates that the block could not be written to any of the 3 DNs in the cluster.

All required storage types are unavailable
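
As a rough illustration of that reading, the sketch below extracts the "still in need of X to reach Y" counts and the unavailable storage types from the NameNode warning. It is only a sketch: the function names are hypothetical, the sample line is an abbreviated copy of the warning from the question, and the pattern assumes the wording shown there.

```python
import re

# Pattern built from the BlockPlacementPolicy warning quoted in the question.
PLACEMENT_WARNING = re.compile(
    r"still in need of (?P<missing>\d+) to reach (?P<wanted>\d+)\s*"
    r"\(unavailableStorages=\[(?P<storages>[^\]]*)\]"
)


def parse_placement_warning(line):
    """Return missing/wanted replica counts and unavailable storage types."""
    match = PLACEMENT_WARNING.search(line)
    if match is None:
        return None
    storages = [s.strip() for s in match.group("storages").split(",") if s.strip()]
    return {
        "missing": int(match.group("missing")),
        "wanted": int(match.group("wanted")),
        "unavailable_storages": storages,
    }


def no_replica_placed(info):
    """True when not a single one of the wanted replicas could be placed."""
    return info["missing"] == info["wanted"]


# Abbreviated sample: the storagePolicy details are cut, the rest is unchanged.
sample = (
    "Failed to place enough replicas, still in need of 3 to reach 3 "
    "(unavailableStorages=[DISK], newBlock=true)"
)
info = parse_placement_warning(sample)
print(info, no_replica_placed(info))
```

Here "missing" equals "wanted", so no replica could be placed at all, and DISK is reported as the only unavailable storage type.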

Can you please check whether HDFS is in a healthy state and whether you are able to write files to HDFS?

Re: Temporary file during hive query cannot be replicated.

Expert Contributor

I agree with your assessment (files cannot be written to HDFS), but my problem is that, as far as I know, HDFS is in a healthy state: all Ambari lights are green, there are no under-replicated blocks, fsck is happy, and I can indeed write even huge files to HDFS. If you are aware of other checks I could perform, I would love to know about them.

Thanks,
