Backend 5:Failed to open HDFS file hdfs://nameservice1/user/hive/warehouse/........... Error(255): Unknown error 255
impala in wait_for_completion ImpalaBeeswaxException: <class 'impala.ImpalaBeeswaxException'>:
Query aborted:Failed to open HDFS file hdfs://nameservice1/user/hive/warehouse/publisher_hourly_report/account_p=570/time_p=201409392200/314e2d8cd796a9d5-b1bffba062057688_1588663599_data.0 Error(255): Unknown error 255
I get this error when I try to read the data while an INSERT OVERWRITE into a partition is running.
I don't know whether I am doing something wrong.
This is the final step in my workflow: a cron job launches a MapReduce job that writes the last two hours of data into a temporary Hive table, and then runs an INSERT OVERWRITE into the Impala table, which is partitioned by hour.
My clients read data from Impala continuously, and I don't want to hide data from them or show them this error.
Can you suggest a way to avoid this error?
Or, if this is a known bug, is a fix expected soon?
Total size: 134716245684 B (Total open files size: 5733 B)
Total dirs: 127016
Total files: 221680 (Files currently being written: 45)
Total blocks (validated): 219651 (avg. block size 613319 B) (Total open file blocks (not validated): 45)
Minimally replicated blocks: 219651 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 328 (0.1493278 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.1532567
Corrupt blocks: 0
Missing replicas: 468 (0.098852426 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Tue Sep 23 10:09:43 UTC 2014 in 8997 milliseconds
The filesystem under path '/' is HEALTHY
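As a quick consistency check on the fsck summary: the average block size is the total size divided by the number of validated blocks, and the under-replicated percentage is the under-replicated count over that same block count. A small Python sketch with the figures copied from the output above (the variable names are only for illustration):

```python
# Figures copied from the fsck summary above.
total_bytes = 134_716_245_684
total_blocks = 219_651
under_replicated = 328

# Average block size: total bytes over validated blocks (integer division).
avg_block_size = total_bytes // total_blocks

# Share of blocks with fewer live replicas than their target, in percent.
under_pct = 100 * under_replicated / total_blocks

print(avg_block_size)        # 613319
print(round(under_pct, 4))   # 0.1493
```

Both values match what fsck reports, so the summary is internally consistent.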
The only other messages are like this one, for library JARs:
/user/hdfs/.staging/job_201406040901_0004/libjars/zookeeper-3.4.5-cdh4.5.0.jar: Under replicated BP-733402976-10.209.71.38-1385959023215:blk_-2187087948868677897_14115041. Target Replicas is 10 but found 4 replica(s).
and "Target Replicas is 3 but found 2 replica(s)" for some other files.
I found that the issue occurs when the table is written and then read with very little time in between.
As a workaround, I added a sleep of a few seconds between the write and the read.
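A variant of the same sleep idea, sketched here in Python under stated assumptions, is to retry a read a few times with a growing delay, but only when the "Failed to open HDFS file" error appears. `read_with_retry`, the `run_query` callable (standing in for whatever Impala client wrapper you use), and the delay values are all hypothetical. Like the fixed sleep, this only papers over the race; it does not fix it.

```python
import time

# Substring of the Impala error quoted above; treated as transient here (assumption).
TRANSIENT_MARKER = "Failed to open HDFS file"


def read_with_retry(run_query, sql, attempts=3, delay_s=5.0, sleep=time.sleep):
    """Run `sql` via the hypothetical `run_query` callable, retrying on the HDFS open error."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except Exception as exc:
            # Re-raise unrelated errors at once, and the transient one on the last attempt.
            if TRANSIENT_MARKER not in str(exc) or attempt == attempts:
                raise
            # Back off a little longer on each retry, like a growing sleep.
            sleep(delay_s * attempt)


# Demo with a fake query function that fails twice, then succeeds.
calls = {"n": 0}


def flaky_query(sql):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Query aborted:Failed to open HDFS file hdfs://...")
    return [("ok",)]


rows = read_with_retry(flaky_query, "SELECT 1", sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the demo instant; in real use the default `time.sleep` applies.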
Hi @BellRizz, thanks for offering the workaround. Which version of CDH are you running?
From the description, it sounds like a race condition between the reader and the writer, and I suspect it is caused by HDFS-11056 (Concurrent append and read operations lead to checksum error) or HDFS-11160 (VolumeScanner reports write-in-progress replicas as corrupt incorrectly). Although the summary of HDFS-11160 seems to suggest otherwise, I have seen customers hit this issue with concurrent reads and writes.