Support Questions

iwano · ‎09-22-2014

Backend 5:Failed to open HDFS file hdfs://nameservice1/user/hive/warehouse/...........
Error(255): Unknown error 255

impala in wait_for_completion
ImpalaBeeswaxException: <class 'impala.ImpalaBeeswaxException'>:
Query aborted:Failed to open HDFS file hdfs://nameservice1/user/hive/warehouse/publisher_hourly_report/account_p=570/time_p=201409392200/314e2d8cd796a9d5-b1bffba062057688_1588663599_data.0
Error(255): Unknown error 255

I get this error when I try to read the data while an INSERT OVERWRITE on a partition is in progress.

I do not know if I am doing something wrong.

This is the final step in my workflow: a cron job launches a MapReduce job that writes the last two hours of data to a temporary Hive table, and then does an INSERT OVERWRITE into the Impala table, which is partitioned by hour.

My clients read data from Impala continuously, and I do not want to hide data from them or show them this error.

Can you suggest a way to avoid this error?

Or, if this is a known bug, is a fix expected soon?

Thanks

charles_tay · ‎09-22-2014

Can you run hadoop fsck / > output.log
and then cat it?

iwano · ‎09-23-2014

Status: HEALTHY
Total size: 134716245684 B (Total open files size: 5733 B)
Total dirs: 127016
Total files: 221680 (Files currently being written: 45)
Total blocks (validated): 219651 (avg. block size 613319 B) (Total open file blocks (not validated): 45)
Minimally replicated blocks: 219651 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 328 (0.1493278 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.1532567
Corrupt blocks: 0
Missing replicas: 468 (0.098852426 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Tue Sep 23 10:09:43 UTC 2014 in 8997 milliseconds

The filesystem under path '/' is HEALTHY

iwano · ‎09-23-2014

The only other messages are like this one, for a library jar:

/user/hdfs/.staging/job_201406040901_0004/libjars/zookeeper-3.4.5-cdh4.5.0.jar: Under replicated BP-733402976-10.209.71.38-1385959023215:blk_-2187087948868677897_14115041. Target Replicas is 10 but found 4 replica(s).

and "Target Replicas is 3 but found 2 replica(s)" for some files.
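
Messages of this kind can be collected from output.log with a short script. This is only a minimal sketch, assuming every under-replicated line follows the same layout as the sample above and that paths contain no spaces; the regex and the function name under_replicated are illustrative and not part of any Hadoop tool.

```python
import re

# Assumed layout, taken from the sample fsck line quoted in this thread:
#   <path>: Under replicated <block id>. Target Replicas is N but found M replica(s).
UNDER_REPLICATED = re.compile(
    r"^(?P<path>\S+):\s+Under replicated\s+\S+\.\s+"
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+) replica"
)


def under_replicated(lines):
    """Return (path, target_replicas, found_replicas) for each matching line."""
    results = []
    for line in lines:
        match = UNDER_REPLICATED.match(line.strip())
        if match:
            results.append(
                (
                    match.group("path"),
                    int(match.group("target")),
                    int(match.group("found")),
                )
            )
    return results
```

It could be used as under_replicated(open("output.log")) to list the affected files together with their target and actual replica counts.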

Manikumar Juttukonda · ‎09-29-2014

Are you trying to write and read the file at the same time?

Em Jay

BellRizz · ‎09-19-2017

I find the issue occurs when the table is written and then read with minimal time in between.

A workaround I used was to add a sleep of a few seconds between the write and the read.
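
As a rough sketch of that kind of workaround, the read can be wrapped so that it waits and tries again when it fails. Everything here is an assumption for illustration: retry_with_sleep, the number of attempts and the delay are hypothetical, and action stands for whatever function runs the query through your client. It does not fix the underlying race, it only makes the reader wait and retry.

```python
import time


def retry_with_sleep(action, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call action(); if it raises, sleep for delay_seconds and try again.

    action is a hypothetical zero-argument callable that runs the read query.
    The last error is re-raised once all attempts have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # assumption: narrow this to your client's error class
            last_error = error
            if attempt < attempts:
                # Give the writer time to finish before reading again.
                sleep(delay_seconds)
    raise last_error
```

For example, retry_with_sleep(lambda: run_query(sql), attempts=3, delay_seconds=5.0), where run_query is a placeholder for your own query function.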

weichiu · ‎09-19-2017

Hi @BellRizz, thanks for offering the workaround. Which version of CDH do you have?

From the description, it sounds like a race condition between reader and writer, and I suspect it is caused by HDFS-11056 (Concurrent append and read operations lead to checksum error) or HDFS-11160 (VolumeScanner reports write-in-progress replicas as corrupt incorrectly). While the summary of HDFS-11160 seems to suggest otherwise, I have seen customers hit this issue with concurrent reads and writes.

BellRizz · ‎09-19-2017

I have CDH 5.7.2.

eciro · ‎09-27-2017

What do you mean by "sleep for some seconds"?