Query aborted:Failed to open HDFS file

New Contributor
Backend 5:Failed to open HDFS file hdfs://nameservice1/user/hive/warehouse/...........
Error(255): Unknown error 255

impala in wait_for_completion
ImpalaBeeswaxException: <class 'impala.ImpalaBeeswaxException'>:
Query aborted:Failed to open HDFS file hdfs://nameservice1/user/hive/warehouse/publisher_hourly_report/account_p=570/time_p=201409392200/314e2d8cd796a9d5-b1bffba062057688_1588663599_data.0 Error(255): Unknown error 255

I get this error when I try to read the data while an INSERT OVERWRITE into a partition is running.

I don't know if I'm doing something wrong.

This is the final step in my workflow: a cron job launches a MapReduce job that writes the last two hours of data to a temporary Hive table, then runs an INSERT OVERWRITE into the Impala table, which is partitioned by hour.

My clients read the data through Impala continuously, and I want neither to hide data from them nor to show them this error.
 

Can you suggest a way to avoid this error?

Or, if this is a known bug, is a fix expected soon?

Thanks

Explorer

Can you run hadoop fsck / > output.log and then cat output.log?

New Contributor

Status: HEALTHY
Total size: 134716245684 B (Total open files size: 5733 B)
Total dirs: 127016
Total files: 221680 (Files currently being written: 45)
Total blocks (validated): 219651 (avg. block size 613319 B) (Total open file blocks (not validated): 45)
Minimally replicated blocks: 219651 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 328 (0.1493278 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.1532567
Corrupt blocks: 0
Missing replicas: 468 (0.098852426 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Tue Sep 23 10:09:43 UTC 2014 in 8997 milliseconds


The filesystem under path '/' is HEALTHY
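Reading the report above: the filesystem is HEALTHY, there are 0 corrupt blocks, and 100% of blocks are at least minimally replicated, which suggests the blocks themselves are intact even though some are under-replicated. If you want to run the same check from a script (for example, at the start of the cron job), a minimal Python sketch could pull a few fields out of the fsck summary. The helper names parse_fsck_summary and is_healthy and the field list are hypothetical, and the parsing assumes the "Name: value" layout shown in this report.

```python
import re

# Summary fields to extract; the names are copied from the fsck report above.
FIELDS = (
    "Under-replicated blocks",
    "Corrupt blocks",
    "Missing replicas",
    "Number of data-nodes",
)


def parse_fsck_summary(text: str) -> dict[str, int]:
    """Return the leading integer of each selected 'Name: value' summary line."""
    summary: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name in FIELDS:
            match = re.match(r"\s*(\d+)", value)
            if match:
                summary[name] = int(match.group(1))
    return summary


def is_healthy(text: str) -> bool:
    """True if the report's final line ends with 'is HEALTHY', as in the output above."""
    return text.rstrip().endswith("is HEALTHY")


# Usage with an excerpt of the report posted in this thread.
report = """Status: HEALTHY
 Under-replicated blocks: 328 (0.1493278 %)
 Corrupt blocks: 0
 Missing replicas: 468 (0.098852426 %)
 Number of data-nodes: 4
The filesystem under path '/' is HEALTHY"""

print(parse_fsck_summary(report), is_healthy(report))
```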

New Contributor

The only other messages are like this one, for a library jar:

/user/hdfs/.staging/job_201406040901_0004/libjars/zookeeper-3.4.5-cdh4.5.0.jar: Under replicated BP-733402976-10.209.71.38-1385959023215:blk_-2187087948868677897_14115041. Target Replicas is 10 but found 4 replica(s).

and "Target Replicas is 3 but found 2 replica(s)" for some files.

Expert Contributor
Are you trying to write and read the file at the same time?
Em Jay

Contributor

I find that the issue occurs when the table is written and then read with very little time in between.

A workaround I used was to add a sleep of a few seconds between the write and the read.
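The workaround is a fixed pause between the write and the read. A minimal Python sketch of that idea, assuming the write and the read are each wrapped in a plain callable (write and read here are hypothetical stand-ins for whatever Impala client calls your job makes), also adds a short retry around the read that triggers only on the "Failed to open HDFS file" error from this thread. The 5-second delays and 3 attempts are arbitrary placeholders, not tested values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the error reported in this thread; other errors are not retried.
TRANSIENT_ERROR = "Failed to open HDFS file"


def run_with_retry(query: Callable[[], T], attempts: int = 3, delay_s: float = 5.0) -> T:
    """Call `query`; if it fails with the transient HDFS error, sleep and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except Exception as exc:
            # Re-raise unrelated errors immediately, and the HDFS error on the last attempt.
            if TRANSIENT_ERROR not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay_s)
    raise AssertionError("unreachable")


def write_then_read(write: Callable[[], None], read: Callable[[], T], settle_s: float = 5.0) -> T:
    """Run the write, pause for `settle_s` seconds, then run the read with retries."""
    write()
    time.sleep(settle_s)
    return run_with_retry(read)
```

Sleeping and retrying only narrows the window for the reader/writer race; it does not address the underlying cause.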

Expert Contributor

Hi @BellRizz, thanks for offering the workaround. Which version of CDH are you running?

From the description, it sounds like a race condition between the reader and the writer, and I suspect it is caused by HDFS-11056 (Concurrent append and read operations lead to checksum error) or HDFS-11160 (VolumeScanner reports write-in-progress replicas as corrupt incorrectly). Although the summary of HDFS-11160 seems to suggest otherwise, I have seen customers hit this issue with concurrent reads and writes.

Contributor

I have CDH 5.7.2.

Cloudera Employee

What do you mean by "sleep for seconds"?