Support Questions

Find answers, ask questions, and share your expertise

HBase - java.io.FileNotFoundException: File does not exist: hdfs://ha:8020/apps/hbase/data/archive when scanning MOB data

Explorer

I am using HBase 2.1.6.
The following exception occurred:
hbase(main):001:0> scan 'FPCA_ITEMS_TY_NEW',{STARTROW => '08', LIMIT => 2}
ROW COLUMN+CELL
org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: java.io.FileNotFoundException: File does not exist: hdfs://ha:8020/apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW/bf92b15900f190730a5482b53d350df0/cf/ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4
at org.apache.hadoop.hbase.regionserver.HMobStore.readCell(HMobStore.java:440)
at org.apache.hadoop.hbase.regionserver.HMobStore.resolve(HMobStore.java:354)
at org.apache.hadoop.hbase.regionserver.MobStoreScanner.next(MobStoreScanner.java:73)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:153)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6581)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6745)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6518)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3155)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3404)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42190)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ha:8020/apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW/bf92b15900f190730a5482b53d350df0/cf/ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1581)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:139)
at org.apache.hadoop.hbase.regionserver.HStoreFile.<init>(HStoreFile.java:214)
at org.apache.hadoop.hbase.mob.CachedMobFile.create(CachedMobFile.java:49)
at org.apache.hadoop.hbase.mob.MobFileCache.openFile(MobFileCache.java:220)
at org.apache.hadoop.hbase.regionserver.HMobStore.readCell(HMobStore.java:401)
... 13 more

I've checked the HDFS file path, and the file truly does not exist.

How can I resolve this problem? And how does HBase determine which file path to look for?

Thanks in advance.

6 REPLIES

Master Mentor

@allen_chu 
As you correctly suspected, the error indicates that HBase is trying to access a MOB (Medium-sized Object) file in HDFS that no longer exists at the expected location:

Spoiler
hdfs://ha:8020/apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW/bf92b15900f190730a5482b53d350df0/cf/ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4

Potential Root Causes:

  • The MOB file was manually deleted from HDFS
  • HBase's MOB cleanup process didn't properly update metadata
  • The HDFS path was changed without proper HBase configuration updates
  • Corrupted HBase metadata
  • Incomplete data migration
  • Filesystem inconsistency
  • Incomplete compaction or archiving process

1. Verify HDFS Integrity

Spoiler
# Check HDFS file system health
hdfs fsck /apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW

2. HBase Data Consistency Check


Spoiler

# Start HBase shell
hbase shell

# Major compact the table to rebuild metadata
major_compact 'FPCA_ITEMS_TY_NEW'

# Verify table status
status 'detailed'

3. Immediate Recovery Options
Option A: Repair the table

Spoiler
hbase hbck -j <path_to_hbase_classpath> -repair FPCA_ITEMS_TY_NEW

4. Advanced Recovery Options


Spoiler

# If previous methods fail, consider:
# a) Snapshot and restore
hbase shell
snapshot 'FPCA_ITEMS_TY_NEW', 'FPCA_ITEMS_TY_NEW_SNAPSHOT'

# b) Clone the snapshot
clone_snapshot 'FPCA_ITEMS_TY_NEW_SNAPSHOT', 'FPCA_ITEMS_TY_NEW_RECOVERED'


Option B: Disable and re-enable the table

Spoiler
disable 'FPCA_ITEMS_TY_NEW'
enable 'FPCA_ITEMS_TY_NEW'

Option C: Run MOB compaction

Spoiler
hbase org.apache.hadoop.hbase.mob.mapreduce.Sweeper FPCA_ITEMS_TY_NEW cf

Replace 'cf' with your actual column family name.
Option D: If the data is not critical, you can reset MOB references

Spoiler
alter 'FPCA_ITEMS_TY_NEW', {NAME => 'cf', MOB_COMPACT_PARTITION_POLICY => 'daily'}
major_compact 'FPCA_ITEMS_TY_NEW'

5. Preventive Measures

Spoiler
<property>
<name>hbase.mob.file.expired.period</name>
<value>86400</value> <!-- 1 day in seconds -->
</property>

Additional Checks

1. Verify HDFS permissions for the HBase user
2. Check HDFS health (NameNode logs, DataNode availability)
3. Review the HBase MOB configuration for the table

Spoiler
describe 'FPCA_ITEMS_TY_NEW'
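
If MOB is enabled for the family, the describe output should show MOB attributes on it. As a point of reference, MOB is typically enabled with something like the following (using 'cf' and a 100 KB threshold purely as placeholder values):

Spoiler
# Hypothetical example of MOB configuration on a column family;
# IS_MOB and MOB_THRESHOLD are the standard MOB schema attributes.
alter 'FPCA_ITEMS_TY_NEW', {NAME => 'cf', IS_MOB => true, MOB_THRESHOLD => 102400}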

With the above steps, you should be able to resolve your HBase issue.

Happy hadooping

Explorer

Thanks for your detailed answer.

I'll try the options you provided in a test environment.

For now, because I am sure the MOB HFile is missing, I put an empty HFile at that path instead, and the exception no longer appears when I scan the data.

By the way, I am curious: why can setting hbase.mob.file.expired.period be a preventive measure?

Expert Contributor

Hi @allen_chu @Shelton 

There is no configuration named hbase.mob.file.expired.period in any HBase version. @Shelton Could you please give a reference for the property you have shared?

Master Mentor

@shubham_sharma 

The hbase.mob.file.expired property in HBase is found in the hbase-site.xml configuration file.

This property is related to HBase's MOB (Medium-sized Objects) feature, which is designed to efficiently store objects that are larger than typical HBase cells but not large enough to warrant storage as separate HDFS files.

Spoiler
<property>
  <name>hbase.mob.file.expired</name>
  <value>30</value>
  <description>The number of days to keep a mob file before deleting it. Default value is 30 days.</description>
</property>


Happy hadooping 

Expert Contributor

@Shelton 

Have you tested this property in your environment? Neither the hbase.mob.file.expired nor the hbase.mob.file.expired.period property is present in HBase.

A MOB HFile will be subject to archiving under any of the following conditions:

  • Any MOB HFile older than the column family’s TTL

  • Any MOB HFile older than a "too recent" threshold with no references to it from the regular HFiles of any region in the column family

MOB files are cleaned when the MOB cleaner chore runs, which happens every 24 hours by default. The cleaner chore identifies files whose TTLs have expired and removes them. The TTL needs to be set at the table schema level; the default TTL is FOREVER.
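
Because the default TTL is FOREVER, the cleaner chore will not remove anything unless a TTL is set explicitly on the column family. As a sketch (table and column family names taken from this thread, the TTL value is an arbitrary example; the unit is seconds):

Spoiler
# Set a 7-day TTL (604800 seconds) on the 'cf' family so the
# MOB cleaner chore can archive expired MOB files.
alter 'FPCA_ITEMS_TY_NEW', {NAME => 'cf', TTL => 604800}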

 

hbase.master.mob.cleaner.period

Description: The period that MobFileCleanerChore runs. The unit is second. The default value is one day. The MOB file name uses only the date part of the file creation time in it. We use this time for deciding TTL expiry of the files. So the removal of TTL expired files might be delayed. The max delay might be 24 hrs.

Default: 86400
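
If you want the cleaner chore to run more often than the daily default, the period can be overridden in hbase-site.xml; the 6-hour value below is only an illustration:

Spoiler
<property>
  <name>hbase.master.mob.cleaner.period</name>
  <value>21600</value> <!-- 6 hours, in seconds -->
</property>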

Always verify answers from AI-generated sources before posting them to the community.

Reference -

https://hbase.apache.org/book.html#_mob_architecture

cc: @cjervis 

Master Mentor

@shubham_sharma 
My bad, it's hbase.mob.file.expired.period and not hbase.mob.file.expired.
Happy hadooping