I'm encountering strange behavior in MapReduce when using HBase as the input format. I run my MR job multiple times on the same table and the same dataset, with the same Fuzzy Row Filter pattern. However, the Input Records counters are not consistent across runs: the smallest count can be 40% lower than the largest.
- CDH version 5.9
- The table is split into 18 regions, distributed across 3 region servers. The TTL is set to 10 days, but the MR dataset only includes records inserted within the last 7 days.
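For context on why the varying counters are surprising: with a fixed pattern and unchanged data, fuzzy row matching is deterministic, so every run should count the same rows. The following is a minimal, dependency-free Java sketch of the matching rule only. It assumes the mask convention documented for `FuzzyRowFilter` in HBase 1.x (mask byte 0 = position is fixed, 1 = any byte accepted). It is a simplified re-implementation, not HBase's code, which also computes seek hints to skip non-matching rows. The class name, helper methods, and row-key layout are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FuzzyMatchSketch {

    /**
     * Simplified fuzzy match (hypothetical helper, not the HBase API).
     * mask[i] == 0: rowKey[i] must equal pattern[i] (fixed byte).
     * mask[i] == 1: any byte is accepted at position i (wildcard).
     * In this sketch, keys shorter than the pattern never match and
     * trailing bytes beyond the pattern are ignored.
     */
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Counts matching rows; identical inputs always yield the same count. */
    static int countMatches(List<byte[]> rows, byte[] pattern, byte[] mask) {
        int n = 0;
        for (byte[] row : rows) {
            if (matches(row, pattern, mask)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical row-key layout: 4-char user id + "_" + 8-char date (yyyyMMdd).
        List<byte[]> rows = new ArrayList<>();
        rows.add(bytes("u001_20170301"));
        rows.add(bytes("u002_20170301"));
        rows.add(bytes("u001_20170302"));
        rows.add(bytes("u003_20170301"));

        // Any user id ("????" is a placeholder, masked as wildcards) on date 20170301.
        byte[] pattern = bytes("????_20170301");
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0};

        System.out.println(countMatches(rows, pattern, mask)); // prints 3
    }
}
```

Because the rule depends only on the row-key bytes, the pattern, and the mask, repeated runs over identical data should produce identical counts; a discrepancy would have to come from somewhere other than the matching itself.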