I assume you want to reduce the execution time of your job? Could you provide some more details about it? Are you using the hbase-connector? What does deduplication mean in your case: does it apply to all attributes, or just to some of them? In my experience, it is typically faster to detect duplicates by first computing a hash of each record and comparing the hashes, rather than comparing the attributes one by one.
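To illustrate the hash idea, here is a minimal sketch in plain Python. It is not tied to your job or to the hbase-connector; the function names `record_hash` and `deduplicate`, the sample rows, and the choice of SHA-256 are all assumptions made for the example. Records are modelled as dictionaries, and `attributes` is the list of fields that define a duplicate.

```python
import hashlib
import json


def record_hash(record, attributes):
    """Return a SHA-256 hex digest over the selected attributes of a record."""
    # Serialize the values as a JSON list so field boundaries stay unambiguous
    # (the values "ab", "c" must not hash the same as "a", "bc").
    payload = json.dumps([record.get(a) for a in attributes], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(records, attributes):
    """Keep the first record seen for each distinct hash, preserving order."""
    seen = set()
    unique = []
    for record in records:
        digest = record_hash(record, attributes)
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Hypothetical sample data: rows 1 and 3 agree on "name" and "city".
rows = [
    {"id": 1, "name": "alice", "city": "Berlin"},
    {"id": 2, "name": "bob", "city": "Paris"},
    {"id": 3, "name": "alice", "city": "Berlin"},
]

by_name_and_city = deduplicate(rows, ["name", "city"])
by_all_attributes = deduplicate(rows, ["id", "name", "city"])
```

With this data, deduplicating on `name` and `city` drops the third row, while deduplicating on all attributes keeps every row because the `id` values differ. Whether a hash actually helps in your case depends on how many attributes are compared and on where the comparison runs, so treat this as a starting point only.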
Would it be an option to store the records already deduplicated in HBase and just add columns or versions, or is changing how HBase is fed outside your control anyway?