Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

What's the fastest way to know if an Hive Table contains data?

avatar

Wanted to know what's the fastest/most efficient way to know if there exists data in a partitioned table. select * from database.table limit 1 according to the query plan does a full table scan. Is there any way to avoid this and quickly know if there exists any data in a given table/partition of a table?

3 REPLIES 3

avatar

1. select * from database.table limit 1 It will never perform a full table scan. You can verify this on Resource Manager with no new job being spawned.


2. You can find the HDFS path corresponding to the table and do a hdfs -du on the path to know what is the size of directory.

avatar

Thanks for the quick response @kgautam

1. Due to a bug we have had to disable the default fetch operation by default and hence a TEZ session is spawned for the operation at the moment for every select * limit 1 operation, also since we use LLAP we have observed there is considerable IO within the mappers from the metrics printed after the query execution.

2. Is this a good approach?

avatar

2 option is not the perfect approach but is full proof, as its never possible for HDFS directory to be empty while Hive has data.