Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

What's the fastest way to know if an Hive Table contains data?

Highlighted

What's the fastest way to know if an Hive Table contains data?

New Contributor

Wanted to know what's the fastest/most efficient way to know if there exists data in a partitioned table. select * from database.table limit 1 according to the query plan does a full table scan. Is there any way to avoid this and quickly know if there exists any data in a given table/partition of a table?

3 REPLIES 3

Re: What's the fastest way to know if an Hive Table contains data?

1. select * from database.table limit 1 It will never perform a full table scan. You can verify this on Resource Manager with no new job being spawned.


2. You can find the HDFS path corresponding to the table and do a hdfs -du on the path to know what is the size of directory.

Highlighted

Re: What's the fastest way to know if an Hive Table contains data?

New Contributor

Thanks for the quick response @kgautam

1. Due to a bug we have had to disable the default fetch operation by default and hence a TEZ session is spawned for the operation at the moment for every select * limit 1 operation, also since we use LLAP we have observed there is considerable IO within the mappers from the metrics printed after the query execution.

2. Is this a good approach?

Highlighted

Re: What's the fastest way to know if an Hive Table contains data?

2 option is not the perfect approach but is full proof, as its never possible for HDFS directory to be empty while Hive has data.


Don't have an account?
Coming from Hortonworks? Activate your account here