Created 08-03-2018 05:30 PM
Wanted to know what's the fastest/most efficient way to know if there exists data in a partitioned table. select * from database.table limit 1 according to the query plan does a full table scan. Is there any way to avoid this and quickly know if there exists any data in a given table/partition of a table?
Created 08-03-2018 06:09 PM
1. select * from database.table limit 1 It will never perform a full table scan. You can verify this on Resource Manager with no new job being spawned.
2. You can find the HDFS path corresponding to the table and do a hdfs -du on the path to know what is the size of directory.
Created 08-03-2018 06:15 PM
Thanks for the quick response @kgautam
1. Due to a bug we have had to disable the default fetch operation by default and hence a TEZ session is spawned for the operation at the moment for every select * limit 1 operation, also since we use LLAP we have observed there is considerable IO within the mappers from the metrics printed after the query execution.
2. Is this a good approach?
Created 08-03-2018 06:31 PM
2 option is not the perfect approach but is full proof, as its never possible for HDFS directory to be empty while Hive has data.