Support Questions

pranavsinghania · ‎08-03-2018

Wanted to know what's the fastest/most efficient way to know if there exists data in a partitioned table. select * from database.table limit 1 according to the query plan does a full table scan. Is there any way to avoid this and quickly know if there exists any data in a given table/partition of a table?

kgautam · ‎08-03-2018

1. select * from database.table limit 1 It will never perform a full table scan. You can verify this on Resource Manager with no new job being spawned.

2. You can find the HDFS path corresponding to the table and do a hdfs -du on the path to know what is the size of directory.

pranavsinghania · ‎08-03-2018

Thanks for the quick response @kgautam

1. Due to a bug we have had to disable the default fetch operation by default and hence a TEZ session is spawned for the operation at the moment for every select * limit 1 operation, also since we use LLAP we have observed there is considerable IO within the mappers from the metrics printed after the query execution.

2. Is this a good approach?

kgautam · ‎08-03-2018

2 option is not the perfect approach but is full proof, as its never possible for HDFS directory to be empty while Hive has data.

Cloudera Community

Support Questions

What's the fastest way to know if an Hive Table contains data?