Using coalesce/repartition redistributes the data across Spark partitions, and Spark keeps no separate metadata about how many rows end up in each one, so it has to read the data to count it. When the data is stored in Hive as table partitions, some information about each partition is kept in HCatalog/the Hive metastore, which lets Hive return the count much faster than Spark. The same applies if you want the row count of each partition: assuming table stats are enabled and have been collected, Hive can answer from those stats and will again perform better than Spark. That is the reason for the performance difference. Hope it helps!
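To make the metadata point concrete, here is a toy model in plain Python. It is not Spark or Hive code, and every name in it (`ToyMetastore`, `count_by_scan`, `count_from_stats`, the `dt=...` partition keys) is made up for this sketch. It only assumes that one side has to touch every row to count it, while the other side can look up a row count that was collected earlier.

```python
# Toy model only: plain Python, no Spark or Hive involved.
# All names (ToyMetastore, count_by_scan, count_from_stats) are hypothetical.


class ToyMetastore:
    """Stand-in for a metastore that keeps a row-count statistic per partition."""

    def __init__(self):
        self.num_rows = {}

    def analyze(self, table):
        # Collect the statistics once, ahead of any count query.
        for partition, rows in table.items():
            self.num_rows[partition] = len(rows)


def count_by_scan(table):
    """Count rows per partition by touching every row (no metadata available)."""
    counts = {}
    rows_read = 0
    for partition, rows in table.items():
        n = 0
        for _ in rows:
            n += 1
            rows_read += 1
        counts[partition] = n
    return counts, rows_read


def count_from_stats(metastore):
    """Answer the same question from stored statistics; no rows are read."""
    return dict(metastore.num_rows), 0


# A table with two partitions, keyed by a made-up partition value.
table = {
    "dt=2020-01-01": ["a", "b", "c"],
    "dt=2020-01-02": ["d", "e"],
}

metastore = ToyMetastore()
metastore.analyze(table)

scan_counts, scan_rows_read = count_by_scan(table)
stat_counts, stat_rows_read = count_from_stats(metastore)

print(scan_counts, scan_rows_read)
print(stat_counts, stat_rows_read)
```

Both calls return the same per-partition counts, but the scan reads every row while the stats lookup reads none. The sketch also shows why the "stats collected" condition matters: if rows were added to `table` after `analyze` ran, the stored counts would be stale until the stats were collected again.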