Support Questions

Find answers, ask questions, and share your expertise

Queries on hive partitioned tables taking more time than on non partitioned tables

I have a table partitioned by country. I have a table with the same dataset which is non-partitioned. On querying from both the tables , I find that the time taken to query on the non-partitioned table takes lesser time than querying on the partitioned table. Can anyone let me know where the issue might be and also possibly write down the solution to it ? The partitioned table is dynamically partitioned.

3 REPLIES 3

Cloudera Employee

Did you check if the partition table is creating small files on HDFS. Ideally you would want the files to be close to the HDFS block size. Also, check if the table stats are computed and you are using the right filter key to read only the required partitions.

any property to set the partition/file size close to the HDFS block size?

Cloudera Employee

you can try

ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Par...