Created 08-29-2017 08:50 AM
I have a table partitioned by country, and a non-partitioned table containing the same dataset. When I query both tables, the query on the non-partitioned table takes less time than the same query on the partitioned table. Can anyone tell me where the issue might be and, if possible, suggest a solution? The partitioned table is dynamically partitioned.
Created 08-29-2017 09:02 AM
Did you check whether the partitioned table is creating small files on HDFS? Ideally, you want the files to be close to the HDFS block size. Also, check that the table statistics have been computed and that you are filtering on the partition key, so that only the required partitions are read.
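A rough sketch of those checks in HiveQL, assuming a hypothetical table named sales_part partitioned by country (the table name, columns, and the 'US' value are placeholders, not taken from your setup):

```sql
-- List the partitions that dynamic partitioning actually created.
SHOW PARTITIONS sales_part;

-- Compute basic statistics for every partition of the table.
ANALYZE TABLE sales_part PARTITION (country) COMPUTE STATISTICS;

-- Inspect one partition; when statistics are available, the partition
-- parameters include numFiles and totalSize, which hint at small files.
DESCRIBE FORMATTED sales_part PARTITION (country = 'US');

-- Filter on the partition key so Hive can prune to the matching partition.
-- The plan should reference only the country=US partition.
EXPLAIN
SELECT COUNT(*)
FROM sales_part
WHERE country = 'US';
```

If numFiles is large while totalSize divided by numFiles is far below the HDFS block size, small files are a likely cause of the slowdown.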
Created 08-29-2017 09:46 AM
Is there any property to set the partition/file size close to the HDFS block size?
Created 08-30-2017 02:16 AM
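You can try merging the small files of a partition with CONCATENATE (supported for tables stored as RCFile or ORC):
ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;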
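As for properties, Hive has merge settings that combine small output files at the end of a job. The following is only a sketch: the table names sales_part and sales_flat, the column names, and the byte values are illustrative assumptions, so tune the sizes to your own HDFS block size.

```sql
-- Concrete form of the statement above for one partition
-- (hypothetical table and partition value).
ALTER TABLE sales_part PARTITION (country = 'US') CONCATENATE;

-- Ask Hive to merge small output files when a job finishes.
SET hive.merge.mapfiles = true;       -- merge after map-only jobs
SET hive.merge.mapredfiles = true;    -- merge after map-reduce jobs
SET hive.merge.tezfiles = true;       -- merge after Tez jobs
-- Trigger a merge when the average output file is smaller than this (bytes).
SET hive.merge.smallfiles.avgsize = 134217728;
-- Target size of the merged files (bytes).
SET hive.merge.size.per.task = 268435456;

-- Rewrite the partitioned table with these settings in effect.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE sales_part PARTITION (country)
SELECT id, amount, country            -- partition column goes last
FROM sales_flat
DISTRIBUTE BY country;                -- send each country's rows to one reducer
```

DISTRIBUTE BY on the partition key is optional here; it tends to reduce the number of files written per partition, at the cost of skew if one country dominates the data.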