Member since
10-27-2014
4
Posts
1
Kudos Received
0
Solutions
11-12-2014
08:44 PM
thanks for your reply , and you are right about partition pruning.
... View more
11-11-2014
10:02 PM
How can we do partition pruninng in impala join query , not able to figure out and where clause is also not giving the desired results , need some help on this . here is the query : select a11.apn apn, cast(a12.tariff_plan as int) tariff_plan, a11.subscriber_id subscriber_id, a11.partition_date partition_date, a13.name name, a12.tariff_plan tariff_plan0, a12.charging_node charging_node, concat(substring(cast(a11.partition_date as string),5,2),'/',substring(cast(a11.partition_date as string),7,2),'/', substring(cast(a11.partition_date as string),1,4)) CustCol_23, sum(case when (a11.volume between 10*1024 and 50*1024) then 1 else 0 end) WJXBFS1, sum(case when (a11.volume between 5*1024 and 10*1024) then 1 else 0 end) WJXBFS2, sum(case when a11.volume < 5*1024 then 1 else 0 end) WJXBFS3, sum(case when a11.volume>50*1024 then 1 else 0 end) WJXBFS4 from acc_volume_daily a11 join subscriber_history_daily a12 on (a11.partition_date = a12.partition_date and a11.subscriber_id = a12.subscriber_id and concat(substring(cast(a11.partition_date as string),5,2),'/',substring(cast(a11.partition_date as string),7,2),'/', substring(cast(a11.partition_date as string),1,4)) = concat(substring(cast(a12.partition_date as string),5,2),'/',substring(cast(a12.partition_date as string),7,2),'/', substring(cast(a12.partition_date as string),1,4))) join l_data_plans a13 on (cast(a12.tariff_plan as int) = a13.id) where a11.partition_date = a12.partition_date = 20140411 group by a11.apn, cast(a12.tariff_plan as int), a11.subscriber_id, a11.partition_date, a13.name, a12.tariff_plan, a12.charging_node, concat(substring(cast(a11.partition_date as string),5,2),'/',substring(cast(a11.partition_date as string),7,2),'/', substring(cast(a11.partition_date as string),1,4)); -------------------------------------------------------------------------------------------------------------------------------------------------------- here is the explain query result. Explain String ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- PLAN FRAGMENT 0 PARTITION: UNPARTITIONED 10:EXCHANGE tuple ids: 3 PLAN FRAGMENT 1 PARTITION: HASH_PARTITIONED: a11.apn, CAST(a12.tariff_plan AS INT), a11.subscriber_id, a11.partition_date, a13.name, a12.tariff_plan, a12.charging_node, concat(substring(CAST(a11.partition_date AS STRING), 5, 2), '/', substring(CAST(a11.partition_date AS STRING), 7, 2), '/', substring(CAST(a11.partition_date AS STRING), 1, 4)) STREAM DATA SINK EXCHANGE ID: 10 UNPARTITIONED 9:AGGREGATE | output: SUM(SUM(CASE WHEN (a11.volume BETWEEN 10 * 1024 AND 50 * 1024) THEN 1 ELSE 0 END)), SUM(SUM(CASE WHEN (a11.volume BETWEEN 5 * 1024 AND 10 * 1024) THEN 1 ELSE 0 END)), SUM(SUM(CASE WHEN a11.volume < 5 * 1024 THEN 1 ELSE 0 END)), SUM(SUM(CASE WHEN a11.volume > 50 * 1024 THEN 1 ELSE 0 END)) | group by: a11.apn, CAST(a12.tariff_plan AS INT), a11.subscriber_id, a11.partition_date, a13.name, a12.tariff_plan, a12.charging_node, concat(substring(CAST(a11.partition_date AS STRING), 5, 2), '/', substring(CAST(a11.partition_date AS STRING), 7, 2), '/', substring(CAST(a11.partition_date AS STRING), 1, 4)) | tuple ids: 3 | 8:EXCHANGE tuple ids: 3 PLAN FRAGMENT 2 PARTITION: RANDOM STREAM DATA SINK EXCHANGE ID: 8 HASH_PARTITIONED: a11.apn, CAST(a12.tariff_plan AS INT), a11.subscriber_id, a11.partition_date, a13.name, a12.tariff_plan, a12.charging_node, concat(substring(CAST(a11.partition_date AS STRING), 5, 2), '/', substring(CAST(a11.partition_date AS STRING), 7, 2), '/', substring(CAST(a11.partition_date AS STRING), 1, 4)) 5:AGGREGATE | output: SUM(CASE WHEN (a11.volume BETWEEN 10 * 1024 AND 50 * 1024) THEN 1 ELSE 0 END), SUM(CASE WHEN (a11.volume BETWEEN 5 * 1024 AND 10 * 1024) THEN 1 ELSE 0 END), SUM(CASE WHEN a11.volume < 5 * 1024 THEN 1 ELSE 0 END), SUM(CASE WHEN a11.volume > 50 * 1024 THEN 1 ELSE 0 END) | group by: a11.apn, CAST(a12.tariff_plan AS INT), a11.subscriber_id, a11.partition_date, a13.name, a12.tariff_plan, a12.charging_node, concat(substring(CAST(a11.partition_date AS STRING), 5, 2), '/', substring(CAST(a11.partition_date AS STRING), 7, 2), '/', substring(CAST(a11.partition_date AS STRING), 1, 4)) | tuple ids: 3 | 4:HASH JOIN | join op: INNER JOIN (BROADCAST) | hash predicates: | CAST(a12.tariff_plan AS INT) = a13.id | tuple ids: 0 1 2 | |----7:EXCHANGE | tuple ids: 2 | 2:HASH JOIN | join op: INNER JOIN (BROADCAST) | hash predicates: | a11.partition_date = a12.partition_date | a11.subscriber_id = a12.subscriber_id | other predicates: concat(substring(CAST(a11.partition_date AS STRING), 5, 2), '/', substring(CAST(a11.partition_date AS STRING), 7, 2), '/', substring(CAST(a11.partition_date AS STRING), 1, 4)) = concat(substring(CAST(a12.partition_date AS STRING), 5, 2), '/', substring(CAST(a12.partition_date AS STRING), 7, 2), '/', substring(CAST(a12.partition_date AS STRING), 1, 4)) | tuple ids: 0 1 | |----6:EXCHANGE | tuple ids: 1 | 0:SCAN HDFS table=reporting_mini.acc_volume_daily #partitions=0 size=0B tuple ids: 0 PLAN FRAGMENT 3 PARTITION: RANDOM STREAM DATA SINK EXCHANGE ID: 7 UNPARTITIONED 3:SCAN HDFS table=reporting_mini.l_data_plans #partitions=1 size=21.96KB compact tuple ids: 2 PLAN FRAGMENT 4 PARTITION: RANDOM STREAM DATA SINK EXCHANGE ID: 6 UNPARTITIONED 1:SCAN HDFS table=reporting_mini.subscriber_history_daily #partitions=191 size=2.33GB compact tuple ids: 1
... View more
Labels:
- Labels:
-
Apache Impala
-
HDFS