Created on 09-29-2017 09:53 PM - edited 09-16-2022 05:19 AM
How you can increase the speed of data processing?
I have script
select pgw_in.numrec,pgw_in.datavolumefbcdownlink,pgw_in.datavolumefbcuplink,pgw_in.duration,pgw_in.pgwaddress,pgw_in.servedimeisv,pgw_in.servedimsi,pgw_in.servedmsisdn,pgw_in.timeoffirstusage,pgw_in.timeoflastusage,pgw_in.userlocationinformation,pgw_oracle.idfrom pgw.pgw_in,pgw.pgw_oracle where (cast(unix_timestamp(substr(pgw_in.timeoffirstusage,2,9),"yyyy-M-dd") as timestamp)>=pgw_oracle.firstdatetime and cast(unix_timestamp(substr(pgw_in.timeoffirstusage,2,9),"yyyy-M-dd") as timestamp)<=pgw_oracle.seconddatetime and ( ( pgw_in.servedimsi=pgw_oracle.servedimsi and pgw_in.servedimeisv=pgw_oracle.servedimeisv and pgw_in.servedmsisdn=pgw_oracle.servedmsisdn) or ( pgw_oracle.servedimeisv="null" and pgw_in.servedimsi=pgw_oracle.servedimsi and pgw_oracle.servedmsisdn=pgw_in.servedmsisdn) or ( pgw_oracle.servedimeisv=pgw_in.servedimeisv and pgw_oracle.servedimsi="null" and pgw_oracle.servedmsisdn=pgw_in.servedmsisdn) or ( pgw_oracle.servedimsi="null" and pgw_oracle.servedimeisv="null" and pgw_oracle.servedmsisdn=pgw_in.servedmsisdn)))
Tables - pgw_in,pgw_oracle.
The script runs slowly, about 15 minutes to process data.
Created 10-01-2017 06:52 PM
Created 10-05-2017 05:37 AM
Explain String
1 | Estimated Per-Host Requirements: Memory=1.88GB VCores=2 |
2 | WARNING: The following tables are missing relevant table and/or column statistics. |
3 | pgw.pgw_in, pgw.pgw_oracle |
4 | |
5 | PLAN-ROOT SINK |
6 | | |
7 | 04:EXCHANGE [UNPARTITIONED] |
8 | | |
9 | 02:NESTED LOOP JOIN [INNER JOIN, BROADCAST] |
10 | | predicates: ((CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 9), 'yyyy-M-dd') AS TIMESTAMP) >= pgw_oracle.firstdatetime AND CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 9), 'yyyy-M-dd') AS TIMESTAMP) <= pgw_oracle.seconddatetime) OR (CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 10), 'yyyy-M-dd') AS TIMESTAMP) >= pgw_oracle.firstdatetime AND CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 10), 'yyyy-M-dd') AS TIMESTAMP) <= pgw_oracle.seconddatetime)), (pgw_in.servedimsi = pgw_oracle.servedimsi AND ((pgw_in.servedimeisv = pgw_oracle.servedimeisv AND pgw_in.servedmsisdn = pgw_oracle.servedmsisdn) OR (pgw_oracle.servedimeisv = 'null' AND pgw_oracle.servedmsisdn = pgw_in.servedmsisdn)) OR (pgw_oracle.servedimeisv = pgw_in.servedimeisv AND pgw_oracle.servedimsi = 'null' AND pgw_oracle.servedmsisdn = pgw_in.servedmsisdn) OR (pgw_oracle.servedimsi = 'null' AND pgw_oracle.servedimeisv = 'null' AND pgw_oracle.servedmsisdn = pgw_in.servedmsisdn)) |
11 | | |
12 | |--03:EXCHANGE [BROADCAST] |
13 | | | |
14 | | 01:SCAN HDFS [pgw.pgw_oracle] |
15 | | partitions=1/1 files=0 size=0B |
16 | | |
17 | 00:SCAN HDFS [pgw.pgw_in] |
18 | partitions=1/1 files=677 size=648.06GB |
Created 10-05-2017 07:28 AM
Created 07-18-2018 06:10 AM