Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How you can increase the speed of data processing

How you can increase the speed of data processing

New Contributor

How you can increase the speed of data processing?

I have script

select 
pgw_in.numrec,pgw_in.datavolumefbcdownlink,pgw_in.datavolumefbcuplink,pgw_in.duration,pgw_in.pgwaddress,pgw_in.servedimeisv,pgw_in.servedimsi,pgw_in.servedmsisdn,pgw_in.timeoffirstusage,pgw_in.timeoflastusage,pgw_in.userlocationinformation,pgw_oracle.idfrom 
pgw.pgw_in,pgw.pgw_oracle where 
(cast(unix_timestamp(substr(pgw_in.timeoffirstusage,2,9),"yyyy-M-dd") as 
 timestamp)>=pgw_oracle.firstdatetime 
 and cast(unix_timestamp(substr(pgw_in.timeoffirstusage,2,9),"yyyy-M-dd") as 
 timestamp)<=pgw_oracle.seconddatetime 
 and ( ( pgw_in.servedimsi=pgw_oracle.servedimsi and 
 pgw_in.servedimeisv=pgw_oracle.servedimeisv and 
 pgw_in.servedmsisdn=pgw_oracle.servedmsisdn)
 or ( pgw_oracle.servedimeisv="null" and 
 pgw_in.servedimsi=pgw_oracle.servedimsi and 
 pgw_oracle.servedmsisdn=pgw_in.servedmsisdn)
 or ( pgw_oracle.servedimeisv=pgw_in.servedimeisv and 
 pgw_oracle.servedimsi="null" and 
 pgw_oracle.servedmsisdn=pgw_in.servedmsisdn)
 or ( pgw_oracle.servedimsi="null" and pgw_oracle.servedimeisv="null" and 
 pgw_oracle.servedmsisdn=pgw_in.servedmsisdn))) 

Tables - pgw_in,pgw_oracle.
The script runs slowly, about 15 minutes to process data.

4 REPLIES 4

Re: How you can increase the speed of data processing

Cloudera Employee
Please upload a query profile.

Re: How you can increase the speed of data processing

New Contributor
Explain String

  Explain String

1Estimated Per-Host Requirements: Memory=1.88GB VCores=2
2WARNING: The following tables are missing relevant table and/or column statistics.
3pgw.pgw_in, pgw.pgw_oracle
4 
5PLAN-ROOT SINK
6|
704:EXCHANGE [UNPARTITIONED]
8|
902:NESTED LOOP JOIN [INNER JOIN, BROADCAST]
10|  predicates: ((CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 9), 'yyyy-M-dd') AS TIMESTAMP) >= pgw_oracle.firstdatetime AND CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 9), 'yyyy-M-dd') AS TIMESTAMP) <= pgw_oracle.seconddatetime) OR (CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 10), 'yyyy-M-dd') AS TIMESTAMP) >= pgw_oracle.firstdatetime AND CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 10), 'yyyy-M-dd') AS TIMESTAMP) <= pgw_oracle.seconddatetime)), (pgw_in.servedimsi = pgw_oracle.servedimsi AND ((pgw_in.servedimeisv = pgw_oracle.servedimeisv AND pgw_in.servedmsisdn = pgw_oracle.servedmsisdn) OR (pgw_oracle.servedimeisv = 'null' AND pgw_oracle.servedmsisdn = pgw_in.servedmsisdn)) OR (pgw_oracle.servedimeisv = pgw_in.servedimeisv AND pgw_oracle.servedimsi = 'null' AND pgw_oracle.servedmsisdn = pgw_in.servedmsisdn) OR (pgw_oracle.servedimsi = 'null' AND pgw_oracle.servedimeisv = 'null' AND pgw_oracle.servedmsisdn = pgw_in.servedmsisdn))
11|
12|--03:EXCHANGE [BROADCAST]
13|  |
14|  01:SCAN HDFS [pgw.pgw_oracle]
15|     partitions=1/1 files=0 size=0B
16|
1700:SCAN HDFS [pgw.pgw_in]
18   partitions=1/1 files=677 size=648.06GB

Re: How you can increase the speed of data processing

Explorer
Partitioning may help if entire table scan is not required, also some processing can be saved instead of (CAST(unix_timestamp(substr(pgw_in.timeoffirstusage, 2, 9), 'yyyy-M-dd') AS TIMESTAMP
convert pgw_oracle.firstdatetime into unix_timestamp.

Re: How you can increase the speed of data processing

Explorer
Apply:

INVALIDATE METADATA pgw.pgw_in;
ALTER TABLE pgw.pgw_in RECOVER PARTITIONS;
COMPUTE STATS pgw.pgw_in;

(WARNING: The following tables are missing relevant table and/or column statistics.)