“file does not exist” exception occurs when using Spark dynamic partition pruning and the small table is empty

1. Issue summary:

I encountered this on hive-2.1.1-cdh6.3.2, and I think it affects other versions too.

When tableA is inner joined to tableB on tableB's partition key, with dynamic partition pruning enabled and tableA empty, the query fails with the exception below. Judging from the stack trace in section 3, Hive fails while preparing the big-table stage: SparkDynamicPartitionPruner tries to list the pruning output that the small-table stage should have written, but because tableA is empty that output presumably was never created:

Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/hive/fddbc5ac-3596-428d-8b42-cbc61952d182/hive_2022-05-30_14-03-17_139_1843975612196554546-15339/-mr-10003/2/1 does not exist. (state=42000,code=3).

2. Steps to reproduce the issue:

1. prepare tables:

CREATE TABLE tableA (
   businsys_no decimal(10,0),
   acct_id string,
   prod_code string)
PARTITIONED BY (init_date int)
STORED AS ORC;

CREATE TABLE tableB (
   client_id string,
   open_date decimal(10,0),
   client_status string,
   organ_flag string)
PARTITIONED BY (businsys_no decimal(10,0))
STORED AS ORC;

2. prepare data for the tables:

– tableA should be left empty;
– load some rows into tableB, for example:
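
A minimal sketch of this step (the row values are arbitrary placeholders; only the partition value 5035 comes from the logs below):

-- leave tableA empty on purpose: an empty small table is the trigger condition
-- load a couple of placeholder rows into one partition of tableB
INSERT INTO TABLE tableB PARTITION (businsys_no = 5035)
VALUES ('C0001', 20220525, '0', '0'),
       ('C0002', 20220525, '0', '0');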

3. run the statements below to reproduce the issue:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.map.join.only=true;

select *
from (select *
      from tableA fp
      where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no
            from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
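
Before running the join you can double-check the trigger condition (an optional sanity check, not part of the original steps):

-- the pruning source (small-table side) must be empty
select count(*) from tableA where init_date = 20220525;  -- expect 0
-- the pruning target must have at least one partition
show partitions tableB;  -- expect e.g. businsys_no=5035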

3. Execution error logs (captured from the original environment, so the real table names appear in place of tableA and tableB):

WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://localhost:10000/lv_test
Connected to: Apache Hive (version 2.1.1-cdh6.3.2)
Driver: Hive JDBC (version 2.1.1-cdh6.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/lv_test> set hive.execution.engine=spark;
No rows affected (0.044 seconds)
0: jdbc:hive2://localhost:10000/lv_test> set hive.auto.convert.join=true;
No rows affected (0.004 seconds)
0: jdbc:hive2://localhost:10000/lv_test> set hive.spark.dynamic.partition.pruning=true;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000/lv_test> set hive.spark.dynamic.partition.pruning.map.join.only=true;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000/lv_test>  select *
. . . . . . . . . . . . . . . . . . . .>       from (select *
. . . . . . . . . . . . . . . . . . . .>             from ods_img_gj_sed_prod_liming fp
. . . . . . . . . . . . . . . . . . . .>            where fp.part_date = 20220525) cfp inner join (select ic.client_id, ic.businsys_no
. . . . . . . . . . . . . . . . . . . .>                  from ods_temp_cdt_increment_client ic where ic.businsys_no=5035) ici on cfp.businsys_no =  
ici.businsys_no and cfp.acct_id = ici.client_id;
INFO  : Compiling command(queryId=hive_20220601001521_02b9c045-ef56-4fbf-9006-114e56c49497): select *
from (select *
from ods_img_gj_sed_prod_liming fp
where fp.part_date = 20220525) cfp inner join (select ic.client_id, ic.businsys_no
from ods_temp_cdt_increment_client ic where ic.businsys_no=5035) ici on cfp.businsys_no = ici.businsys_no and cfp.acct_id = ici.client_id
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:cfp.init_date, type:decimal(10,0), comment:null), FieldSchema(name:cfp.businsys_no, type:decimal(10,0), comment:null), FieldSchema(name:cfp.acct_id, type:string, comment:null), FieldSchema(name:cfp.organ_prod_kind, type:string, comment:null), FieldSchema(name:cfp.prod_code, type:string, comment:null), FieldSchema(name:cfp.prod_full_name, type:string, comment:null), FieldSchema(name:cfp.prod_name, type:string, comment:null), FieldSchema(name:cfp.prod_enddate, type:decimal(10,0), comment:null), FieldSchema(name:cfp.prod_structure, type:string, comment:null), FieldSchema(name:cfp.prodearnings_character, type:string, comment:null), FieldSchema(name:cfp.prod_scale, type:decimal(15,4), comment:null), FieldSchema(name:cfp.prodpossessor_type, type:string, comment:null), FieldSchema(name:cfp.initleverage_ratio, type:string, comment:null), FieldSchema(name:cfp.finan_futures_account, type:string, comment:null), FieldSchema(name:cfp.prodmana_code, type:string, comment:null), FieldSchema(name:cfp.product_open_period, type:string, comment:null), FieldSchema(name:cfp.product_net_value, type:decimal(15,8), comment:null), FieldSchema(name:cfp.product_nav_date, type:decimal(10,0), comment:null), FieldSchema(name:cfp.remark, type:string, comment:null), FieldSchema(name:cfp.custodian_tel, type:string, comment:null), FieldSchema(name:cfp.prod_register_date, type:decimal(10,0), comment:null), FieldSchema(name:cfp.agency_name, type:string, comment:null), FieldSchema(name:cfp.prod_cust_type, type:string, comment:null), FieldSchema(name:cfp.investadv_flag, type:string, comment:null), FieldSchema(name:cfp.pfunds_type, type:string, comment:null), FieldSchema(name:cfp.pfunds_manage_type, type:string, comment:null), FieldSchema(name:cfp.product_open_info, type:string, comment:null), FieldSchema(name:cfp.pfunds_type_info, type:string, comment:null), FieldSchema(name:cfp.prodtrustee_net_no, type:string, comment:null), FieldSchema(name:cfp.trustee_bank_name, type:string, comment:null), FieldSchema(name:cfp.trustee_bank_account, type:string, comment:null), FieldSchema(name:cfp.trust_fund, type:decimal(15,2), comment:null), FieldSchema(name:cfp.trust_begindate, type:decimal(10,0), comment:null), FieldSchema(name:cfp.trust_enddate, type:decimal(10,0), comment:null), FieldSchema(name:cfp.subscriber_extracode, type:string, comment:null), FieldSchema(name:cfp.extracode, type:string, comment:null), FieldSchema(name:cfp.part_date, type:int, comment:null), FieldSchema(name:ici.client_id, type:string, comment:null), FieldSchema(name:ici.businsys_no, type:decimal(10,0), comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20220601001521_02b9c045-ef56-4fbf-9006-114e56c49497); Time taken: 0.168 seconds
INFO  : Executing command(queryId=hive_20220601001521_02b9c045-ef56-4fbf-9006-114e56c49497): select *
from (select *
from ods_img_gj_sed_prod_liming fp
where fp.part_date = 20220525) cfp inner join (select ic.client_id, ic.businsys_no
from ods_temp_cdt_increment_client ic where ic.businsys_no=5035) ici on cfp.businsys_no = ici.businsys_no and cfp.acct_id = ici.client_id
INFO  : Query ID = hive_20220601001521_02b9c045-ef56-4fbf-9006-114e56c49497
INFO  : Total jobs = 2
INFO  : Launching Job 1 out of 2
INFO  : Starting task [Stage-2:MAPRED] in serial mode
INFO  : 2022-06-01 00:15:35,219	
INFO  : Spark job[0] finished successfully in 2.01 second(s)
INFO  : Launching Job 2 out of 2
INFO  : Starting task [Stage-1:MAPRED] in serial mode
ERROR : Spark job[-1] failed
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive/anonymous/744d8b8a-7a19-4337-a80c-9f2005f65fb5/hive_2022-06-01_00-15-21_619_1701186596611571845-7/-mr-10003/2/1 does not exist.
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.runDynamicPartitionPruner(SparkPlanGenerator.java:162) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:127) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:355) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:400) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:365) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive/anonymous/744d8b8a-7a19-4337-a80c-9f2005f65fb5/hive_2022-06-01_00-15-21_619_1701186596611571845-7/-mr-10003/2/1 does not exist.
	at org.apache.hadoop.hive.ql.exec.spark.SparkDynamicPartitionPruner.processFiles(SparkDynamicPartitionPruner.java:161) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hadoop.hive.ql.exec.spark.SparkDynamicPartitionPruner.prune(SparkDynamicPartitionPruner.java:83) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.runDynamicPartitionPruner(SparkPlanGenerator.java:160) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	... 8 more
Caused by: java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive/anonymous/744d8b8a-7a19-4337-a80c-9f2005f65fb5/hive_2022-06-01_00-15-21_619_1701186596611571845-7/-mr-10003/2/1 does not exist.
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:986) ~[hadoop-hdfs-client-3.0.0-cdh6.3.2.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:122) ~[hadoop-hdfs-client-3.0.0-cdh6.3.2.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1046) ~[hadoop-hdfs-client-3.0.0-cdh6.3.2.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1043) ~[hadoop-hdfs-client-3.0.0-cdh6.3.2.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.0.0-cdh6.3.2.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1053) ~[hadoop-hdfs-client-3.0.0-cdh6.3.2.jar:?]
	at org.apache.hadoop.hive.ql.exec.spark.SparkDynamicPartitionPruner.processFiles(SparkDynamicPartitionPruner.java:133) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hadoop.hive.ql.exec.spark.SparkDynamicPartitionPruner.prune(SparkDynamicPartitionPruner.java:83) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.runDynamicPartitionPruner(SparkPlanGenerator.java:160) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
	... 8 more
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/anonymous/744d8b8a-7a19-4337-a80c-9f2005f65fb5/hive_2022-06-01_00-15-21_619_1701186596611571845-7/-mr-10003/2/1 does not exist.
INFO  : Completed executing command(queryId=hive_20220601001521_02b9c045-ef56-4fbf-9006-114e56c49497); Time taken: 14.496 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/anonymous/744d8b8a-7a19-4337-a80c-9f2005f65fb5/hive_2022-06-01_00-15-21_619_1701186596611571845-7/-mr-10003/2/1 does not exist. (state=42000,code=3)

4. Execution plan (note the tmp Path of the Spark Partition Pruning Sink Operator, which matches the missing-file path pattern in the error above):

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-2 is a root stage                          |
|   Stage-1 depends on stages: Stage-2               |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-2                                   |
|     Spark                                          |
|       DagName: hive_20220601001014_23adb610-480f-4462-9102-ea047b0b4db6:14 |
|       Vertices:                                    |
|         Map 1                                      |
|             Map Operator Tree:                     |
|                 TableScan                          |
|                   alias: fp                        |
|                   filterExpr: ((part_date = 20220525) and businsys_no is not null and acct_id is not null) (type: boolean) |
|                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: PARTIAL |
|                   GatherStats: false               |
|                   Filter Operator                  |
|                     isSamplingPred: false          |
|                     predicate: ((part_date = 20220525) and businsys_no is not null and acct_id is not null) (type: boolean) |
|                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: PARTIAL |
|                     Select Operator                |
|                       expressions: init_date (type: decimal(10,0)), businsys_no (type: decimal(10,0)), acct_id (type: string), organ_prod_kind (type: string), prod_code (type: string), prod_full_name (type: string), prod_name (type: string), prod_enddate (type: decimal(10,0)), prod_structure (type: string), prodearnings_character (type: string), prod_scale (type: decimal(15,4)), prodpossessor_type (type: string), initleverage_ratio (type: string), finan_futures_account (type: string), prodmana_code (type: string), product_open_period (type: string), product_net_value (type: decimal(15,8)), product_nav_date (type: decimal(10,0)), remark (type: string), custodian_tel (type: string), prod_register_date (type: decimal(10,0)), agency_name (type: string), prod_cust_type (type: string), investadv_flag (type: string), pfunds_type (type: string), pfunds_manage_type (type: string), product_open_info (type: string), pfunds_type_info (type: string), prodtrustee_net_no (type: string), trustee_bank_name (type: string), trustee_bank_account (type: string), trust_fund (type: decimal(15,2)), trust_begindate (type: decimal(10,0)), trust_enddate (type: decimal(10,0)), subscriber_extracode (type: string), extracode (type: string) |
|                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35 |
|                       Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: PARTIAL |
|                       Spark HashTable Sink Operator |
|                         keys:                      |
|                           0 _col1 (type: decimal(10,0)), _col2 (type: string) |
|                           1 _col1 (type: decimal(10,0)), _col0 (type: string) |
|                         Position of Big Table: 1   |
|                       Select Operator              |
|                         expressions: _col1 (type: decimal(10,0)) |
|                         outputColumnNames: _col0   |
|                         Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: PARTIAL |
|                         Group By Operator          |
|                           keys: _col0 (type: decimal(10,0)) |
|                           mode: hash               |
|                           outputColumnNames: _col0 |
|                           Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: PARTIAL |
|                           Spark Partition Pruning Sink Operator |
|                             Target column: businsys_no (decimal(10,0)) |
|                             partition key expr: businsys_no |
|                             tmp Path: hdfs://nameservice1/tmp/hive/anonymous/a454cc3a-8fa5-4ca3-a0da-5e9c7a4007c8/hive_2022-06-01_00-10-14_841_5683361755105469310-4/-mr-10003/2/1 |
|                             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: PARTIAL |
|                             target work: Map 2     |
|             Local Work:                            |
|               Map Reduce Local Work                |
|                                                    |
|   Stage: Stage-1                                   |
|     Spark                                          |
|       DagName: hive_20220601001014_23adb610-480f-4462-9102-ea047b0b4db6:13 |
|       Vertices:                                    |
|         Map 2                                      |
|             Map Operator Tree:                     |
|                 TableScan                          |
|                   alias: ic                        |
|                   filterExpr: client_id is not null (type: boolean) |
|                   Statistics: Num rows: 2 Data size: 594 Basic stats: COMPLETE Column stats: PARTIAL |
|                   GatherStats: false               |
|                   Filter Operator                  |
|                     isSamplingPred: false          |
|                     predicate: client_id is not null (type: boolean) |
|                     Statistics: Num rows: 2 Data size: 224 Basic stats: COMPLETE Column stats: PARTIAL |
|                     Select Operator                |
|                       expressions: client_id (type: string), businsys_no (type: decimal(10,0)) |
|                       outputColumnNames: _col0, _col1 |
|                       Statistics: Num rows: 2 Data size: 224 Basic stats: COMPLETE Column stats: PARTIAL |
|                       Map Join Operator            |
|                         condition map:             |
|                              Inner Join 0 to 1     |
|                         keys:                      |
|                           0 _col1 (type: decimal(10,0)), _col2 (type: string) |
|                           1 _col1 (type: decimal(10,0)), _col0 (type: string) |
|                         outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col37, _col38 |
|                         input vertices:            |
|                           0 Map 1                  |
|                         Position of Big Table: 1   |
|                         Statistics: Num rows: 1 Data size: 3932 Basic stats: COMPLETE Column stats: PARTIAL |
|                         Select Operator            |
|                           expressions: _col0 (type: decimal(10,0)), _col1 (type: decimal(10,0)), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: decimal(10,0)), _col8 (type: string), _col9 (type: string), _col10 (type: decimal(15,4)), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: decimal(15,8)), _col17 (type: decimal(10,0)), _col18 (type: string), _col19 (type: string), _col20 (type: decimal(10,0)), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col24 (type: string), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string), _col30 (type: string), _col31 (type: decimal(15,2)), _col32 (type: decimal(10,0)), _col33 (type: decimal(10,0)), _col34 (type: string), _col35 (type: string), 20220525 (type: int), _col37 (type: string), _col38 (type: decimal(10,0)) |
|                           outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38 |
|                           Statistics: Num rows: 1 Data size: 3932 Basic stats: COMPLETE Column stats: PARTIAL |
|                           File Output Operator     |
|                             compressed: false      |
|                             GlobalTableId: 0       |
|                             directory: hdfs://nameservice1/tmp/hive/anonymous/a454cc3a-8fa5-4ca3-a0da-5e9c7a4007c8/hive_2022-06-01_00-10-14_841_5683361755105469310-4/-mr-10000/.hive-staging_hive_2022-06-01_00-10-14_841_5683361755105469310-4/-ext-10001 |
|                             NumFilesPerFileSink: 1 |
|                             Statistics: Num rows: 1 Data size: 3932 Basic stats: COMPLETE Column stats: PARTIAL |
|                             Stats Publishing Key Prefix: hdfs://nameservice1/tmp/hive/anonymous/a454cc3a-8fa5-4ca3-a0da-5e9c7a4007c8/hive_2022-06-01_00-10-14_841_5683361755105469310-4/-mr-10000/.hive-staging_hive_2022-06-01_00-10-14_841_5683361755105469310-4/-ext-10001/ |
|                             table:                 |
|                                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
|                                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
|                                 properties:        |
|                                   columns _col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13,_col14,_col15,_col16,_col17,_col18,_col19,_col20,_col21,_col22,_col23,_col24,_col25,_col26,_col27,_col28,_col29,_col30,_col31,_col32,_col33,_col34,_col35,_col36,_col37,_col38 |
|                                   columns.types decimal(10,0):decimal(10,0):string:string:string:string:string:decimal(10,0):string:string:decimal(15,4):string:string:string:string:string:decimal(15,8):decimal(10,0):string:string:decimal(10,0):string:string:string:string:string:string:string:string:string:string:decimal(15,2):decimal(10,0):decimal(10,0):string:string:int:string:decimal(10,0) |
|                                   escape.delim \   |
|                                   hive.serialization.extend.additional.nesting.levels true |
|                                   serialization.escape.crlf true |
|                                   serialization.format 1 |
|                                   serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                             TotalFiles: 1          |
|                             GatherStats: false     |
|                             MultiFileSpray: false  |
|             Local Work:                            |
|               Map Reduce Local Work                |
|             Path -> Alias:                         |
|               hdfs://nameservice1/user/hive/warehouse/lv_test.db/ods_temp_cdt_increment_client/businsys_no=5035 [ici:ic] |
|             Path -> Partition:                     |
|               hdfs://nameservice1/user/hive/warehouse/lv_test.db/ods_temp_cdt_increment_client/businsys_no=5035  |
|                 Partition                          |
|                   base file name: businsys_no=5035 |
|                   input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat |
|                   output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat |
|                   partition values:                |
|                     businsys_no 5035               |
|                   properties:                      |
|                     COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"} |
|                     bucket_count -1                |
|                     file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat |
|                     file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat |
|                     location hdfs://nameservice1/user/hive/warehouse/lv_test.db/ods_temp_cdt_increment_client/businsys_no=5035 |
|                     name lv_test.ods_temp_cdt_increment_client |
|                     numFiles 1                     |
|                     numRows 2                      |
|                     partition_columns businsys_no  |
|                     partition_columns.types decimal(10,0) |
|                     rawDataSize 370                |
|                     serialization.ddl struct ods_temp_cdt_increment_client { string client_id, decimal(10,0) open_date, string client_status, string organ_flag, decimal(10,0) corp_begin_date} |
|                     serialization.format 1         |
|                     serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde |
|                     serialization.null.format null |
|                     totalSize 516                  |
|                     transient_lastDdlTime 1653990973 |
|                   serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde |
|                                                    |
|                     input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat |
|                     output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat |
|                     properties:                    |
|                       bucket_count -1              |
|                       columns client_id,open_date,client_status,organ_flag,corp_begin_date |
|                       columns.comments             |
|                       columns.types string:decimal(10,0):string:string:decimal(10,0) |
|                       file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat |
|                       file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat |
|                       last_modified_by hs_cic      |
|                       last_modified_time 1624128382 |
|                       location hdfs://nameservice1/user/hive/warehouse/lv_test.db/ods_temp_cdt_increment_client |
|                       name lv_test.ods_temp_cdt_increment_client |
|                       partition_columns businsys_no |
|                       partition_columns.types decimal(10,0) |
|                       serialization.ddl struct ods_temp_cdt_increment_client { string client_id, decimal(10,0) open_date, string client_status, string organ_flag, decimal(10,0) corp_begin_date} |
|                       serialization.format 1       |
|                       serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde |
|                       serialization.null.format null |
|                       transient_lastDdlTime 1624128382 |
|                     serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde |
|                     name: lv_test.ods_temp_cdt_increment_client |
|                   name: lv_test.ods_temp_cdt_increment_client |
|             Truncated Path -> Alias:               |
|               /lv_test.db/ods_temp_cdt_increment_client/businsys_no=5035 [ici:ic] |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         ListSink                                   |
|                                                    |
+----------------------------------------------------+

5. As a workaround, we currently turn off Spark dynamic partition pruning:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=false;
set hive.spark.dynamic.partition.pruning.map.join.only=false;

select *
from (select *
      from tableA fp
      where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no
            from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
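
With pruning disabled the query completes and simply returns an empty result set (expected, since tableA has no rows). To confirm the settings took effect, re-running EXPLAIN should show a plan without the pruning sink; this is the expected outcome based on the plan in section 4, not captured output:

explain
select *
from (select *
      from tableA fp
      where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no
            from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
-- the "Spark Partition Pruning Sink Operator" should no longer appear in Stage-2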
