Support Questions
Find answers, ask questions, and share your expertise

Impala cannot query the latest partition data after refreshing the hive table

Impala cannot query the latest partition data after refreshing the hive table

Explorer

Use impala-shell to refresh hive partition table:

 

Query: REFRESH hdm.hdpos_v_zddb_xsmx
Query submitted at: 2020-12-10 05:07:51 (Coordinator: http://realtimeanalysis-kudutablet01:25000)
Query progress can be monitored at: http://realtimeanalysis-kudutablet01:25000/query_plan?query_id=4541f98e8a7de9eb:2ab15a3400000000
Fetched 0 row(s) in 0.12s

 

Then use jdbc connection query, no latest data

 

Max Per-Host Resource Reservation: Memory=68.00MB Threads=3
Per-Host Resource Estimates: Memory=256MB
WARNING: The following tables are missing relevant table and/or column statistics.
hdm.hdpos_v_zddb_xsmx
Analyzed query: SELECT * FROM (SELECT accdate, busno, compid, wareid,
sum(wareqty) wareqty, sum(nvl(netsum, CAST(0 AS DECIMAL(16,4)))) netsum,
sum(nvl(stdsum, CAST(0 AS DECIMAL(16,4)))) stdsum, sum(nvl(puramount, CAST(0 AS
DECIMAL(16,4)))) puramount FROM hdm.hdpos_v_zddb_xsmx WHERE dt = '20201209' AND
accdate = TIMESTAMP '2020-12-09 00:00:00' GROUP BY accdate, busno, compid,
wareid) a WHERE a.wareqty > CAST(0 AS DECIMAL(3,0))

F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=108.00KB mem-reservation=0B thread-reservation=1
PLAN-ROOT SINK
|  mem-estimate=0B mem-reservation=0B thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
|  mem-estimate=108.00KB mem-reservation=0B thread-reservation=0
|  tuple-ids=1 row-size=104B cardinality=0
|  in pipelines: 03(GETNEXT)
|
F01:PLAN FRAGMENT [HASH(accdate,busno,compid,wareid)] hosts=1 instances=1
Per-Host Resources: mem-estimate=128.11MB mem-reservation=34.00MB thread-reservation=1
03:AGGREGATE [FINALIZE]
|  output: sum:merge(wareqty), sum:merge(nvl(netsum, 0)), sum:merge(nvl(stdsum, 0)), sum:merge(nvl(puramount, 0))
|  group by: accdate, busno, compid, wareid
|  having: sum(wareqty) > CAST(0 AS DECIMAL(3,0))
|  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=1 row-size=104B cardinality=0
|  in pipelines: 03(GETNEXT), 00(OPEN)
|
02:EXCHANGE [HASH(accdate,busno,compid,wareid)]
|  mem-estimate=108.00KB mem-reservation=0B thread-reservation=0
|  tuple-ids=1 row-size=104B cardinality=0
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=128.00MB mem-reservation=34.00MB thread-reservation=1
01:AGGREGATE [STREAMING]
|  output: sum(wareqty), sum(nvl(netsum, CAST(0 AS DECIMAL(16,4)))), sum(nvl(stdsum, CAST(0 AS DECIMAL(16,4)))), sum(nvl(puramount, CAST(0 AS DECIMAL(16,4))))
|  group by: accdate, busno, compid, wareid
|  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=1 row-size=104B cardinality=0
|  in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [hdm.hdpos_v_zddb_xsmx, RANDOM]
   partition predicates: dt = '20201209'
   partitions=0/155 files=0 size=0B
   predicates: accdate = TIMESTAMP '2020-12-09 00:00:00'
   stored statistics:
     table: rows=unavailable size=unavailable
     partitions: 0/0 rows=unavailable
     columns missing stats: accdate, busno, compid, wareid, wareqty, netsum, stdsum, puramount
   extrapolated-rows=disabled max-scan-range-rows=0
   mem-estimate=0B mem-reservation=0B thread-reservation=0
   tuple-ids=0 row-size=72B cardinality=0
   in pipelines: 00(GETNEXT)

 

 I checked when hdfs generates the partition table before refreshing the partition.

 how can I solve this problem, please give suggestions, thanks!

2 REPLIES 2

Re: Impala cannot query the latest partition data after refreshing the hive table

Contributor

Hi,

 

I am not sure above issue is resolved or not, As per the issue description after REFRESH you are not able to query the table partition in impala but as per the query details you have given i do not see an issue but it just says stats are missing and your query might have succeeded ( I do not know the query status since you haven't given in the details)

If partition is not available the query usually fails with "Analysis Exception".

If you have such issue you can upload full query profile and its coordinator logs to check the issue.

 

Re: Impala cannot query the latest partition data after refreshing the hive table

Contributor

Also should check the output of the query from both hive and Impala, If there is any issues with metadata update after REFRESH we need to check the catalog logs at the same time...