Created on 09-26-2013 04:00 AM - edited 09-16-2022 01:48 AM
Hi, we use Impala 1.1.1
We use JDBC to submit query to Impala.
We submit query to partitioned table.
Oozie coordinator does wake up each hour, parses data and adds new patition to table: alter table add partition...
When we do get connection JDBC to random Impalad (chosen from predefined list) in our Java code, we do submit "refresh my_partitioned_table" and then give connection to user.
We suppose that "refresh some_Table" forces Imapad to get actual metadata info about target table from metastore.
But it's not true.
Here it is:
1. Let's refresh metadata
Returned 0 row(s) in 3.38s
[localhost:21000] > refresh web_resource_rating;
Query: refresh web_resource_rating
Query finished, fetching results ...
2. Let's check the result. We query "virtual partition" column
localhost:21000] > select distinct fulldate from web_resource_rating order by fulldate desc limit 20;
Query: select distinct fulldate from web_resource_rating order by fulldate desc limit 20
Query finished, fetching results ...
+---------------+
| fulldate |
+---------------+
| 2013-09-25-10 |
| 2013-09-25-09 |
| 2013-09-25-08 |
| 2013-09-25-07 |
| 2013-09-25-06 |
| 2013-09-25-05 |
| 2013-09-25-04 |
| 2013-09-25-03 |
| 2013-09-25-02 |
| 2013-09-25-01 |
| 2013-09-25-00 |
| 2013-09-24-23 |
| 2013-09-24-22 |
| 2013-09-24-21 |
| 2013-09-24-20 |
| 2013-09-24-19 |
| 2013-09-24-18 |
| 2013-09-24-17 |
| 2013-09-24-16 |
| 2013-09-24-15 |
+---------------+
OMG, Where is last day????
Let's invalidate metadata (I've already did it last day...)
[localhost:21000] > invalidate metadata;
Query: invalidate metadata
Query finished, fetching results ...
Le'ts repeat the query
Query: select distinct fulldate from web_resource_rating order by fulldate desc limit 20
Query finished, fetching results ...
+---------------+
| fulldate |
+---------------+
| 2013-09-26-13 |
| 2013-09-26-12 |
| 2013-09-26-11 |
| 2013-09-26-10 |
| 2013-09-26-09 |
| 2013-09-26-08 |
| 2013-09-26-07 |
| 2013-09-26-06 |
| 2013-09-26-05 |
| 2013-09-26-04 |
| 2013-09-26-03 |
| 2013-09-26-02 |
| 2013-09-26-01 |
| 2013-09-26-00 |
| 2013-09-25-23 |
| 2013-09-25-22 |
| 2013-09-25-21 |
| 2013-09-25-20 |
| 2013-09-25-19 |
| 2013-09-25-18 |
+---------------+
Great now I see the latest partitions.
What do I do wrong? Why refresh doesn't help?
Created 10-16-2013 09:26 AM
I'd recommend - run "invalidate metadata" on all impalad immediately after loading new data or modifying metadata.
This can be automated as part of data load process.
Created 10-16-2013 11:35 AM
We're working on automatic metadata-update dissemination to hopefully ease your pain, Sergey. Stay tuned.
Sorry for the inconvenience.
Created 09-27-2013 02:36 PM
That is a good point, we'll document better the interaction with partitioned tables. It seems that the new partition is a change in the table metadata, and in 1.1.1 REFRESH only fetches the block location data (for the partitions that are already known). This is definitely an area where improvements are being worked, so anything I say about the internals is likely to be obsolete soon...
Created 09-28-2013 09:47 AM
Reply by email didn't work so I coy-paste my second question using web-form:
Created 10-16-2013 09:26 AM
I'd recommend - run "invalidate metadata" on all impalad immediately after loading new data or modifying metadata.
This can be automated as part of data load process.
Created 10-16-2013 11:35 AM
We're working on automatic metadata-update dissemination to hopefully ease your pain, Sergey. Stay tuned.
Sorry for the inconvenience.
Created 10-16-2013 11:25 PM