Support Questions

Find answers, ask questions, and share your expertise

I do suffer a lot because Impalad doesn't see metadata updates

avatar
Expert Contributor

Hi, we use Impala 1.1.1

We use JDBC to submit query to Impala.

We submit query to partitioned table.

Oozie coordinator does wake up each hour, parses data and adds new patition to table: alter table add partition...

 

When we do get connection JDBC to random Impalad (chosen from predefined list) in our Java code, we do submit "refresh my_partitioned_table" and then give connection to user.

 

We suppose that "refresh some_Table" forces Imapad to get actual metadata info about target table from metastore.

But it's not true.

Here it is:

 

1. Let's refresh metadata

Returned 0 row(s) in 3.38s
[localhost:21000] > refresh web_resource_rating;
Query: refresh web_resource_rating
Query finished, fetching results ...

2. Let's check the result. We query "virtual partition" column

localhost:21000] > select distinct fulldate from web_resource_rating order by fulldate desc limit 20;
Query: select distinct fulldate from web_resource_rating order by fulldate desc limit 20
Query finished, fetching results ...
+---------------+
| fulldate      |
+---------------+
| 2013-09-25-10 |
| 2013-09-25-09 |
| 2013-09-25-08 |
| 2013-09-25-07 |
| 2013-09-25-06 |
| 2013-09-25-05 |
| 2013-09-25-04 |
| 2013-09-25-03 |
| 2013-09-25-02 |
| 2013-09-25-01 |
| 2013-09-25-00 |
| 2013-09-24-23 |
| 2013-09-24-22 |
| 2013-09-24-21 |
| 2013-09-24-20 |
| 2013-09-24-19 |
| 2013-09-24-18 |
| 2013-09-24-17 |
| 2013-09-24-16 |
| 2013-09-24-15 |
+---------------+

OMG, Where is last day????

 

Let's invalidate metadata (I've already did it last day...)

[localhost:21000] > invalidate metadata;
Query: invalidate metadata
Query finished, fetching results ...

 

Le'ts repeat the query

Query: select distinct fulldate from web_resource_rating order by fulldate desc limit 20
Query finished, fetching results ...
+---------------+
| fulldate      |
+---------------+
| 2013-09-26-13 |
| 2013-09-26-12 |
| 2013-09-26-11 |
| 2013-09-26-10 |
| 2013-09-26-09 |
| 2013-09-26-08 |
| 2013-09-26-07 |
| 2013-09-26-06 |
| 2013-09-26-05 |
| 2013-09-26-04 |
| 2013-09-26-03 |
| 2013-09-26-02 |
| 2013-09-26-01 |
| 2013-09-26-00 |
| 2013-09-25-23 |
| 2013-09-25-22 |
| 2013-09-25-21 |
| 2013-09-25-20 |
| 2013-09-25-19 |
| 2013-09-25-18 |
+---------------+

 

Great now I see the latest partitions.

 

What do I do wrong? Why refresh doesn't help?

 

2 ACCEPTED SOLUTIONS

avatar
Rising Star

I'd recommend - run "invalidate metadata" on all impalad immediately after loading new data or modifying metadata.

This can be automated as part of data load process.

View solution in original post

avatar

We're working on automatic metadata-update dissemination to hopefully ease your pain, Sergey. Stay tuned.

Sorry for the inconvenience.

View solution in original post

5 REPLIES 5

avatar
Contributor

That is a good point, we'll document better the interaction with partitioned tables. It seems that the new partition is a change in the table metadata, and in 1.1.1 REFRESH only fetches the block location data (for the partitions that are already known). This is definitely an area where improvements are being worked, so anything I say about the internals is likely to be obsolete soon...

avatar
Expert Contributor

Reply by email didn't work so I coy-paste my second question using web-form:

 


---Insert Reply Above This Line---

So, the solution is:
Issue "invalidate metadata" on target impalad node before quering partitioned table?

avatar
Rising Star

I'd recommend - run "invalidate metadata" on all impalad immediately after loading new data or modifying metadata.

This can be automated as part of data load process.

avatar

We're working on automatic metadata-update dissemination to hopefully ease your pain, Sergey. Stay tuned.

Sorry for the inconvenience.

avatar
Expert Contributor
We do it using oozie coordinator which runc workflow: parse data, add partition, invalidate metadata.
And we met another problem:)
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/INVALIDATE-METADATA-suddenly-caused-jav...

Right now it's not a tragedy, we are waiting for bug fixes.