Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

I do suffer a lot because Impalad doesn't see metadata updates

Solved Go to solution

I do suffer a lot because Impalad doesn't see metadata updates

Expert Contributor

Hi, we use Impala 1.1.1

We use JDBC to submit query to Impala.

We submit query to partitioned table.

Oozie coordinator does wake up each hour, parses data and adds new patition to table: alter table add partition...

 

When we do get connection JDBC to random Impalad (chosen from predefined list) in our Java code, we do submit "refresh my_partitioned_table" and then give connection to user.

 

We suppose that "refresh some_Table" forces Imapad to get actual metadata info about target table from metastore.

But it's not true.

Here it is:

 

1. Let's refresh metadata

Returned 0 row(s) in 3.38s
[localhost:21000] > refresh web_resource_rating;
Query: refresh web_resource_rating
Query finished, fetching results ...

2. Let's check the result. We query "virtual partition" column

localhost:21000] > select distinct fulldate from web_resource_rating order by fulldate desc limit 20;
Query: select distinct fulldate from web_resource_rating order by fulldate desc limit 20
Query finished, fetching results ...
+---------------+
| fulldate      |
+---------------+
| 2013-09-25-10 |
| 2013-09-25-09 |
| 2013-09-25-08 |
| 2013-09-25-07 |
| 2013-09-25-06 |
| 2013-09-25-05 |
| 2013-09-25-04 |
| 2013-09-25-03 |
| 2013-09-25-02 |
| 2013-09-25-01 |
| 2013-09-25-00 |
| 2013-09-24-23 |
| 2013-09-24-22 |
| 2013-09-24-21 |
| 2013-09-24-20 |
| 2013-09-24-19 |
| 2013-09-24-18 |
| 2013-09-24-17 |
| 2013-09-24-16 |
| 2013-09-24-15 |
+---------------+

OMG, Where is last day????

 

Let's invalidate metadata (I've already did it last day...)

[localhost:21000] > invalidate metadata;
Query: invalidate metadata
Query finished, fetching results ...

 

Le'ts repeat the query

Query: select distinct fulldate from web_resource_rating order by fulldate desc limit 20
Query finished, fetching results ...
+---------------+
| fulldate      |
+---------------+
| 2013-09-26-13 |
| 2013-09-26-12 |
| 2013-09-26-11 |
| 2013-09-26-10 |
| 2013-09-26-09 |
| 2013-09-26-08 |
| 2013-09-26-07 |
| 2013-09-26-06 |
| 2013-09-26-05 |
| 2013-09-26-04 |
| 2013-09-26-03 |
| 2013-09-26-02 |
| 2013-09-26-01 |
| 2013-09-26-00 |
| 2013-09-25-23 |
| 2013-09-25-22 |
| 2013-09-25-21 |
| 2013-09-25-20 |
| 2013-09-25-19 |
| 2013-09-25-18 |
+---------------+

 

Great now I see the latest partitions.

 

What do I do wrong? Why refresh doesn't help?

 

2 ACCEPTED SOLUTIONS

Accepted Solutions

Re: I do suffer a lot because Impalad doesn't see metadata updates

Contributor

I'd recommend - run "invalidate metadata" on all impalad immediately after loading new data or modifying metadata.

This can be automated as part of data load process.

Re: I do suffer a lot because Impalad doesn't see metadata updates

Master Collaborator

We're working on automatic metadata-update dissemination to hopefully ease your pain, Sergey. Stay tuned.

Sorry for the inconvenience.

5 REPLIES 5

Re: I do suffer a lot because Impalad doesn't see metadata updates

Cloudera Employee

That is a good point, we'll document better the interaction with partitioned tables. It seems that the new partition is a change in the table metadata, and in 1.1.1 REFRESH only fetches the block location data (for the partitions that are already known). This is definitely an area where improvements are being worked, so anything I say about the internals is likely to be obsolete soon...

Re: I do suffer a lot because Impalad doesn't see metadata updates

Expert Contributor

Reply by email didn't work so I coy-paste my second question using web-form:

 


---Insert Reply Above This Line---

So, the solution is:
Issue "invalidate metadata" on target impalad node before quering partitioned table?

Re: I do suffer a lot because Impalad doesn't see metadata updates

Contributor

I'd recommend - run "invalidate metadata" on all impalad immediately after loading new data or modifying metadata.

This can be automated as part of data load process.

Re: I do suffer a lot because Impalad doesn't see metadata updates

Master Collaborator

We're working on automatic metadata-update dissemination to hopefully ease your pain, Sergey. Stay tuned.

Sorry for the inconvenience.

Highlighted

Re: I do suffer a lot because Impalad doesn't see metadata updates

Expert Contributor
We do it using oozie coordinator which runc workflow: parse data, add partition, invalidate metadata.
And we met another problem:)
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/INVALIDATE-METADATA-suddenly-caused-jav...

Right now it's not a tragedy, we are waiting for bug fixes.