After Impala Refresh Metadata is still stale

Explorer

I have an Oozie workflow in which a job loads some data into a table. I then refresh the table in Impala and run an Impala query to export the most recent data in this table to a CSV file.

My problem is that even after doing the Impala refresh I do not get the most recent data, only the data from the previous load.

For example, a workflow starts running at 1pm. The Spark job finishes at 1:15pm, the Impala refresh is executed at 1:20pm, and at 1:25pm my query to export the data runs. The export only shows the data from the previous workflow, which ran at 12pm, and not the data from the workflow that ran at 1pm.

I am using Oozie and CDH 5.15.1.

Sample warning message: Read 972.32 MB of data across network that was expected to be local. Block locality metadata for table '..' may be stale. Consider running "INVALIDATE METADATA ...

Thanks

3 REPLIES

Re: After Impala Refresh Metadata is still stale

Guru
@gimp077 ,

When you say you refreshed the table, did you run "REFRESH <tablename>" or "INVALIDATE METADATA"? The two do not work the same way.

Is your table partitioned? If yes, can you see the new partition from Impala by running "SHOW PARTITIONS <tablename>"?

Cheers
Eric
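
For readers following along, the statements Eric is asking about can be written out roughly as follows. This is only a sketch: the table name `sales_db.events` and the partition columns `load_date` and `hour` are made-up placeholders, and the Python helper just assembles statement strings without connecting to Impala. The `REFRESH ... PARTITION` form exists in newer Impala releases, so check that your version supports it before relying on it.

```python
def _literal(value):
    """Render a partition value as a SQL literal (ints bare, everything else quoted)."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "\\'") + "'"


def metadata_statements(table, partition=None):
    """Return the Impala metadata statements for one table, keyed by purpose.

    `table` and `partition` are placeholders supplied by the caller.
    """
    statements = {
        # Reloads file and block metadata for a table Impala already knows about.
        "refresh": f"REFRESH {table}",
        # Marks the cached metadata as stale; it is reloaded in full on next access.
        "invalidate": f"INVALIDATE METADATA {table}",
        # Lists the partitions Impala currently sees for the table.
        "show_partitions": f"SHOW PARTITIONS {table}",
    }
    if partition:
        spec = ", ".join(f"{col}={_literal(val)}" for col, val in partition.items())
        # Refreshes a single partition instead of the whole table.
        statements["refresh_partition"] = f"REFRESH {table} PARTITION ({spec})"
    return statements


if __name__ == "__main__":
    for name, sql in metadata_statements(
        "sales_db.events", {"load_date": "2019-01-01", "hour": 13}
    ).items():
        print(f"{name}: {sql};")
```

Running `SHOW PARTITIONS` right after the refresh step is a cheap way to confirm whether the newly loaded partition is visible before the export query runs.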

Re: After Impala Refresh Metadata is still stale

Explorer

Hi Eric,

My table is partitioned. I was expecting that after refreshing the table I would see the most recent data in it.

However, sometimes there is a lag between when the refresh completes and when I see the most recent data.

I think INVALIDATE METADATA would fix this issue, but it would be costly to run on a large table.

Thanks
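
One possible explanation for this kind of lag, which the thread does not confirm, is that the refresh and the export query reach different Impala daemons, so the second daemon has not yet received the updated metadata. A minimal sketch of a workaround under that assumption is shown here: it enables the `SYNC_DDL` query option, which makes the statement wait until the change has propagated to all Impala nodes, and points both steps at the same host. The host name, table name, query, and output path are hypothetical, and the use of `impala-shell` is assumed since the post does not say how the workflow issues its statements. The helper only builds the command lines and does not run them.

```python
def build_refresh_and_export(host, table, export_sql, out_path):
    """Build impala-shell argument lists for the refresh step and the export step.

    All four arguments are caller-supplied placeholders.
    """
    refresh_cmd = [
        "impala-shell",
        "-i", host,  # connect to a specific impalad
        # SYNC_DDL=1 makes REFRESH return only after all nodes have the new metadata.
        "-q", f"SET SYNC_DDL=1; REFRESH {table};",
    ]
    export_cmd = [
        "impala-shell",
        "-i", host,               # same impalad as the refresh step
        "-B",                     # delimited (non-pretty-printed) output
        "--output_delimiter=,",   # comma-separated fields for CSV
        "--print_header",         # include column names as the first row
        "-o", out_path,           # write results to this file
        "-q", export_sql,
    ]
    return refresh_cmd, export_cmd


if __name__ == "__main__":
    refresh, export = build_refresh_and_export(
        "impalad-1.example.com:21000",
        "sales_db.events",
        "SELECT * FROM sales_db.events WHERE load_date = '2019-01-01'",
        "/tmp/events.csv",
    )
    print(" ".join(refresh))
    print(" ".join(export))
```

Each list can be passed to `subprocess.run` or copied into an Oozie shell action. If the lag persists with `SYNC_DDL` enabled, the cause is likely elsewhere, for example the load job registering its partition after the refresh has already run.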

Re: After Impala Refresh Metadata is still stale

Guru
@gimp077 ,

Do you mean that "REFRESH" takes time, and that you eventually see the updated data, just with some delay?

How big is the table, in terms of the number of partitions and the number of files in HDFS?

Eric