
Impala view based on UDF


Explorer

We have a Beeline process that inserts data into a Hive table.

We can only access the table through Impala views.

So when the Beeline job is done, we want to refresh the table and select rows from it through the Impala views.

Could we write a user-defined function (UDF) that:

1. refreshes the table in Impala

2. selects * from the refreshed table

and then call that UDF from an Impala view?


Re: Impala view based on UDF

Explorer

So basically, this is a situation where Hive is updating a table while Impala clients are querying the same table.

Sometimes the Impala queries throw a "missing HDFS files" exception.

How do we handle this?

Re: Impala view based on UDF

Cloudera Employee

Hi @ChineduLB,

UDFs let you code your own application logic for processing column values during an Impala query. Adding a REFRESH or INVALIDATE METADATA call to a UDF could cause unexpected behavior during value processing.

A general recommendation for INVALIDATE METADATA/REFRESH is to execute it once the ingestion has finished. This way the Impala user does not have to worry about stale metadata. There is a blog post on how to handle "Fast Data" and make it available to Impala in batches:

https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/

 

Additionally, INVALIDATE METADATA/REFRESH can be executed from Beeline as well; you just need to connect from Beeline to Impala. This blog post has the details:

https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/
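
To make the "refresh after ingestion" idea concrete, here is a minimal sketch of a wrapper that runs the Hive load through Beeline and only then issues REFRESH through a second Beeline connection pointed at Impala. The function names, script name, table name, and JDBC URLs are all hypothetical placeholders, not something from your environment; the only Beeline options used are `-u` (JDBC URL), `-f` (script file), and `-e` (inline statement). Treat it as an illustration rather than a finished job.

```python
import subprocess
from typing import Callable, List


def ingest_command(hive_jdbc_url: str, script: str) -> List[str]:
    """Build the Beeline call that runs the Hive ingestion script."""
    return ["beeline", "-u", hive_jdbc_url, "-f", script]


def refresh_command(impala_jdbc_url: str, table: str) -> List[str]:
    """Build the Beeline call that issues REFRESH against Impala."""
    return ["beeline", "-u", impala_jdbc_url, "-e", f"REFRESH {table}"]


def ingest_then_refresh(
    hive_jdbc_url: str,
    script: str,
    impala_jdbc_url: str,
    table: str,
    run: Callable = subprocess.run,
) -> None:
    """Run the ingestion first, then refresh the table in Impala.

    check=True makes subprocess.run raise on a non-zero exit code,
    so REFRESH is not issued if the load failed.
    """
    run(ingest_command(hive_jdbc_url, script), check=True)
    run(refresh_command(impala_jdbc_url, table), check=True)


if __name__ == "__main__":
    # Hypothetical hosts, ports, and names; adjust to your cluster,
    # including whatever authentication settings your JDBC URLs need.
    ingest_then_refresh(
        hive_jdbc_url="jdbc:hive2://hiveserver2-host:10000/default",
        script="load_events.hql",
        impala_jdbc_url="jdbc:hive2://impalad-host:21050/;auth=noSasl",
        table="mydb.events",
    )
```

With this ordering the views never need to trigger a refresh themselves: clients simply query the view after the wrapper has finished.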

 
