Impala view based on UDF

Rising Star

We have a Beeline process that inserts data into a Hive table.

We can only access the table through Impala views.

So when the Beeline job is done, we want to refresh the table and select rows from it through the Impala views.

Could we write a user-defined function (UDF) that:

1. refreshes the table in Impala

2. selects * from the refreshed table

and then call that UDF from an Impala view?

1 ACCEPTED SOLUTION

Expert Contributor

Hi @ChineduLB,

UDFs let you code your own application logic for processing column values during an Impala query. Putting a REFRESH or INVALIDATE METADATA inside a UDF could cause unexpected behavior during value processing.

The general recommendation is to run INVALIDATE METADATA or REFRESH right after the ingestion has finished. That way the Impala user does not have to worry about stale metadata. There is a blog post on how to handle "Fast Data" and make it available to Impala in batches:

https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
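
As a minimal sketch of that ordering, the ingestion wrapper could run the Beeline load first and issue the REFRESH only if the load succeeds. The function names, hosts, ports, script path, and table name here are hypothetical placeholders, and the sketch assumes `beeline` and `impala-shell` are on the PATH:

```python
import subprocess


def build_ingest_command(hive_jdbc_url, script_path):
    """Beeline call that runs the Hive insert script (-u: JDBC URL, -f: script file)."""
    return ["beeline", "-u", hive_jdbc_url, "-f", script_path]


def build_refresh_command(impalad, table):
    """impala-shell call that refreshes one table (-i: impalad host:port, -q: query)."""
    return ["impala-shell", "-i", impalad, "-q", f"REFRESH {table}"]


def ingest_then_refresh(hive_jdbc_url, script_path, impalad, table, run=subprocess.run):
    """Run the Hive load, then refresh the table in Impala."""
    # check=True raises CalledProcessError on a non-zero exit code, so the
    # REFRESH is not issued after a failed load.
    run(build_ingest_command(hive_jdbc_url, script_path), check=True)
    run(build_refresh_command(impalad, table), check=True)
```

A job scheduler could then call something like `ingest_then_refresh("jdbc:hive2://hive-host:10000/default", "load.hql", "impala-host:21000", "mydb.mytable")`, with all four values replaced by those of the actual cluster.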

 

Additionally, INVALIDATE METADATA/REFRESH can be executed from Beeline as well. You just need to connect from Beeline to Impala instead of Hive. This blog post has the details:

https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/
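
A rough sketch of what that could look like when scripted. It assumes an unsecured cluster where Impala listens for HiveServer2-protocol clients on its default port 21050 and accepts the Hive JDBC driver with `auth=noSasl`. Kerberos or LDAP clusters need different URL parameters (see the linked post). The helper names, host, and table name are hypothetical:

```python
import subprocess


def impala_jdbc_url(host, port=21050):
    """JDBC URL for Impala's HiveServer2-compatible port (assumes no SASL auth)."""
    return f"jdbc:hive2://{host}:{port}/;auth=noSasl"


def build_beeline_refresh(host, table, port=21050):
    """Beeline call that sends a single REFRESH statement to Impala (-e: inline query)."""
    return ["beeline", "-u", impala_jdbc_url(host, port), "-e", f"REFRESH {table}"]


def refresh_via_beeline(host, table, port=21050, run=subprocess.run):
    # The command is passed as an argument list (no shell), so the ';' in the
    # JDBC URL needs no quoting. check=True raises if Beeline exits non-zero.
    run(build_beeline_refresh(host, table, port), check=True)
```

This keeps the whole pipeline on a single client tool: the same Beeline job that loads the data through Hive can finish by calling `refresh_via_beeline("impala-host", "mydb.mytable")`.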

 


2 REPLIES

Rising Star

So basically this is a situation where Hive is updating a table while Impala clients are querying the same table.

Sometimes the Impala queries throw a missing HDFS files exception.

How do we handle this?
