Created on 10-04-2019 09:42 PM - last edited on 10-05-2019 07:27 AM by ask_bill_brooks
We have a beeline process that inserts data into a Hive table.
We can only access the table via Impala views.
So when the beeline job is done,
we want to refresh the table and select rows from it via the Impala views.
Could we write a user-defined function (UDF) that
1. refreshes the table in Impala
2. selects * from the refreshed table
and then call that UDF from an Impala view?
Created 10-17-2019 01:26 AM
Hi @ChineduLB,
UDFs let you code your own application logic for processing column values during an Impala query. Adding a REFRESH or INVALIDATE METADATA call to a UDF could cause unexpected behavior during value processing.
The general recommendation is to run INVALIDATE METADATA or REFRESH right after the ingestion has finished. That way Impala users do not have to worry about stale metadata. There is a blog post on how to handle "Fast Data" and make it available to Impala in batches:
https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
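To illustrate the "refresh after ingestion" idea, here is a minimal Python sketch that runs the beeline job and, only if it succeeds, issues a REFRESH through impala-shell. The host names, script path, and table name are hypothetical placeholders, and the sketch assumes an unsecured cluster with `beeline` and `impala-shell` on the PATH. REFRESH is enough when new data files land in an existing table; INVALIDATE METADATA would be needed for a table Impala has not seen yet.

```python
import subprocess
from typing import List


def beeline_ingest_cmd(hive_jdbc_url: str, script_path: str) -> List[str]:
    """Build the beeline command that runs the Hive ingestion script."""
    return ["beeline", "-u", hive_jdbc_url, "-f", script_path]


def impala_refresh_cmd(impalad: str, table: str) -> List[str]:
    """Build the impala-shell command that refreshes one table's metadata."""
    return ["impala-shell", "-i", impalad, "-q", f"REFRESH {table}"]


def ingest_then_refresh(hive_jdbc_url: str, script_path: str,
                        impalad: str, table: str) -> None:
    """Run the ingestion, then refresh the table in Impala."""
    # check=True raises CalledProcessError if the ingestion fails,
    # so the REFRESH is never issued for a half-finished load.
    subprocess.run(beeline_ingest_cmd(hive_jdbc_url, script_path), check=True)
    subprocess.run(impala_refresh_cmd(impalad, table), check=True)


# Hypothetical usage (all values are placeholders):
# ingest_then_refresh("jdbc:hive2://hs2-host:10000/default", "load.hql",
#                     "impalad-host:21000", "mydb.mytable")
```

With this in place the views can stay plain SELECTs, and the refresh becomes part of the ingestion job rather than part of the query.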
Additionally, INVALIDATE METADATA/REFRESH can be executed from beeline as well; you just need to connect beeline to Impala. This blog post has the details:
https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/
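As a rough sketch of that approach, the snippet below builds a beeline command that connects to Impala's HiveServer2-compatible port and runs a REFRESH. The host and table names are hypothetical, port 21050 is the usual default for that endpoint, and `auth=noSasl` assumes a cluster without Kerberos or LDAP; a secured cluster would need a different JDBC URL.

```python
from typing import List


def impala_jdbc_url(host: str, port: int = 21050) -> str:
    """JDBC URL for beeline to reach an impalad (assumes no authentication)."""
    return f"jdbc:hive2://{host}:{port}/;auth=noSasl"


def beeline_refresh_cmd(host: str, table: str, port: int = 21050) -> List[str]:
    """Build a beeline command that runs REFRESH against Impala."""
    return ["beeline", "-u", impala_jdbc_url(host, port), "-e", f"REFRESH {table};"]


# Hypothetical usage: print the command for inspection, or pass the list
# to subprocess.run(..., check=True) at the end of the ingestion job.
print(" ".join(beeline_refresh_cmd("impalad-host", "mydb.mytable")))
```

This keeps the whole pipeline in one tool: the same beeline job that loads the table through Hive can finish with a second beeline call that refreshes it in Impala.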
Created 10-05-2019 12:07 PM
So basically, this is a situation where Hive is updating a table while Impala clients are querying the same table.
Sometimes the Impala queries throw a missing HDFS files exception.
How do we handle this?