UPDATE: I'm wondering if this is happening on deletion, not creation. A DESCRIBE runs before the external table creation, and it's throwing the same error. Checking the binlogs in MySQL for any clues.
UPDATE2: After reading the binlog, I am more convinced that the DROP TABLE <TABLE> command for Hive isn't completing correctly and isn't deleting the row from TBLS.
This is a behavior I've experienced frequently in Hive 3, that occurred almost never in Hive.
Hive Version: 188.8.131.52.1
The metastore is MySQL.
As part of an incremental data sqoop, I create two tables: one external, one ORC.
Sometimes during the creation of these two tables, the metastore doesn't populate the SERDES, SDS, and CDS tables with the corresponding metadata and leaves the SD_ID null in the TBLS table. The table remains non-interactive (not even droppable) until I populate dummy data in the corresponding tables and update the SD_ID field in TBLS.
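For anyone trying to confirm the same state, here is a minimal sketch of the check I mean. It uses an in-memory SQLite database as a stand-in for the MySQL metastore, with only the handful of columns needed (TBLS, SDS, DBS); the sample rows and table names (`main_ext`, `diff_ext`) are made up for illustration. The SELECT itself should be portable to the real metastore database, but treat that as an assumption and run it read-only first.

```python
import sqlite3

# In-memory stand-in for the MySQL metastore. Only the columns used by the
# query are created; the real metastore tables have many more.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS  (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER, SERDE_ID INTEGER);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, TBL_TYPE TEXT);
-- Hypothetical sample data: one healthy table, one with a NULL SD_ID.
INSERT INTO DBS  VALUES (1, 'default');
INSERT INTO SDS  VALUES (10, 100, 1000);
INSERT INTO TBLS VALUES (1, 1, 10,   'main_ext', 'EXTERNAL_TABLE');
INSERT INTO TBLS VALUES (2, 1, NULL, 'diff_ext', 'EXTERNAL_TABLE');
""")

# Tables whose SD_ID is NULL, or points at an SDS row that does not exist.
ORPHAN_QUERY = """
SELECT d.NAME, t.TBL_NAME, t.TBL_ID
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID
LEFT JOIN SDS s ON s.SD_ID = t.SD_ID
WHERE t.SD_ID IS NULL OR s.SD_ID IS NULL
ORDER BY t.TBL_ID
"""

orphans = conn.execute(ORPHAN_QUERY).fetchall()
for db_name, tbl_name, tbl_id in orphans:
    print(f"{db_name}.{tbl_name} (TBL_ID={tbl_id}) has no storage descriptor")
```

Against the sample data this reports only `default.diff_ext`, which is the kind of row that becomes non-interactive.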
The statements for table creation are as follows:
create external table if not exists <DIFF EXTERNAL TABLE NAME> like <MAIN EXTERNAL TABLE NAME>;
create table <DIFF ORC TABLE> stored as orc as select * from <DIFF EXTERNAL TABLE NAME>;
The logs in the metastore are a bit unclear:
2019-11-13T04:08:07,696 ERROR [pool-6-thread-187]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - java.lang.NullPointerException
2019-11-13T04:08:07,696 ERROR [pool-6-thread-187]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
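Since the entries carry little beyond a timestamp and a thread name, those two fields are what I use to line the error up with the MySQL binlog and the HiveServer2 log. A rough sketch of pulling them out, assuming the log layout is exactly what the two entries quoted above show (the regular expression is my own guess at that layout, not an official format):

```python
import re
from collections import defaultdict

# The two metastore log entries quoted above, one per line.
LOG = """\
2019-11-13T04:08:07,696 ERROR [pool-6-thread-187]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - java.lang.NullPointerException
2019-11-13T04:08:07,696 ERROR [pool-6-thread-187]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
"""

# Assumed layout: timestamp, level, [thread]: logger (source) - message
ENTRY = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]: "
    r"(?P<logger>\S+) \((?P<source>.*?)\) - (?P<msg>.*)$"
)

# Group ERROR entries by worker thread so one failing call can be followed.
errors_by_thread = defaultdict(list)
for line in LOG.splitlines():
    m = ENTRY.match(line)
    if m and m.group("level") == "ERROR":
        errors_by_thread[m.group("thread")].append(
            (m.group("ts"), m.group("logger"), m.group("msg"))
        )

for thread, entries in errors_by_thread.items():
    for ts, logger, msg in entries:
        print(thread, ts, logger, msg)
```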
The HiveServer2 logs only show errors after the issue has already happened.
An external table is not managed by Hive; it only describes the metadata/schema of external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as CSV files, S3, Azure Storage, or remote HDFS locations.
When you drop/delete an external table, you are ONLY invalidating the metadata; the underlying data in S3 or the CSV file is not deleted, as opposed to a managed table.
Please read "Create, use, and drop an external table"; it will give you a better explanation.
@Shelton Hi, thank you for the response.
To clarify, I'm not talking about the data the tables are displaying. I'm talking about the actual metastore tables (TBLS, CDS, SDS, SERDES, etc.). These tables describe the table structure and if there is something wrong with the metadata, the tables won't function accordingly.
The issue, I think, is that when a table is dropped, the SQL commands sent to the metastore are not fully executing (not deleting the row from TBLS, but leaving the SD_ID null).
The concern is the metadata itself, not the underlying CSV/flat file.