Created 04-18-2017 01:58 AM
We have been attempting to stream data into Hive tables using a Storm Trident topology. In some cases we see multiple files named bucket_XXXX_flush_length in the Hive file explorer. What do these files indicate, and when do they occur?
Created 04-19-2017 07:43 PM
The short answer is you can ignore these.
When you use Hive Streaming Ingest (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest), which Storm, Flume, and NiFi rely on, Hive creates these files for its internal housekeeping to maintain transactional consistency. They should normally be removed as soon as the TransactionBatch is closed (usually once transaction Y of delta_X_Y/ finishes). A flush_length file may be left behind if the writer process crashes before the TransactionBatch is closed; such leftovers are eventually cleaned up by the Compactor process.
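As a rough illustration of where these files show up, here is a minimal Python sketch that scans a recursive listing of a table directory (for example the output of `hdfs dfs -ls -R` on the table path) and reports which delta directories still contain `bucket_XXXXX_flush_length` side files. The directory and file naming patterns, the function name `find_flush_length_files`, and the example table path are assumptions for illustration, not part of any Hive tool. A side file in a delta whose TransactionBatch is still open is expected; only leftovers in deltas that are no longer being written suggest a writer that died before closing its batch.

```python
import re
from collections import defaultdict

# Assumed layout: delta directories like delta_0000011_0000020 (some Hive
# versions append a statement id, e.g. delta_0000011_0000020_0000) holding
# bucket files bucket_00000 and, while a batch is open, bucket_00000_flush_length.
DELTA_RE = re.compile(r"(?:^|/)(delta_\d+_\d+(?:_\d+)?)/(bucket_\d+)(_flush_length)?$")


def find_flush_length_files(paths):
    """Map each delta directory to the buckets that still have a side file.

    `paths` is any iterable of strings ending in a file path, e.g. the lines
    of a recursive HDFS listing (hypothetical input format).
    """
    leftovers = defaultdict(list)
    for path in paths:
        match = DELTA_RE.search(path.strip())
        # Only paths ending in _flush_length are reported; plain bucket files are skipped.
        if match and match.group(3):
            delta_dir, bucket = match.group(1), match.group(2)
            leftovers[delta_dir].append(bucket)
    return {d: sorted(b) for d, b in leftovers.items()}


# Hypothetical listing of a streaming table directory.
listing = [
    "/apps/hive/warehouse/events/delta_0000001_0000010/bucket_00000",
    "/apps/hive/warehouse/events/delta_0000011_0000020/bucket_00000",
    "/apps/hive/warehouse/events/delta_0000011_0000020/bucket_00000_flush_length",
    "/apps/hive/warehouse/events/delta_0000011_0000020/bucket_00001_flush_length",
]
print(find_flush_length_files(listing))
# {'delta_0000011_0000020': ['bucket_00000', 'bucket_00001']}
```

Leftovers flagged this way normally need no manual action; as the answer notes, the Compactor is expected to clean them up.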
Created 04-18-2017 06:07 AM
Created 04-18-2017 01:19 PM
I believe my question is more about why these files occur in the first place and what they signify; I see many of them when we stream data into Hive. My question is not about how they can be compacted, as described in the link you provided.