What does bucket_00003_flush_length indicate while writing to Hive
Labels: Apache Hive
Created ‎04-18-2017 01:58 AM
We have been attempting to stream data into Hive tables using a Storm Trident topology. In some cases we see multiple files named bucket_XXXX_flush_length in the Hive file explorer. What do these files indicate, and when do they occur?
Created ‎04-19-2017 07:43 PM
The short answer is you can ignore these.
When you use Hive Streaming Ingest (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest), which is what Storm/Flume/NiFi use, Hive creates these files for its internal housekeeping, to maintain transactional consistency. They should normally be removed as soon as the TransactionBatch is closed (usually once transaction Y of delta_X_Y/ finishes). A flush_length file may remain if the writer process crashes before the TransactionBatch is closed. Any such leftovers will eventually be cleaned up by the Compactor process.
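If you want to see which delta directories currently hold these side files, here is a minimal Python sketch. It only filters a list of paths (for example, one you pasted from a recursive directory listing). The delta_X_Y/bucket_N_flush_length layout is assumed from the file names in this thread, and the warehouse paths in the sample are hypothetical. It does not touch Hive or HDFS.

```python
import re
from collections import defaultdict

# Assumed naming, based on the file names mentioned in this thread:
#   .../delta_<X>_<Y>/bucket_<N>_flush_length
SIDE_FILE = re.compile(r"(?:^|/)(delta_\d+_\d+)/(bucket_\d+)_flush_length$")


def find_flush_length_files(paths):
    """Group flush_length side files by their delta directory.

    Returns {delta_dir_name: [bucket_name, ...]} for every path that looks
    like a bucket_N_flush_length file inside a delta_X_Y directory.
    """
    found = defaultdict(list)
    for path in paths:
        match = SIDE_FILE.search(path.strip())
        if match:
            delta_dir, bucket = match.groups()
            found[delta_dir].append(bucket)
    return dict(found)


# Hypothetical listing; replace with the paths from your own table directory.
listing = [
    "/warehouse/t/delta_0000011_0000020/bucket_00003",
    "/warehouse/t/delta_0000011_0000020/bucket_00003_flush_length",
    "/warehouse/t/delta_0000021_0000030/bucket_00001",
]
print(find_flush_length_files(listing))
```

A delta directory that shows up here while a writer is still active is expected. One that keeps showing up long after its writer has gone is likely a leftover from a crashed writer, as described above.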
Created ‎04-18-2017 06:07 AM
Created ‎04-18-2017 01:19 PM
I believe my question is more about why these files occur in the first place and what their significance is. I see many of them when we stream data into Hive. My question is not about how they can be compacted, as described in the link you provided.
