What does bucket_00003_flush_length indicate when writing to Hive?

Explorer

We have been trying to stream data into Hive tables using a Storm Trident topology. In some cases we see multiple files named bucket_XXXX_flush_length in the Hive file explorer. What do these files indicate, and when do they occur?

1 ACCEPTED SOLUTION

Super Collaborator

The short answer is you can ignore these.

When you use Hive Streaming Ingest (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest), which is what Storm, Flume, and NiFi use, Hive creates these files for its internal housekeeping to maintain transactional consistency. They should normally be removed as soon as the TransactionBatch is closed (usually once transaction Y of delta_X_Y/ finishes). A flush_length file may remain if the writer process crashes before the TransactionBatch is closed; such leftovers will eventually be cleaned up by the Compactor process.
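To see which deltas these side files belong to, a small script over a local copy of the table directory can help. The Python below is a minimal sketch, not Hive's own tooling: the directory layout (side files next to their bucket_NNNNN file inside delta_X_Y directories) and the side-file byte format (a run of 8-byte big-endian lengths, one appended per flush) are my assumptions, and `find_side_files` / `last_flush_length` are hypothetical helper names.

```python
# Sketch: list Hive streaming side files (bucket_NNNNN_flush_length) under a
# table or partition directory copied to local disk, and decode the last
# length each one records.
# ASSUMPTIONS (not stated in the answer above): side files sit next to their
# bucket file inside delta_<minTxn>_<maxTxn>[_...] directories, and each holds
# a sequence of 8-byte big-endian integers, one appended per flush.
# find_side_files and last_flush_length are hypothetical helper names.
import re
import struct
from pathlib import Path
from typing import Dict, List, Optional

DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)")          # e.g. delta_0000001_0000010
SIDE_RE = re.compile(r"^(bucket_\d+)_flush_length$")  # e.g. bucket_00003_flush_length


def last_flush_length(side_file: Path) -> Optional[int]:
    """Return the last complete 8-byte big-endian value, or None if there is none."""
    data = side_file.read_bytes()
    usable = len(data) - len(data) % 8  # drop a partially written trailing value
    if usable == 0:
        return None
    (value,) = struct.unpack(">q", data[usable - 8:usable])
    return value


def find_side_files(table_dir: Path) -> List[Dict[str, object]]:
    """Collect side files that live directly inside a delta_X_Y directory."""
    results: List[Dict[str, object]] = []
    for side in sorted(table_dir.rglob("*_flush_length")):
        bucket = SIDE_RE.match(side.name)
        delta = DELTA_RE.match(side.parent.name)
        if bucket is None or delta is None:
            continue  # not in the assumed layout; skip rather than guess
        results.append({
            "delta": side.parent.name,
            "txn_range": (int(delta.group(1)), int(delta.group(2))),
            "bucket": bucket.group(1),
            "last_flush_length": last_flush_length(side),
        })
    return results
```

For example, after copying a partition locally with `hdfs dfs -get`, calling `find_side_files(Path("local_copy"))` returns one dict per side file. A side file that is still present long after the writer has stopped most likely belongs to a TransactionBatch that was never closed, which, per the answer above, the Compactor will eventually clean up.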


3 REPLIES


@Pardhu T

You might want to check this link. It's related to the ticket here, which you may also want to look at.

Explorer

I believe my question is more about why these files occur in the first place and what their significance is. I see many of them when we try to stream data into Hive. My question is not about how they can be compacted, as described in the link you provided.
