What does bucket_00003_flush_length indicate while writing to Hive

Explorer

We have been attempting to stream data into Hive tables using a Storm Trident topology. In some cases we see multiple files named bucket_XXXX_flush_length in the Hive file explorer. What do these files indicate, and when do they occur?

1 ACCEPTED SOLUTION

Re: What does bucket_00003_flush_length indicate while writing to Hive

Expert Contributor

The short answer is you can ignore these.

When you use Hive Streaming Ingest (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest), which Storm, Flume, and NiFi rely on, Hive creates these files for its internal housekeeping, to maintain transactional consistency. They are normally removed as soon as the TransactionBatch is closed (usually once transaction Y of delta_X_Y/ finishes). A flush_length file may linger if the writer process crashes before the TransactionBatch is closed. Such leftovers are eventually cleaned up by the Compactor process.
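If you want to see which delta directories currently hold these side files, a minimal sketch along these lines may help. It assumes you already have a list of file paths (for example, pasted from a directory listing), and it only relies on the delta_X_Y/bucket_N_flush_length naming described above. The function name and the sample warehouse paths are hypothetical, for illustration only.

```python
import re
from collections import defaultdict

# Matches paths ending in "delta_X_Y/bucket_N_flush_length",
# the naming pattern described in the answer above.
SIDE_FILE = re.compile(
    r"(?P<delta>delta_\d+_\d+)/(?P<bucket>bucket_\d+)_flush_length$"
)


def find_flush_length_files(paths):
    """Group side files by delta directory: {delta_dir: [bucket names]}."""
    found = defaultdict(list)
    for path in paths:
        match = SIDE_FILE.search(path)
        if match:
            found[match.group("delta")].append(match.group("bucket"))
    return dict(found)


if __name__ == "__main__":
    # Hypothetical paths; substitute the listing of your own table directory.
    sample = [
        "/warehouse/events/delta_0000011_0000020/bucket_00003",
        "/warehouse/events/delta_0000011_0000020/bucket_00003_flush_length",
        "/warehouse/events/delta_0000021_0000030/bucket_00001_flush_length",
    ]
    for delta, buckets in find_flush_length_files(sample).items():
        print(delta, buckets)
```

A side file in a delta whose TransactionBatch is still open is expected. One that sits in an old delta long after the writer has stopped is more likely a leftover from a crashed writer, which the Compactor should eventually clean up.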


3 REPLIES

Re: What does bucket_00003_flush_length indicate while writing to Hive

@Pardhu T

You might want to check this link. It's related to the ticket here, which you may also want to look at.


Re: What does bucket_00003_flush_length indicate while writing to Hive

Explorer

I believe my question is more about why these files occur in the first place and what their significance is. I see many of them when we stream data into Hive. My question is not about how they can be compacted, as discussed in the link you provided.

