
Optimizing max records in ORC files




I have a process that pushes rows to an external table; it will push around 500M CDRs (call detail records) per day.

The external table uses ORC files with the following configuration:

  • Max Records: 10K
  • Compression Type: snappy
  • Stripe Size: 67,108,864 bytes (64 MiB)
  • Row Index Stride: 10K
  • Bloom Filter Columns: NA
  • False Positive Probability of Bloom Filter Columns: 0.05
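A rough sketch of what these settings imply for each file. It assumes a hypothetical average of 200 bytes per row; real sizes depend on the data, encoding, and compression, so this number should be replaced with a measured one. Under that assumption, with Max Records equal to Row Index Stride, each file holds a single row group and fills only a few percent of one 64 MiB stripe.

```python
import math

# Values taken from the configuration listed above.
MAX_RECORDS = 10_000
STRIPE_SIZE_BYTES = 67_108_864  # 64 MiB
ROW_INDEX_STRIDE = 10_000

# HYPOTHETICAL: average size of one CDR row in bytes. Measure your own data.
ASSUMED_BYTES_PER_ROW = 200


def row_groups_per_file(records: int, stride: int) -> int:
    """Approximate row groups (row-index entries) per file, ignoring stripe boundaries."""
    return math.ceil(records / stride)


def stripe_fill_ratio(records: int, bytes_per_row: int, stripe_size: int) -> float:
    """Rough fraction of one stripe that a file of `records` rows would occupy."""
    return records * bytes_per_row / stripe_size


if __name__ == "__main__":
    groups = row_groups_per_file(MAX_RECORDS, ROW_INDEX_STRIDE)
    ratio = stripe_fill_ratio(MAX_RECORDS, ASSUMED_BYTES_PER_ROW, STRIPE_SIZE_BYTES)
    print(f"row groups per file: {groups}")
    print(f"fraction of one stripe filled: {ratio:.1%}")
```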

What is the maximum number of records per file I can go up to when writing the files, so that I get the benefit of generating fewer files?
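The effect of Max Records on the file count is simple arithmetic. This sketch assumes every file is filled up to Max Records before it is closed; if the writer also flushes on a timer or at batch boundaries, the real count will be higher.

```python
import math

# Daily volume stated in the question: about 500M CDRs per day.
RECORDS_PER_DAY = 500_000_000


def files_per_day(records_per_day: int, max_records: int) -> int:
    """Files written per day if every file is filled to max_records."""
    # Ceiling division: a partially filled last file still counts as a file.
    return math.ceil(records_per_day / max_records)


if __name__ == "__main__":
    for max_records in (10_000, 100_000, 1_000_000, 5_000_000):
        count = files_per_day(RECORDS_PER_DAY, max_records)
        print(f"max_records={max_records:>9,}  files/day={count:>6,}")
```

At the current 10K setting this gives 50,000 files per day; raising Max Records to 1M brings it down to 500.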

The other parameters above are the product defaults. When should I tune them?
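The False Positive Probability setting only has an effect once Bloom Filter Columns are set (currently NA). As a general illustration of that trade-off, the textbook Bloom filter sizing formula shows how a lower probability costs more bits per distinct value; this is the generic formula, not necessarily the exact sizing used by the ORC writer.

```python
import math


def bloom_bits_per_entry(fpp: float) -> float:
    """Bits per distinct value for a standard Bloom filter at false-positive rate fpp.

    Uses the textbook optimum m/n = -ln(p) / (ln 2)^2.
    """
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be in (0, 1)")
    return -math.log(fpp) / (math.log(2) ** 2)


def bloom_hash_functions(fpp: float) -> int:
    """Optimal number of hash functions k = (m/n) * ln 2, rounded, at least 1."""
    return max(1, round(bloom_bits_per_entry(fpp) * math.log(2)))


if __name__ == "__main__":
    for fpp in (0.05, 0.01, 0.001):
        print(f"fpp={fpp}: {bloom_bits_per_entry(fpp):.2f} bits/value, "
              f"{bloom_hash_functions(fpp)} hash functions")
```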
