Optimizing max records in ORC files


I have a process that pushes rows to an external table. It is expected to push around 500M CDRs per day.

The external table uses ORC files with the following config:

  • Max Records: 10K
  • Compression Type: snappy
  • Stripe Size: 67,108,864 bytes (64 MiB)
  • Row Index Stride: 10K
  • Bloom Filter Columns: NA
  • False Positive Probability of Bloom Filter Columns: 0.05

What is the maximum number of records I can go up to when writing the files, so that I minimize the number of generated files?
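
To put the file-count concern in numbers, here is a rough sketch in Python. It assumes that a file is closed only when it reaches Max Records and that the full 500M records per day go through a single writer. Both are my assumptions, not documented product behavior, and `files_per_day` is just a hypothetical helper name.

```python
def files_per_day(records_per_day: int, max_records_per_file: int) -> int:
    """Files written per day if every file is closed exactly at the record cap.

    Assumption: the record cap is the only thing that rolls a file
    (no time-based or size-based rollover, one writer).
    """
    # Integer ceiling division, so a partially filled last file is counted.
    return -(-records_per_day // max_records_per_file)


RECORDS_PER_DAY = 500_000_000  # from the post: around 500M CDRs per day

if __name__ == "__main__":
    # 10_000 is the current setting; the larger caps are only candidates to compare.
    for cap in (10_000, 100_000, 1_000_000, 10_000_000):
        count = files_per_day(RECORDS_PER_DAY, cap)
        print(f"max records {cap:>10,} -> {count:>6,} files/day")
```

With the current setting of 10K this works out to 50,000 files per day, while a cap of 1M would give 500 files per day under the same assumptions.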

The other parameters listed above are the product defaults. When should I tune them?