Hello @mokkan,
Yes, with that buffer setting the events are held until the buffer reaches 100 KB, and only then are they written to the location in HDFS.
If you need the events to show up sooner, you can decrease the buffer size.
Note that a write triggered by a full buffer does not necessarily include an fsync.
From what I found, Spark uses OutputStream.flush(), so an fsync() is not performed every time.
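
To illustrate the difference between buffering, flush and fsync, here is a small local-file analogy in Python. It is only a sketch and not Spark's code: the function name `demo_buffered_write`, the file name and the 1 KB payload are made up for the example, and a local file stands in for the HDFS output stream.

```python
import os
import tempfile

BUFFER_BYTES = 100 * 1024  # 100 KB, mirroring the buffer size discussed above


def demo_buffered_write(path, buffer_bytes=BUFFER_BYTES):
    """Hypothetical helper: record the on-disk file size at each step."""
    sizes = {}
    with open(path, "wb", buffering=buffer_bytes) as f:
        # 1 KB is far below the buffer size, so the data stays in memory.
        f.write(b"x" * 1024)
        sizes["after_small_write"] = os.path.getsize(path)

        # flush() hands the buffered bytes to the operating system.
        f.flush()
        sizes["after_flush"] = os.path.getsize(path)

        # fsync() additionally asks the OS to push the data to the storage
        # device. The visible size does not change, only durability does.
        os.fsync(f.fileno())
        sizes["after_fsync"] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_buffered_write(os.path.join(tmp, "events.log")))
```

The point of the sketch is that data smaller than the buffer is not visible until a flush happens, and that a flush on its own is a weaker guarantee than an fsync.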
Regards,
Andrés Fallas
Cloudera Customer Operations Engineer
--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs-up button.