
Kafka architecture questions


New Contributor

Hi Guru,

Can you please clarify a few Kafka architecture questions? Please answer here rather than pointing to links (which I have already read and could not understand).

  1. Where is the Kafka partition structure created first?

i) In memory, or

ii) on disk, in the log.dirs location?

2) Do consumers read partitions that are stored in memory, or from disk?

3) Some of the links from a Google search say "Kafka purges the messages as per the retention policies --- Regardless of whether the messages has been consumed". Does this mean that consumers read topics only from disk and not from memory?

4) What is the relation among batch.size, log.flush.interval.messages, and log.segment.bytes?

4a) The article at https://community.hortonworks.com/articles/80813/kafka-best-practices-1.html says Kafka writes data to files as soon as log.flush.interval.messages messages have been received.

The question is: where is this file created, in memory or on disk, and in which location?

4b) When the log file reaches log.segment.bytes, it is flushed to disk.

The question is: where is this log file created in the first place, in memory or in some other temporary location?

Thanks

JJ

Accepted Solutions

Re: Kafka architecture questions

Super Collaborator
  1. The log.dirs directory is the persistence layer of the Kafka broker (which acts as a server process). When you create a topic, no matter how many partitions it has, its structure is, roughly speaking, first created in the broker's memory and then on disk, and topic creation is confirmed to the client only after the structure exists on disk. You can think of memory as the broker's cache and of the disk as its persistent storage.
  2. Consumers do not access the data stored on disk directly. Consumers (and producers as well) always communicate with the broker, and all disk I/O to log.dirs is done by the broker. Consumers also do not share memory with the broker or the producer.
  3. No. As mentioned, no client (consumer or producer) accesses the broker's file system directly. When the broker purges a message, it purges it from memory and from disk. Again, memory is a kind of cache for the data stored on disk.
  4. This is a bit trickier. On almost any Linux system, when you create a file or write data to disk, the data is first cached and only physically written when a flush occurs or once a defined number of changed bytes have been cached (closing the file hands any application-buffered data to the OS, but does not by itself force it to disk). So even after an application has written data "to disk", it might physically still be in memory because the filesystem caches it. If your application crashes, data it has already handed to the OS will still be written to disk, but if your OS crashes, you might lose data that is still in the filesystem cache.
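
To make point 4 more concrete, here is a minimal Python sketch of an append-only log that writes through the OS cache, forces data to disk every N messages, and starts a new file once a size limit is reached. It is only a toy model and not how the broker is implemented: the class `ToySegmentLog`, its parameters `segment_bytes` and `flush_interval_messages`, and the `demo` helper are hypothetical names, loosely modelled on the broker settings log.segment.bytes and log.flush.interval.messages from the question. batch.size is a producer-side setting and is not modelled here. Forcing a flush before each roll is a choice made for this toy, not a statement about the broker.

```python
import os
import tempfile


class ToySegmentLog:
    """Toy append-only log. NOT Kafka's implementation; all names are hypothetical."""

    def __init__(self, directory, segment_bytes, flush_interval_messages):
        self.directory = directory
        self.segment_bytes = segment_bytes                  # roll to a new file at this size
        self.flush_interval_messages = flush_interval_messages  # fsync every N messages
        self.next_offset = 0    # offset of the next message to append
        self.unflushed = 0      # messages appended since the last fsync
        self.fsync_calls = 0    # how often we forced data to physical disk
        self._open_segment()

    def _open_segment(self):
        # Name the file after the first offset it will hold, zero-padded to 20 digits.
        path = os.path.join(self.directory, f"{self.next_offset:020d}.log")
        self.file = open(path, "ab")
        self.size = 0

    def _fsync(self):
        self.file.flush()               # application buffer -> OS filesystem cache
        os.fsync(self.file.fileno())    # OS filesystem cache -> physical disk
        self.fsync_calls += 1
        self.unflushed = 0

    def append(self, message):
        # Roll to a new segment if this message would push the file past the limit.
        if self.size > 0 and self.size + len(message) > self.segment_bytes:
            self._fsync()
            self.file.close()
            self._open_segment()
        self.file.write(message)
        self.file.flush()   # the data is now with the OS, but may not be on disk yet
        self.size += len(message)
        self.next_offset += 1
        self.unflushed += 1
        if self.unflushed >= self.flush_interval_messages:
            self._fsync()

    def close(self):
        self._fsync()
        self.file.close()


def demo():
    """Append ten 10-byte messages; return the segment file names and the fsync count."""
    with tempfile.TemporaryDirectory() as directory:
        log = ToySegmentLog(directory, segment_bytes=32, flush_interval_messages=3)
        for i in range(10):
            log.append(f"message-{i}\n".encode())
        log.close()
        return sorted(os.listdir(directory)), log.fsync_calls


if __name__ == "__main__":
    files, fsyncs = demo()
    print(files)
    print("fsync calls:", fsyncs)
```

The point of the sketch is that every file lives on disk in the log directory from the start. "In memory" only describes where the most recently written bytes sit until the OS, or an explicit fsync, pushes them to the physical device.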

Re: Kafka architecture questions

New Contributor

Thank you very much, Harald, for addressing my questions.

Regards

JJ
