Kafka
Kafka Architecture
Kafka’s architecture is easy to understand when broken down into simple building blocks:
- Producer: Sends messages (data) to Kafka topics.
- Topic: A named, logical stream that groups related events.
- Partition: Splits a topic into parallel, ordered logs for scalability.
- Broker: A Kafka server that stores records (messages) and serves them to clients.
- Consumer: Reads records from topics.
- ZooKeeper: Coordinates Kafka brokers in older deployments. Newer versions can run without it by using KRaft mode.
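To make these roles concrete, here is a minimal in-memory sketch in Python of how a keyed record lands in a partition. It is an illustration only, not Kafka client code: `partition_for` and `Topic` are hypothetical names, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default Java partitioner applies to record keys, so real partition numbers will differ.

```python
import zlib
from collections import defaultdict
from typing import Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Assumption: crc32 is a stand-in hash. Kafka's default partitioner
    uses murmur2, so real partition numbers will differ.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """Hypothetical in-memory model of a topic: one append-only list per partition."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = Topic("order-events", num_partitions=3)
for key, value in [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]:
    partition, offset = topic.produce(key, value)
    print(f"{key}:{value} -> partition {partition}, offset {offset}")
```

Records that share a key always map to the same partition, which is what preserves per-key ordering while still letting different keys spread across partitions.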
Usage
Create a new topic:
kafka-topics --create --topic order-events --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
List topics to confirm it exists:
kafka-topics --list --bootstrap-server localhost:9092
Produce messages (simulate orders); each line you type becomes one record:
kafka-console-producer --topic order-events --bootstrap-server localhost:9092
Consume messages from the beginning of the topic:
kafka-console-consumer --topic order-events --bootstrap-server localhost:9092 --from-beginning
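The `--from-beginning` flag matters because a console consumer with no committed offsets otherwise starts at the end of the log and sees only new records. A minimal Python sketch of that behavior follows, using a plain list as a hypothetical stand-in for one partition; `read_from` is an illustrative helper, not a Kafka API.

```python
from typing import List


def read_from(log: List[str], start_offset: int) -> List[str]:
    """Return every record at or after start_offset, in log order."""
    return log[start_offset:]


# Records already in the partition before the consumer starts.
log = ["order-1 created", "order-2 created", "order-1 paid"]

earliest = 0       # where --from-beginning starts reading
latest = len(log)  # default start for a consumer with no committed offset

print(read_from(log, earliest))  # all three existing records
print(read_from(log, latest))    # empty: nothing new has arrived yet

log.append("order-2 paid")       # a producer writes one more record
print(read_from(log, latest))    # only the newly written record
```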
Cost-efficient Kafka
- Continuously optimize: Start by removing inactive resources such as unused topics, idle consumer groups, and idle connections. They consume CPU, memory, and storage, and can trigger extra rebalances. If they’re not needed, delete them.
- Shrink your payload: Enable client-level compression (the producer’s compression.type setting) and use compact binary formats like Avro or Protobuf. Protobuf has a steeper learning curve, but smaller payloads cut network and storage usage, and with them your costs.
- Avoid the default: Tune your brokers to match your current workload by adjusting num.network.threads and num.io.threads. There’s no one-size-fits-all configuration, so iterate and measure. Finding the sweet spot for both can increase your cluster’s throughput and responsiveness without adding hardware, which would increase your spending.
- Adopt dynamic sizing: Shift from static to dynamic resource allocation, so that your Kafka clusters use only the hardware and resources they need at any given moment.
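The payload advice can be checked with a rough measurement. The sketch below uses only the Python standard library: `struct` packing stands in for a schema-based binary format such as Avro or Protobuf (it is neither of them), `gzip` stands in for producer-side compression, and the order fields are made up for illustration.

```python
import gzip
import json
import struct

# Hypothetical order records: (order_id, customer_id, amount_cents).
orders = [(i, 1000 + i % 50, 1999 + i) for i in range(1000)]

# Text encoding: one JSON object per record, field names repeated every time.
json_batch = "\n".join(
    json.dumps({"order_id": o, "customer_id": c, "amount_cents": a})
    for o, c, a in orders
).encode("utf-8")

# Binary encoding: three unsigned 32-bit integers per record, no field names.
# Stand-in for a schema-based format, where the schema lives outside the payload.
binary_batch = b"".join(struct.pack("<III", o, c, a) for o, c, a in orders)

sizes = {
    "json": len(json_batch),
    "json+gzip": len(gzip.compress(json_batch)),
    "binary": len(binary_batch),
    "binary+gzip": len(gzip.compress(binary_batch)),
}

for name, size in sizes.items():
    print(f"{name:12s} {size:7d} bytes")
```

Exact numbers depend on the data. The general pattern is that repeated field names dominate small JSON records, and that compression is most effective when applied to batches rather than single messages.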
References
https://kafka.apache.org/
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/kafka.html
https://www.datadoghq.com/knowledge-center/apache-kafka/
https://www.linkedin.com/posts/stanislavkozlovski_kafka-apachekafka-dataengineering-activity-7227972197183598592-JZBc/
https://stackoverflow.blog/2024/09/04/best-practices-for-cost-efficient-kafka-clusters/
https://medium.com/@akmuthumala/introduction-to-apache-kafka-a-hands-on-guide-with-docker-bc65ae1009e5