{getToc} $title={Table of Contents}
Kafka vs. RabbitMQ
Introduction
What is Kafka?
Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation, written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect, and provides the Kafka Streams libraries for stream processing applications.
Kafka uses a binary
TCP-based protocol that is optimized for efficiency and relies on a “message
set” abstraction that naturally groups messages together to reduce the
overhead of the network roundtrip.
This “leads to larger network packets, larger sequential disk operations,
contiguous memory blocks […] which allows Kafka to turn a bursty stream of
random message writes into linear writes.”
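The "message set" batching described above can be sketched in a few lines. This is an illustrative toy, not the real Kafka wire protocol: it simply groups encoded messages into batches under a size cap (`max_batch_bytes` is an assumed parameter), so that many small writes share one network roundtrip.

```python
# Hypothetical sketch (not Kafka's actual protocol): grouping messages
# into "message sets" so many records share one network roundtrip.

def batch_messages(messages, max_batch_bytes=64):
    """Group encoded messages into batches no larger than max_batch_bytes."""
    batches, current, size = [], [], 0
    for msg in messages:
        encoded = msg.encode("utf-8")
        if current and size + len(encoded) > max_batch_bytes:
            batches.append(current)   # batch full: flush it as one "send"
            current, size = [], 0
        current.append(encoded)
        size += len(encoded)
    if current:
        batches.append(current)
    return batches

msgs = [f"event-{i}" for i in range(20)]
batches = batch_messages(msgs)
# Twenty individual sends collapse into a handful of batched sends.
print(f"{len(msgs)} messages -> {len(batches)} network roundtrips")
```

The bursty stream of small writes becomes a few large, sequential ones, which is exactly the effect the quote above describes.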
What is RabbitMQ?
RabbitMQ is an open-source message-broker software (sometimes called message-oriented middleware) that originally implemented the Advanced Message Queuing Protocol (AMQP) and has since been extended with a plug-in architecture to support the Streaming Text Oriented Messaging Protocol (STOMP), MQ Telemetry Transport (MQTT), and other protocols.[1] Written in Erlang, the RabbitMQ server is built on the Open Telecom Platform framework for clustering and failover. Client libraries to interface with the broker are available for all major programming languages. The source code is released under the Mozilla Public License.
Body
Why Do We Need a Messaging System?
A messaging system transfers data between applications, freeing each program to concentrate on the data itself rather than on the details of exchanging and transmitting it. Reliable message queuing is the foundation of distributed messaging: client applications and the messaging system queue messages in an asynchronous fashion. Most of the time, however, we build microservices over REST, and because the underlying HTTP protocol is synchronous, REST over HTTP makes our services synchronous as well: each caller waits for a response before moving on. Messaging is the best option to prevent that blocking.
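The contrast above can be shown with a toy sketch (illustrative only, no real HTTP or broker involved): a synchronous call blocks the caller until the callee returns, while a queue lets the caller hand off work and move on immediately.

```python
# Toy contrast between synchronous calls and queue-based hand-off.
import queue

def handle(request):
    return f"processed {request}"

# Synchronous: the caller waits for the result before continuing.
result = handle("order-1")
print(result)

# Asynchronous: the caller enqueues messages and continues; a worker
# drains the queue whenever it is ready.
mailbox = queue.Queue()
mailbox.put("order-2")
mailbox.put("order-3")
while not mailbox.empty():
    print(handle(mailbox.get()))
```

In the asynchronous case the producer never waits on the worker, which is the decoupling a messaging system provides.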
What is Synchronous?
Synchronous Transmission: In synchronous transmission, data is sent in the form of blocks or frames. This transmission is full-duplex. Synchronization between sender and receiver is compulsory, and there is no gap between data. It is more efficient and more reliable than asynchronous transmission for transferring large amounts of data.
Synchronous data transmission
What is Asynchronous?
Asynchronous Transmission: In asynchronous transmission, data is sent in the form of bytes or characters. This transmission is half-duplex. Start bits and stop bits are added to the data, and synchronization between sender and receiver is not required.
Asynchronous serial data transmission
RabbitMQ and Kafka Performance
Apache Kafka:
Kafka offers much higher performance than message brokers like RabbitMQ. It
uses sequential disk I/O to boost performance, making it a suitable option
for implementing queues. It can achieve high throughput (millions of
messages per second) with limited resources, a necessity for big data use
cases.
RabbitMQ:
RabbitMQ can also process a million messages per second but requires more
resources (around 30 nodes). You can use RabbitMQ for many of the same use
cases as Kafka, but you’ll need to combine it with other tools like Apache
Cassandra.
Kafka Architecture
Kafka Architecture
Producers and consumers are the applications that publish and read event messages, respectively. As covered in the discussion of Kafka use cases, an event is a message with data describing the event, such as a new user signing up to a mobile application. Events are queued in Kafka topics, and multiple consumers can subscribe to the same topic. Topics are further divided into partitions, which split data across brokers to improve performance.
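The key-to-partition mapping can be sketched as follows. This is an illustrative stand-in, not Kafka's actual partitioner (`partition_for` is an assumed helper name): hashing the message key means all messages with the same key land in the same partition, where their order is preserved.

```python
# Illustrative sketch of key-based partitioning (not Kafka's real
# partitioner): same key -> same partition -> preserved ordering.
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

partitions = {p: [] for p in range(3)}
for user, event in [("alice", "signup"), ("bob", "signup"),
                    ("alice", "login"), ("bob", "logout")]:
    partitions[partition_for(user, 3)].append((user, event))

# All of a given user's events sit in one partition, in order.
for p, events in partitions.items():
    print(p, events)
```

Spreading keys over partitions is what lets Kafka scale a topic across brokers while still guaranteeing per-key ordering.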
Important features of Kafka include:
- High-volume publish-subscribe messaging and streaming platform: durable, fast, and scalable.
- Durable message store: Kafka behaves like a log, run in a server cluster, which keeps streams of records in topics (categories).
- Messages are made up of a value, a key, and a timestamp.
- Dumb broker / smart consumer model: the broker does not try to track which messages are read by consumers; instead, Kafka keeps all messages for a set period of time, and each consumer tracks its own position.
- Managed by external services: in many cases this will be Apache ZooKeeper.
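The "durable message store" behavior can be sketched as an append-only log with time-based retention. This is a toy model with invented names (`TopicLog`, `expire`), not a Kafka API: nothing is deleted because a consumer read it; records only expire once they age past the retention window.

```python
# Toy model of a durable, retention-based log (names are illustrative,
# not Kafka APIs): append-only, expired only by age, never by reads.
import time

class TopicLog:
    def __init__(self, retention_seconds: float):
        self.retention = retention_seconds
        self.records = []          # (timestamp, offset, value)
        self.next_offset = 0

    def append(self, value, now=None):
        now = time.time() if now is None else now
        self.records.append((now, self.next_offset, value))
        self.next_offset += 1

    def expire(self, now=None):
        """Drop records older than the retention window."""
        now = time.time() if now is None else now
        self.records = [r for r in self.records if now - r[0] <= self.retention]

log = TopicLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=50)
log.expire(now=70)      # "a" (age 70s) falls outside the 60s window
print([r[2] for r in log.records])   # ['b']
```

Because reads never mutate the log, any number of consumers can replay the same records independently within the retention window.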
RabbitMQ Architecture
RabbitMQ Architecture
As with other message brokers, RabbitMQ receives messages from applications that publish them, known as producers or publishers. Within the system, messages are received at an exchange, a virtual ‘post office’ of sorts, which routes messages onward to storage buffers known as queues. Applications that read messages, known as consumers, can subscribe to these queues to pick up the latest data that arrives in these ‘mailboxes’.
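The exchange-to-queue routing can be sketched with a toy model. This is not the pika client API; `DirectExchange` is an invented class that mimics how an AMQP direct exchange copies a message into every queue whose binding key matches the message's routing key.

```python
# Toy model of RabbitMQ-style direct-exchange routing (not the pika
# API): a message goes to every queue bound with a matching key.
from collections import defaultdict

class DirectExchange:
    def __init__(self):
        self.bindings = defaultdict(list)   # routing key -> bound queues

    def bind(self, queue: list, routing_key: str):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key: str, message: str):
        for queue in self.bindings.get(routing_key, []):
            queue.append(message)           # unmatched keys route nowhere

orders, audit = [], []
exchange = DirectExchange()
exchange.bind(orders, "order.created")
exchange.bind(audit, "order.created")       # two queues, same binding key
exchange.publish("order.created", "order #42")
exchange.publish("payment.failed", "no queue bound, so this is dropped")
print(orders, audit)    # ['order #42'] ['order #42']
```

Note that the publisher only names a routing key; which queues (and therefore which consumers) receive the message is decided entirely by the bindings, which is what makes RabbitMQ's routing so flexible.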
The key features of RabbitMQ are:
- General-purpose message broker: uses variations of request/reply, point-to-point, and pub-sub communication patterns.
- Smart broker / dumb consumer model: delivers messages to consumers consistently, while the broker monitors the consumer state.
- Mature platform: well supported, with client libraries for Java, .NET, Ruby, Node.js, and more. Offers dozens of plugins.
- Communication: can be synchronous or asynchronous.
- Deployment scenarios: provides distributed deployment scenarios.
- Multi-node cluster to cluster federation: does not rely on external services; however, specific cluster formation plugins can use DNS, APIs, Consul, etc.
Pull vs Push Approach
One important difference between Kafka and RabbitMQ is that the first is
pull-based, while the other is push-based. In pull-based systems, the
brokers waits for the consumer to ask for data (‘pull’); if a consumer is
late, it can catch up later. With push-based systems, messages are
immediately pushed to any subscribed consumer. This can cause these two
tools to behave differently in some circumstances.
Apache Kafka: Pull-based approach
Kafka uses a pull model. Consumers request batches of messages from a specific offset. Kafka permits long polling, which prevents tight loops when there is no message past the offset, and aggressively batches messages to support this.
A pull model is logical for Kafka because of its partitioned data structure. Kafka provides message order within a partition with no contending consumers. This allows users to leverage the batching of messages for effective message delivery and higher throughput.
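The pull loop can be sketched as follows. This is an illustrative toy, not the Kafka consumer API (`Broker.fetch` is an invented method): the consumer owns its offset and asks the broker for the next batch, so a slow consumer simply resumes from wherever it left off.

```python
# Sketch of the pull model (illustrative, not the real Kafka client):
# the consumer drives fetching and tracks its own offset.

class Broker:
    def __init__(self, records):
        self.records = records

    def fetch(self, offset: int, max_records: int):
        """Return up to max_records starting at offset (a 'pull')."""
        return self.records[offset:offset + max_records]

broker = Broker([f"msg-{i}" for i in range(10)])
offset = 0
while True:
    batch = broker.fetch(offset, max_records=4)
    if not batch:
        break                     # caught up; a real consumer would long-poll
    print(f"pulled {len(batch)} messages from offset {offset}")
    offset += len(batch)          # advance position only after processing
```

Because the position lives with the consumer rather than the broker, replaying is as simple as rewinding `offset`.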
RabbitMQ: Push-based approach
RabbitMQ uses a push model and keeps consumers from being overwhelmed through a prefetch limit defined on the consumer. This can be used for low-latency messaging.
The aim of the push model is to distribute messages individually and quickly, ensuring that work is parallelized evenly and that messages are processed approximately in the order in which they arrived in the queue. However, this can also cause issues when one or more consumers have ‘died’ and are no longer receiving messages.
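The prefetch mechanism can be sketched with a toy model (invented names, not the pika API): the broker pushes messages to the consumer but never allows more than `prefetch` messages to be outstanding without an acknowledgement.

```python
# Toy model of push delivery capped by a prefetch limit (illustrative,
# not RabbitMQ's real client API).
from collections import deque

class PushBroker:
    def __init__(self, messages, prefetch: int):
        self.queue = deque(messages)
        self.prefetch = prefetch
        self.unacked = 0

    def deliver(self, consumer):
        """Push while the consumer has spare prefetch capacity."""
        while self.queue and self.unacked < self.prefetch:
            consumer(self.queue.popleft())
            self.unacked += 1

    def ack(self):
        self.unacked -= 1         # an ack frees one delivery slot

received = []
broker = PushBroker([f"task-{i}" for i in range(5)], prefetch=2)
broker.deliver(received.append)
print(received)                   # ['task-0', 'task-1'], capped by prefetch
broker.ack()
broker.deliver(received.append)
print(received)                   # ['task-0', 'task-1', 'task-2']
```

The prefetch limit is what stops a fast broker from burying a slow consumer: delivery pauses until acknowledgements free up capacity.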
How Do They Handle Messaging?
Kafka Use Cases
Some of the best Kafka use cases make use of the platform’s high throughput
and stream processing capabilities.
High-throughput activity tracking: Kafka can be used for a variety of
high-volume, high-throughput activity-tracking applications. For example,
you can use Kafka to track website activity (its original use case), ingest
data from IoT sensors, monitor patients in hospital settings, or keep tabs
on shipments.
Stream processing: Kafka enables you to implement application logic based on streams of events. You might keep a running count of types of events or calculate an average value over the course of an event that lasts several minutes. For example, if you have an IoT application that incorporates automated thermometers, you could keep track of the average temperature over time and trigger alerts if readings deviate from a target temperature range.
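The thermometer example can be sketched as a small stream-processing loop. All values and names here are illustrative: the function keeps a running average of readings and flags any reading outside an assumed target range.

```python
# Sketch of the thermometer stream-processing example (all thresholds
# and readings are illustrative).

def monitor(readings, low=18.0, high=24.0):
    alerts, total = [], 0.0
    for i, temp in enumerate(readings, start=1):
        total += temp
        average = total / i       # running average over the stream so far
        if not (low <= temp <= high):
            alerts.append((i, temp))
        print(f"reading {i}: {temp:.1f}C, running average {average:.2f}C")
    return alerts

alerts = monitor([21.0, 22.5, 27.1, 20.0])
print("out-of-range readings:", alerts)   # [(3, 27.1)]
```

A real deployment would do this with Kafka Streams over a topic of sensor events, but the per-record logic is the same.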
Event sourcing: Kafka can be used to support event sourcing, in which
changes to an app state are stored as a sequence of events. So, for example,
you might use Kafka with a banking app. If the account balance is somehow
corrupted, you can recalculate the balance based on the stored history of
transactions.
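The banking example above reduces to replaying the event history. The sketch below is illustrative (invented event shape, not a Kafka API): the balance is never stored as the source of truth; it is recomputed by folding over the stored events.

```python
# Sketch of event sourcing for the banking example: recompute state by
# replaying the event history (event format is illustrative).

def replay_balance(events):
    """Fold deposit/withdraw events into the current balance."""
    balance = 0
    for kind, amount in events:
        if kind == "deposit":
            balance += amount
        elif kind == "withdraw":
            balance -= amount
        else:
            raise ValueError(f"unknown event: {kind}")
    return balance

history = [("deposit", 100), ("withdraw", 30), ("deposit", 50)]
print(replay_balance(history))   # 120, recoverable even if a cached
                                 # balance is corrupted
```

Kafka's retained, ordered log makes it a natural store for such a history: a corrupted projection can always be rebuilt by replaying the topic from the beginning.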
Log aggregation: Similar to event sourcing, you can use Kafka to
collect log files and store them in a centralized place. These stored log
files can then provide a single source of truth for your app.
RabbitMQ Use Cases
Some of the best RabbitMQ use cases make use of its flexibility, both for routing messages within microservices architectures and among legacy apps.
Complex routing: RabbitMQ can be the best fit when you need to route messages among multiple consuming apps, such as in a microservices architecture. RabbitMQ's consistent hash exchange can be used to balance load processing across a distributed monitoring service, for example. Alternate exchanges can also be used to route a portion of events to specific services for A/B testing.
Legacy applications: Using available plug-ins (or developing your
own), you can deploy RabbitMQ as a way to connect consumer apps with legacy
apps. For example, you can use a Java Message Service (JMS) plug-in and JMS
client library to communicate with JMS apps.
Conclusion
I hope this article was helpful.
{getCard} $type={post}