A Python RabbitMQ producer
In this article, you will learn how to publish data to RabbitMQ using a Python Producer, illustrated through a practical example. I’ve created a GitHub repository containing a CLI application with the code and files we’ll discuss.
The first step in almost every data project is data ingestion. Regardless of the data’s location or format, you need to find a way to transform it into an incoming data flow for your project. This allows you to clean, process, and ultimately generate insights or build machine-learning models from it.
In this project, our goal is to build and connect the necessary components for a data ingestion setup using RabbitMQ and Python. For simplicity, we will use a .csv file as the data source and simulate a continuous incoming data flow. Think of it as something similar to a temperature sensor, or stock prices during market hours.
Configure and Start the RabbitMQ node
The first step is to start a Broker node (a Broker is a server with a RabbitMQ instance running on it). For this, we will use a docker-compose.yaml file to set up the Broker service:
version: '3'
services:
  rabbitmq:
    build:
      context: ./rabbitmq
    container_name: rabbitmq_broker
    ports:
      - 5672:5672
      - 15672:15672
    volumes:
      - rabbitmq_data:/var/lib/rabbitmq
volumes:
  rabbitmq_data:
As you can see, the Docker context for building the RabbitMQ image points to a directory called rabbitmq. I chose this because RabbitMQ requires configuration files to set up Queues, Topics, and Exchanges. The directory I created contains the Dockerfile and a config folder:

rabbitmq/
├── Dockerfile
└── config/
    ├── 20-defaults.conf
    └── definitions.json

Let's start by reviewing the Dockerfile.
FROM rabbitmq:3.12-management
COPY config/20-defaults.conf /etc/rabbitmq/conf.d
COPY config/definitions.json /etc/rabbitmq
This Dockerfile copies the configuration files to a specific location within the image, making them available for pre-configuring the Broker node. You can learn more about this in the RabbitMQ documentation on configuration files and definitions.
Now, let's review the 20-defaults.conf file:
default_user = admin
default_pass = admin
# when set to true, definition import will only happen
# if definition file contents change
definitions.skip_if_unchanged = true
definitions.import_backend = local_filesystem
definitions.local.path = /etc/rabbitmq/definitions.json
The first thing to note is the filename. By naming the file this way, we ensure that the entire default configuration for a Broker node is loaded first, as stated in the documentation:
“They will be loaded in alphabetical order. A common naming practice uses numerical prefixes in filenames to make it easier to reason about the order, or make sure a “defaults file” is always loaded first, regardless of how many extra files are generated at deployment time”
ls -lh /path/to/a/custom/location/rabbitmq/conf.d
# => -r--r--r-- 1 rabbitmq rabbitmq 87B Mar 21 19:50 00-defaults.conf
# => -r--r--r-- 1 rabbitmq rabbitmq 4.6K Mar 21 19:52 10-main.conf
# => -r--r--r-- 1 rabbitmq rabbitmq 1.6K Mar 21 19:52 20-tls.conf
# => -r--r--r-- 1 rabbitmq rabbitmq 1.6K Mar 21 19:52 30-federation.conf
For more details, see the RabbitMQ configuration documentation.
Now, let's review its content. The first two lines configure the credentials to be used when sending messages to the Broker. These credentials also allow you to access the RabbitMQ management plugin, which we'll use to check that the broker received our messages (more on this in the management plugin documentation). The final section tells the Broker node where to look for definitions.
Definitions
The definitions file allows us to configure the Broker node and can be imported at node startup or using RabbitMQ CLI tools:
“Nodes and clusters store information that can be thought of as schema, metadata or topology. Users, vhosts, queues, exchanges, bindings, runtime parameters all fall into this category. This metadata is called definitions in RabbitMQ parlance.
Definitions can be exported to a file and then imported into another cluster or used for schema backup or data seeding.
Definitions are stored in an internal database and replicated across all cluster nodes. Every node in a cluster has its own replica of all definitions. When a part of definitions changes, the update is performed on all nodes in a single transaction. This means that in practice, definitions can be exported from any cluster node with the same result. VMware Tanzu RabbitMQ supports Warm Standby Replication to a remote cluster, which makes it easy to run a warm standby cluster for disaster recovery”. (For more information, see https://www.rabbitmq.com/docs/management.)
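As a quick aside, once the node is running you can also export its current definitions through the management HTTP API, which is handy for backups. The following is only a minimal sketch, not part of the project's code; it assumes the broker from this article is running locally with the admin credentials and that the requests library is installed:

import json
import requests

# Export the broker's current definitions via the management HTTP API.
# The credentials come from 20-defaults.conf; the output filename is arbitrary.
response = requests.get("http://localhost:15672/api/definitions", auth=("admin", "admin"))
response.raise_for_status()

with open("definitions-backup.json", "w", encoding="utf-8") as backup:
    json.dump(response.json(), backup, indent=2)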
Let’s review our definitions file:
{
  "bindings": [
    {
      "source": "temperature",
      "vhost": "/",
      "destination": "room_1",
      "destination_type": "queue",
      "routing_key": "temperature.room_1",
      "arguments": {}
    }
  ],
  "exchanges": [
    {
      "name": "temperature",
      "vhost": "/",
      "type": "topic",
      "durable": true,
      "auto_delete": false,
      "internal": false,
      "arguments": {}
    }
  ],
  "global_parameters": [],
  "parameters": [],
  "permissions": [
    {
      "configure": ".*",
      "read": ".*",
      "user": "admin",
      "vhost": "/",
      "write": ".*"
    }
  ],
  "policies": [],
  "queues": [
    {
      "name": "room_1",
      "vhost": "/",
      "durable": true,
      "auto_delete": true,
      "arguments": {}
    }
  ],
  "rabbit_version": "3.12.2",
  "rabbitmq_version": "3.12.2",
  "topic_permissions": [],
  "users": [
    {
      "hashing_algorithm": "rabbit_password_hashing_sha256",
      "limits": {},
      "name": "admin",
      "password_hash": "Xz1Cbqa1+QGyNGohlpT24SDCwxZVZoVIeq50De6IZRT/Ki5M",
      "tags": [
        "administrator"
      ]
    }
  ],
  "vhosts": [
    {
      "limits": [],
      "metadata": {
        "description": "Default virtual host",
        "tags": []
      },
      "name": "/"
    }
  ]
}
Here we define a topic Exchange named "temperature", which will receive all the data we publish. Additionally, we define a Queue named "room_1", which we bind to the "temperature" Exchange using a routing key. This means that all the messages published to the "temperature" Exchange with the routing key "temperature.room_1" will be delivered to our "room_1" Queue. Finally, we grant the "admin" user full permissions.
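For reference, the same topology could also be declared from Python with pika instead of through a definitions file. This is only a sketch to make the definitions concrete, not part of the project's code; it assumes the broker is reachable on localhost with the admin credentials from 20-defaults.conf:

import pika

# Connect with the credentials defined in 20-defaults.conf.
credentials = pika.PlainCredentials("admin", "admin")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost", port=5672, virtual_host="/", credentials=credentials)
)
channel = connection.channel()

# Declare the same topology as definitions.json: a durable topic exchange,
# a queue, and a binding with the "temperature.room_1" routing key.
channel.exchange_declare(exchange="temperature", exchange_type="topic", durable=True)
channel.queue_declare(queue="room_1", durable=True, auto_delete=True)
channel.queue_bind(queue="room_1", exchange="temperature", routing_key="temperature.room_1")

connection.close()

The advantage of the definitions file is that it keeps this setup out of application code, which is why the project relies on it instead.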
Finally, to start the broker using our docker-compose.yaml file, just run:
docker compose up rabbitmq
Now, let's check that everything is working with our node. To do this, we will use the RabbitMQ management plugin, which is already running in the background thanks to the Docker image we used. Open a new browser tab, go to http://localhost:15672/#/, and log in to the management plugin using the credentials you configured in 20-defaults.conf.
If everything is OK, you should see the management plugin's overview page, with the "temperature" Exchange and the "room_1" Queue listed in their respective tabs.
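If you prefer to verify this from code rather than from the browser, the management plugin also exposes an HTTP API on the same port. Here is a small sketch (not part of the project), assuming the requests library and the admin credentials:

import requests

# List the queues known to the broker via the management HTTP API.
response = requests.get("http://localhost:15672/api/queues", auth=("admin", "admin"))
response.raise_for_status()

for queue in response.json():
    print(queue["name"], "-", queue.get("messages", 0), "messages")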
Now that we have finished configuring the broker, we can move on to publishing messages to it.
RabbitMQ Producer
Now let's send some data to our Broker node, specifically to our Queue "room_1". To do this, I have created a CLI Python application that reads data from a file and continuously sends it to our Queue using a RabbitMQ Producer. Let's see how it works.
python -m producers --help
Let’s inspect the RabbitMQ command:
python -m producers rabbit-mq --help
To start a RabbitMQ Producer that publishes messages directly to the “room_1” Queue we created, open a new terminal in the project’s root directory and run:
python -m producers rabbit-mq --host=localhost --port=5672 --exchange=temperature --topic=temperature.room_1 --vhost=/ --user=admin --password=admin --file-path=./data/room_1/temperature.csv
Once the Producer is running, you will be able to see and consume these messages. Do you remember the Management plugin we used earlier to check our Queues? Let's go to that view, and you should see the number of messages in the "room_1" queue start to increase.
Finally, you can manually consume these messages to confirm that our Producer is working. To do this, inspect the “room_1” queue by clicking on it and scrolling down to the ‘Get messages’ section. Once there, let’s get the first two messages we sent.
And that’s it, you got it!
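You can also drain a couple of messages from code instead of the management UI. Here is a minimal consumer sketch with pika (again, not part of the project's CLI), assuming the broker and credentials used throughout this article:

import json
import pika

credentials = pika.PlainCredentials("admin", "admin")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost", port=5672, virtual_host="/", credentials=credentials)
)
channel = connection.channel()

# basic_get fetches one message per call (body is None when the queue is empty),
# which is enough to quickly inspect the first two messages we sent.
for _ in range(2):
    method, properties, body = channel.basic_get(queue="room_1", auto_ack=True)
    if body is None:
        break
    print(json.loads(body))

connection.close()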
But what's happening under the hood? Let's dive a little deeper into our Python RabbitMQ Producer.
First, as I mentioned before, we are using a .csv file that contains fake data to simulate a temperature sensor, instead of generating fake data on the fly. I chose this approach because it lets us talk about a very common problem: sometimes you have large .csv files, and attempting to load them entirely into memory can crash your application with out-of-memory errors. To mitigate this, I am using Pandas to load small "chunks" of data at a time rather than the entire file. Also, pay attention to the return type of this function: it is a generator ("In Python, a generator is a function that returns an iterator that produces a sequence of values when iterated over. Generators are useful when we want to produce a large sequence of values, but we don’t want to store all of them in memory at once." — https://www.programiz.com/python-programming/generator).
class Reader(IReader[Dict[str, Any]]):
    """Reader class."""

    def __init__(self, path: str) -> None:
        """Class constructor."""
        self._path = pathlib.Path(path)

    def read(self) -> Generator[Dict[str, Any], None, None]:
        """Read data from a .csv file and return it as a generator.

        Yields:
            Generator[Dict[str, Any], None, None]: data generator.
        """
        # Suppose the file that we are going to read could be huge, and loading it
        # completely into memory could crash your running process.
        # To avoid this, we are going to fetch fixed-size chunks of data.
        for chunk in pd.read_csv(self._path, chunksize=10):
            for measurement in chunk.itertuples(name="Measurement"):
                time.sleep(5)  # simulates a sample rate of 5 seconds for the sensor
                yield {
                    "temperature": measurement.temperature,
                    "timestamp": measurement.timestamp
                }
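By the way, if you do not want to use the sample file from the repository, a compatible file can be generated with a few lines of pandas. The column names ("temperature", "timestamp") are the ones the Reader expects; the values and the ./data/room_1/temperature.csv path are only assumptions based on the CLI example above:

import pathlib
import pandas as pd

# Hypothetical sample data with the two columns the Reader expects.
pathlib.Path("./data/room_1").mkdir(parents=True, exist_ok=True)
samples = pd.DataFrame({
    "temperature": [21.5, 21.7, 21.6, 22.0],
    "timestamp": [
        "2024-01-01 00:00:00",
        "2024-01-01 00:00:05",
        "2024-01-01 00:00:10",
        "2024-01-01 00:00:15",
    ],
})
samples.to_csv("./data/room_1/temperature.csv", index=False)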
Secondly, the serializer is responsible for formatting the message before sending it to the Broker:
class Serializer(ISerializer[Dict[str, Any], bytes]):
    """Serializer class."""

    def serialize(self, data: Dict[str, Any]) -> bytes:
        """Serialize a JSON-compatible object to bytes."""
        return json.dumps(data).encode('utf-8')
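For reference, this is what the serializer produces for a single reading (the values here are made up):

serializer = Serializer()
payload = serializer.serialize({"temperature": 21.5, "timestamp": "2024-01-01 00:00:00"})
print(payload)  # b'{"temperature": 21.5, "timestamp": "2024-01-01 00:00:00"}'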
Finally, we are using our "__main__.py" file to build our CLI commands. You can think of them as little orchestrators that read the data, prepare a message, and pass it to the producer.
@app.command()
def rabbit_mq(
    host: Annotated[
        str, typer.Option(help="Broker server host.")
    ],
    port: Annotated[
        int, typer.Option(help="Broker server port.")
    ],
    exchange: Annotated[
        str, typer.Option(help="Exchange to use.")
    ],
    topic: Annotated[
        str, typer.Option(help="Topic to publish messages to.")
    ],
    vhost: Annotated[
        str, typer.Option(help="Virtual host.")
    ],
    user: Annotated[str, typer.Option(
        help="User."
    )],
    password: Annotated[str, typer.Option(
        help="Password."
    )],
    file_path: Annotated[
        str,
        typer.Option(
            help="Path to the '.csv' file that contains the sample data."
        ),
    ] = ".",
) -> None:
    """Publish messages to the RabbitMQ broker."""
    reader = Reader(path=file_path)
    bytes_serializer = serializers.bytes.Serializer()
    logger.info("Starting the RabbitMQ producer...")
    producer = clients.rabbitmq.Producer(
        host=host,
        port=port,
        user=user,
        password=password,
        vhost=vhost,
        delivery_mode="Transient",
        content_type="application/json",
        content_encoding="utf-8",
    )
    # Start reading the data as a stream from a file.
    with producer as publisher:
        for data in reader.read():
            message = clients.rabbitmq.RabbitMQMessage(
                topic=topic,
                exchange=exchange,
                body=bytes_serializer.serialize(data=data)
            )
            publisher.publish(message=message)
As you can see, I have created the Producer to be used as a context manager to control the connection with the broker. The following is the Producer implementation.
from __future__ import annotations

from dataclasses import dataclass
import logging

import pika
from pika.adapters.blocking_connection import BlockingChannel

from ...application.ports import IProducer, IMessage

logger = logging.getLogger(__name__)


@dataclass
class RabbitMQMessage(IMessage):
    """RabbitMQMessage class."""

    exchange: str
    topic: str
    body: bytes


class Producer(IProducer[IMessage]):
    """RabbitMQ Producer class."""

    _user: str
    _password: str
    _host: str
    _port: int
    _vhost: str
    _delivery_mode: str
    _content_type: str
    _content_encoding: str
    _channel: BlockingChannel

    def __init__(
        self,
        user: str,
        password: str,
        host: str,
        port: int,
        vhost: str,
        delivery_mode: str = "Persistent",
        content_type: str = "application/json",
        content_encoding: str = "utf-8"
    ) -> None:
        """Class constructor."""
        self._user = user
        self._password = password
        self._host = host
        self._port = port
        self._vhost = vhost
        self._connection = None
        self._delivery_mode = delivery_mode
        self._content_type = content_type
        self._content_encoding = content_encoding

    def _connect(self) -> None:
        """Connect to RabbitMQ."""
        credentials = pika.PlainCredentials(self._user, self._password)
        parameters = pika.ConnectionParameters(
            host=self._host, port=self._port, virtual_host=self._vhost, credentials=credentials
        )
        self._connection = pika.BlockingConnection(parameters=parameters)

    def __enter__(self) -> Producer:
        """Create and start a connection with RabbitMQ using the context manager interface."""
        self._connect()
        self._channel = self._connection.channel()
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        """Stop the connection when the context manager is exited."""
        # Gracefully close the connection
        self._connection.close()

    def publish(self, message: RabbitMQMessage) -> None:
        """Publish a message to RabbitMQ.

        Args:
            message (RabbitMQMessage): message to publish.

        Returns:
            None
        """
        properties = pika.BasicProperties(
            delivery_mode=pika.DeliveryMode[self._delivery_mode],
            content_type=self._content_type,
            content_encoding=self._content_encoding
        )
        self._channel.basic_publish(
            exchange=message.exchange,
            routing_key=message.topic,
            body=message.body,
            properties=properties
        )
        logger.info("Message sent: '%s'", message.body)
We create a connection and a dedicated channel to send these messages. When we exit the context manager, the connection with the Broker is closed automatically (see the "__enter__" and "__exit__" methods).
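As a closing note, the same Producer can be reused outside the CLI for one-off messages. A minimal sketch follows; the import path is a guess, since the exact module layout depends on the repository:

# The import path below is an assumption; adjust it to the repository's layout.
from producers.clients.rabbitmq import Producer, RabbitMQMessage

producer = Producer(
    host="localhost",
    port=5672,
    user="admin",
    password="admin",
    vhost="/",
    delivery_mode="Transient",
)

# Publish a single hand-crafted reading to the "temperature" exchange.
with producer as publisher:
    publisher.publish(
        message=RabbitMQMessage(
            exchange="temperature",
            topic="temperature.room_1",
            body=b'{"temperature": 21.5, "timestamp": "2024-01-01 00:00:00"}',
        )
    )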