Queuing in the background – getting started with RabbitMQ message broker
In PHP business logic is usually put right in action’s method or just behind it. Hence, every little piece of delaying and long-running code will be processed with a request. The problem is almost undetectable if a user sends an e-mail but with more complex actions it may take a little bit longer than preferred.
An example with handling e-mails is the first to come to mind. But let’s imagine that an application needs to compress an image or to compress batch import images. It will take a way too much time until the page will return any response. Probably, most developers will create a simple “queue” table and use SELECT * FROM queue WHERE done = 0 query. Unfortunately, databases are not designed to handle huge traffic so this solution might work only in case of tiny transfer rate, where – to be honest – using queues is unnecessary. In any other situation it will only cause deadlock or stuck and lead to further problems related to traffic growth. Of course, it is possible to write a custom solution to queue the import bulk and render images one by one and call it in the background right after the request, but it is unreasonable effort considering availability of ready made products.
Besides, would custom made solutions be prepared for scalability, cloud systems or speed performance out of the box? I believe they might be, but certainly not in a couple of hours – after this time off the shelf solutions generally will be inefficient and poorly prepared for development and reusability in the future. Thus, we can claim that this method is not a perfect problem fixer, so we should find another way to cope with the queuing.
As you presume, there is an effective solution that mitigates the impact of above mentioned problem – usage of a ready made message broker. Basically, the message broker receives the message with a new database record with every request but it will not cause a deadlock, because the server asks the message broker for data only when has resources and capacity.
In this article I would like to make an attempt to present a solution to the very annoying everyday problem that probably many programmers came across in their organisations – deadlocks in databases caused by a vast number of requests in relatively short time. The main aim of this text is to introduce RabbitMQ, which I value as a very functional and practical message broker, to help you solve the queuing problems and decrease the amount of work you would otherwise have to spend on it.
Why do we need a message broker?
Queuing brokers are useful for several reasons. Firstly, a queue between an application and the web service will reduce coupling, because both need queue’s configuration parameters and its name. It is more likely and convenient to move one system to different server or environment than to move whole queue management system. Secondly, queues work as a buffer – if the main application receives a lot of requests coming in in a short time it certainly will get stuck. The accounting system is able to process them all anyway, but using brokers prevents deadlocks – the broker will handle all requests which cannot be processed on time by the main application. Moreover, queuing is a smart bridge between applications written in different technologies and languages. A software does not need to know anything about receiver, it only sends a message to broker which handles the reports and distributes them to external programs.
How to choose the perfect broker?
Likewise choosing a framework or even a programming language, a few tools purposed for queuing tasks in the background are available. Similarly, according to the t-shape principle (suggesting to gain a wide spectrum of skills with perfectly mastered one) you should have basic knowledge about most of the queuing tools but dig into only one. Each one is different and handles various problems and it is really important to be aware of pros and cons of every alternative.
Because of this great number of choices, to narrow users’ options there is a website http://queues.io/ that collects and compares queues libraries, provides short descriptions of advantages and weaknesses and as a result helps to select the right tool.
I advise to make the choice between one of these: RabbitMQ (http://www.rabbitmq.com), Gearman (http://gearman.org/), ActiveMQ (http://activemq.apache.org/), Apollo (http://activemq.apache.org/apollo/), laravel (http://laravel.com), Apache Kafka (http://kafka.apache.org/) or HornetMQ (http://www.jboss.org/hornetq).
All of above mentioned brokers have their unique strengths, but personally, for facing lengthy calculations or when a page should be received by an end-user immediately – for instance while sending emails, saving frequent and sensitive data to database, recalculating statistics or logging events I strongly recommend RabbitMQ. It is wicked fast and reliable, comes with high security and transactions, persistence and flow control – all in one tool! RabbitMQ brokers run on many platforms and offer several clients (Java, .NET among others), all which use AMQP protocol to speak to each other. Anyone can write a RabbitMQ client in any language: Ruby, Java, Spring, Python, C, C#, Perl, Erlang, Lisp, Haskell. Furthermore, RabbitMQ is capable of working with three most popular protocols and this feature makes it quite elastic, which results in fact that it is possible to connect it with various ready-made applications. Additionally, in contrast to remaining brokers RabbitMQ is based on Erlang. This makes it less prone to programmer’s mistakes, gives it concurrency support and embedded database. Moreover, you will be able to update RabbitMQ without restarting.
Protocols
It is quite obvious that there are no perfect solutions for every problem and your decision can be based on performance, ease of learning or other indicators. Besides of implementation, it is really worth going deeper into the details of each protocol and get to know it better.
We should take into consideration three most popular protocols: AMQP, MQTT and STOMP. Mostly, those protocols were invented by commercial companies to handle specific problems. But after successful projects the decision was made to share them as open-source systems. Due to diverse histories and environments they are designed to work with most frequent problems in different domains, so it is up to you which solution will be applied to your project. I prefer to work with AMQP, because it is the most flexible and advanced protocol delivering data to hundreds of clients in real time across the global network, as well prepared to work with web applications.
AMQP
AMQP, which stands for Advanced Message Queuing Protocol, has been developed as an open standard for passing messages between applications or organisations. The most important reasons to use this protocol are reliability and interoperability – it enables the development team to work with different languages, platforms, time and space. As an advanced system it provides a wide range of features related to messaging, queuing, flexible routing, transactions, security and also a fine-grained control is possible.
Referring to the official website http://amqp.org – a standard has no use without any products, and there is a wide range of excellent AMQP contributors: NASA, Google, Mozilla, AT&T etc.
MQTT
Message Queue Telemetry Transport protocol was designed as an extremely lightweight publish/subscribe messaging transport protocol. It endeavours to be as fast as possible and works with a small code footprint in limited network bandwidth. The home page (http://mqtt.org) informs it has been used in sensors communicating to a broker via satellite link. Nowadays, our cell phones have a greater computational power than satellites so we can define mobile devices as a target for this protocol.
MQTT’s strength is simplicity. It contains no sophisticated options nor configurations, no message properties and only five API methods and it perfectly fits to frequently updated data.
STOMP
Simple (or Streaming) Text Oriented Messaging Protocol in contrast to above mentioned protocols is the only one text-based, making it more similar to HTTP in terms of how it looks under the cover. It was designed to be lightweight and widely-interoperable and is so simple that the authors encourage users to write their own clients if they are not available in selected languages.
Yet most popular web technologies are already covered by at least one or two libraries prepared for STOMP. For PHP we have stomp-php or Zend_Queue. You can find the list on the official website (http://stomp.github.io/implementations.html).
Of course, while working on different projects an experienced developer can choose between protocols and their implementations and change his/hers choices depending on actual needs. But the question which I would like to answer in this article is which queuing tool and protocol are the best to start the adventure with?
The Internet is full of statistics, surveys, rankings and comparisons of available products, but it is worth to point out that most queuing systems are based only on one protocol and exclusively one cooperates with all three – RabbitMQ.
RabbitMQ in practice
In essence, RabbitMQ accepts messages from producers, and delivers them to consumers. In-between, it can route, buffer, and persist the messages according to rules the developer introduces.
When working with RabbitMQ the programmer shall become accustomed to its special jargon. RabbitMQ’s authors have created an unique dictionary:
- ‘producer’ is a program which sends messages to other applications;
- ‘producing’ is similar to sending; it is an action called as the producer prepares a message for external services and then queues it for the consumers;
- ‘consumer’ it is nothing more but a program that receives a message;
- ‘consuming’ means receiving message by a consumer;
- ‘a queue’ is a name of internal RabbitMQ’s storage from where all messages are distributed to the consumer.
To clarify: a producer is producing messages and sending them to queues only to be later consumed by the consumers.
Real life experience
I faced the problem of queuing in one of the projects I have been recently working on and therefore I would like to use it as an example of RabbitMQ’s great usefulness.
Imagine developing an advertising system which shares content with customer’s services and produces over a million requests per hour. To guarantee continuous work of the system, statistics should be pushed to database and calculated immediately.
As we admitted in the beginning, database is not the best service to handle frequent writes. This drawback combined with multiple reads and intensive traffic may lead to a serious problem, because when new data is pushed into the database, during every refreshing the end user will have to wait longer and longer for content to be displayed.
But is it really necessary to write an information about every new action during the request and tell the user to wait for proper content? Definitely not! I suggest using AMQP protocol to queue most time consuming procedures.
To focus only on important issues for the example I chose Symfony 2.3.7 as an engine, without any external bundles (modules) connecting php-amqplib with default repositories, so it is possible to use it in every other PHP framework or just catch the idea and implement in other technology.
Server installation
RabbitMQ has been a part of default repository for many of the Linux distributions for a long time – for instance in Ubuntu since 9.04 and Debian since 6.0. For Windows only manual installation is available. Example in this text will be based on the Ubuntu environment with apt repository manager, because it is the most popular for servers deployment (as well as CentOS).
To install the library we need to open the console and call the command:
1 | sudo apt-get install rabbitmq-server |
After successfully proceeding through the installation a message should appear on command line:
1 | * Starting message broker rabbitmq-server |
Also, RabbitMQ’s specification informs users that system’s limit on open file handlers should be set to at least 1200. To adjust that the option ulimit in /etc/default/rabbitmq-server should be edited and uncommented.
Hello world in AMQP
To begin with implementation of AMQP communication in RabbitMQ we can use a library recommended by RabbitMQ’s authors: php-amqplib (https://github.com/videlalvaro/php-amqplib). It is not designed exclusively for RabbitMQ but it is dedicated for AMQP protocol in general, so the package is also a good tool for other implementations.
To install php-amqplib in an application the user is advised to use the composer – it is one of the most popular package manager system designed to handle relations between external modules (http://getcomposer.org).
Installation is really simple and with composer we are able to continue the process using following console command:
1 | composer.phar require videlalvaro/php-amqplib v2.2.2 |
After downloading repositories and creating autoloader file system creates a new directory with libraries and php-amqplib will be accessible for other parts of the system.
Content view action
At this stage, the system is expected to return a response with previously prepared advertisement (e.g. banner, image, flash content or simple text) and write an information in database about what, when and to whom it was delivered.
Let’s start with a piece of code which delivers an image banner for user and saves information about it in database (Listing 1).
Listing 1. Main controller
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | class AdvertisementController extends Controller { /** * @Route("/get", name="ad") */ public function getAction(Request $request) { $criteria = $this->get('advertisement.criteria')->resolveRequest($request); $advertisement = $this->get('advertisement.manager')->getByCriteria($criteria); $requestInformation = $this->get('advertisement.request.manager')->getGeneralInformation($request); //There the problem with performance may occur $this->getDoctrine()->getRepository('XsolveAdvertisementBundle:Advertisement')->saveStatistics($requestInformation, $advertisement); return new Response( $advertisement->getImage(), 200); } } |
Example above will cooperate with MySQL database only up to 300 request per minute and about 1200 with PostgreSQL. It is not a lot and above this limit tables will fall into a deadlock and crash the system. Most often, to prevent this kind of situation programmers create temporary tables using different engines (MyISAM or MEMMORY) and move rows in bulk to the destination table. In effect, the limit will only be slightly increased as well as this solution would not be scalable without changing the server’s structure and architecture, and after a few months the team will face the same problem again, but the solution will be harder to find.
Producer
But what can we do if we want to scale horizontally our application only by adding a new machine into the environment? We will need external system to distribute tasks and messages – RabbitMQ. Let’s try to connect it and treat our getAction as a Producer and move saveStatistics to another controller as a Consumer (Listings 2-5).
Listing 2. Refactored main controller
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | class AdvertisementController extends Controller { /** * @Route("/get", name="ad") */ public function getAction(Request $request) { $criteria = $this->get('advertisement.criteria')->resolveRequest($request); $advertisement = $this->get('advertisement.manager')->getByCriteria($criteria); $requestInformation = $this->get('advertisement.request.manager')->getGeneralInformation($request); $this->get('advertisement.producer')->produce($requestInformation, $advertisement); return new Response($advertisement->getImage(), 200); } } |
Listing 3. Advertisement.producer service
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | use PhpAmqpLib\Connection\AMQPConnection; use PhpAmqpLib\Message\AMQPMessage; class StatisticsProducer implements Producer { /** * @var AMQPConnection */ protected $AMQPConnection; /** * @var StatisticMessageResolver */ protected $messageResolver; public function __construct(array $configuration, StatisticMessageResolver $messageResolver) { $this->AMQPConnection = $this->createConnection($configuration); $this->messageResolver = $messageResolver; } public function produce(RequestInformation $requestInformation, Advertisement $advertisement) { $AMQPMessage = $this->messageResolver->createMessage($requestInformation, $advertisement); $this->send($AMQPMessage); } protected function send(AMQPMessage $AMQPMessage) { $channel = $this->getChannel(); $channel->queue_declare('ad_statistics', false, false, false, false); $channel->basic_publish($AMQPMessage); } protected function getChannel() { return $this->AMQPConnection->channel(); } protected function createConnection(array $configuration) { return new AMQPConnection( $configuration['host'], $configuration['host'], $configuration['login'], $configuration['password']); } } |
Listing 4. AMQP message resolver
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | use PhpAmqpLib\Message\AMQPMessage; class StatisticMessageResolver implements MessageResolver { public function createMessage(RequestInformation $requestInformation, Advertisement $advertisement) { $adStatistic = new AdvertisementStatistic($advertisement); $adStatistic->addRequest($requestInformation); return new AMQPMessage(serialize($adStatistic)); } } |
In listings from 2 to 4 we created an easy example of message producer by changing only one line of code in controller. Going through the code step by step we create both the connection to RabbitMQ server and default channel where we are going to put our prepared messages (AMQPMessage object). Listing 5 presents the most important steps in one view.
Listing 5. General steps
1 2 3 4 5 6 7 8 9 10 11 12 13 | $adStatistic = new AdvertisementStatistic($advertisement); $adStatistic->addRequest($requestInformation); $AMQPMessage = new AMQPMessage(serialize($adStatistic)); $AMQPConnection = new AMQPConnection( $configuration['host'], $configuration['host'], $configuration['login'], $configuration['password']); $channel = $this->AMQPConnection->channel(); $channel->queue_declare('ad_statistics', false, false, false, false); $channel->basic_publish($AMQPMessage); |
As you can see, the AMQPMessage class passes variable ‘body’, which is a string, and whenever any kind of object has to be transferred as a ‘body’ it should be fully serializable.
Consumer
The consumer is even simpler than the producer. In our case we can declare consumer inside the same application or create a new one with installed common statistics bundle (module) where we have defined all our classes. Of course, we could send it to Ruby or Python lightweight script as an array if we wanted.
Listing 6. Consumer class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | use PhpAmqpLib\Connection\AMQPConnection; use PhpAmqpLib\Message\AMQPMessage; class StatisticsConsumer implements Consumer { /** (...) **/ /** * @var Repository */ protected $statisticsRepository; public function __construct(array $configuration, StatisticMessageResolver $messageResolver, Repository $statisticReporitory) { $this->AMQPConnection = $this->createConnection($configuration); $this->messageResolver = $messageResolver; $this->statisticsRepository = $statisticsRepository; } public function consume() { $channel = $this->getChannel(); $channel->queue_declare('ad_statistics', false, false, false, false); $callback = array($this, 'consumeMessage'); $channel->basic_consume('', '', false, true, false, false, $callback); while(count($channel->callbacks)) { $channel->wait(); } } public function consumeMessage($message) { $advertisement = unserialize($message); $this->statisticsRepository->save($advertisement); } } |
In listing 6 the service is only listening the server and waits for response. Every new request is provided to callback function and then saved to the database. Please note that a queue is declared here as well. Whenever we receive a message, it will be passed to the $callback function. Because the receiver may be started before the sender, a programmer has to make sure the queue exists before the consumer tries to get messages from it. The code will block while receives callbacks.
Conclusion
I would like you to notice that we can copy Consumer application and create totally independent new instances which may improve the performance – in our case we achieved an average increase of 1200% on content delivery time and the server did not crash with the same traffic. That should be considered as a great success.
To conclude, RabbitMQ is a powerful tool, good for message broker beginners and I strongly advise you to try how it works. Thanks to RabbitMQ a quick implementation of scalable solutions is as simple as never before. Changes in the application’s bottlenecks can take up only several hours and at the same time increase the performance. And it is only for starters, because RabbitMQ is not even the fastest broker on the market.
This article was published in SD Journal magazine