Benchmarking RPC Protocols
The Infobip platform isn’t one large monolith. In a typical example of microservice architecture it consists of a multitude of small services which represent different parts of the platform. Each service has a single, well-defined purpose and is only concerned with how to do its job in the best possible way.
Having a bunch of small services makes it easier to manage the entire platform. Any single service can be updated at any time without disrupting others. Services can be (and are) written in different programming languages that best fit the problem that they are trying to solve. Parts of platform that handle large amount of traffic can be scaled independently of those that are under the light load. Those are all advantages of a microservice architecture, which make developing the Infobip platform a great experience.
But no service lives in a vacuum. Being responsible for only one small part means that services that work together need some way to communicate. For example, launching an SMS campaign in our Infobip Portal will be handled by the service in charge of the campaigns. If a campaign includes customer’s contacts or groups, they will be retrieved from a different service that is managing the contacts. And there is yet another service for scheduling messages and others to route, bill and deliver messages to mobile subscribers.
Our services communicate with each other via different Remote Procedure Call (RPC) protocols. There were bunch of them that were tried during the years, but only three are still in use at this moment.
- Good old HTTP/1.1. There are multiple client and server implementations in pretty much any programming language which makes it a good fit for our polyglot platform. We usually send plain JSON over this protocol which makes it easier to debug services since we only need a web browser or a REST client
- Our second protocol is again HTTP, but this time sending Java serialized objects back and forth. This was the default when two services written in Java programming language were communicating since they can natively read and write this kind of data
- MML, short for Machine-to-Machine-Language, is our in-house RPC protocol designed for large throughput. Is has similarities to HTTP/2, which didn’t yet exist when we started working on MML. It uses one persistent TCP connection between the client and server and allows clients to load balance between servers and perform failover when necessary. MML uses typed JSON as payload – it looks like regular JSON but each object is tagged with Java type. Because of that, MML is only used between Java services
Having three different ways to communicate makes it a bit difficult to pick the right one. Each protocol has its own advantages and disadvantages when it comes to performance, debuggability and interoperability with non-Java services. MML is assumed to be best performing but is limited to communication between two Java services. When other programming languages get into the mix they are left with only one option, which is plain JSON over HTTP.
Having three solutions to the same problem is far from ideal. Any effort on improving inter-service communication must be done three times for each protocol and each has different quirks which makes it challenging to implement same features for each of them. Ideally, we would like to have only one protocol – one that is both performant, easy to debug, and can be used from different programming languages. To have our cake and eat it, too.
We needed some way to score benchmarks against each other to see which one is “the best” and in which situation. We picked performance as our first priority. We can always build tooling for debugging protocols and write our own implementation if there isn’t one for a specific language. But if a protocol is fundamentally slow performing, then there is no helping it.
We also needed a benchmark. And it is easy to write one that would make a bunch of calls in a loop and measure throughput. And it is even easier to make mistakes with such benchmarks, so it’s not measuring that we’re interested in, or its results, which can be skewed, making them useless.
WRITING A (GOOD) BENCHMARK
Two of our three protocols are limited to the Java programming language, so it makes sense to write the whole benchmark in Java. But writing benchmarks that run on Java Virtual Machine (JVM) without messing up is hard. There are so many details one must keep an eye on:
- It takes some time for the JVM to profile and optimize generated code. There is no point measuring before JVM finished with optimizations.
- JVM will profile and optimize code for the actual usage pattern. If we benchmark two protocols one after another, second one might get worse results because JVM has already specialized code for the first one. To prevent this, each benchmark should be run in a new clean process
- In case of multithreaded benchmarks, special care must be taken to ensure all threads start and stop measuring at the same time
- And that’s not even half of the potential problems. Luckily, there are tools that can solve them for us. One of them is Java Microbenchmark Harness (JMH). Written by the JVM developers, we can safely assume that they know what they are doing
Writing benchmarks with JMH is easy. We only need to write code that we want to measure and JMH will take care of everything else – running code in loop to measure throughput, spinning separate processes for each benchmark, giving JVM enough time to “warm up” and properly optimize code and all other tedious (but important) details.
The skeleton of our benchmark looks something like this:
JHM is quite flexible and can be adapted to various scenarios. Options that were most useful for our benchmark were parameters and the number of threads. Parameters allow us to reuse the same benchmark for different scenarios. There is no difference in benchmark logic when measuring different protocols or using payloads of different size, so with parameters we can avoid copy-pasting same code around. And by varying number of threads that are simultaneously running we can measure how protocols cope with an increasing number of concurrent clients.
Time to run out little benchmark! We actually did two separate runs in different conditions.
During the first run we had both server and clients running on the same machine, communicating over localhost interface. We used this scenario to measure pure protocol overhead when there are no external factors to interfere with the measurements.
In the second run we put our protocols in more realistic conditions. We had server running on a separate machine from the clients’, communicating over a real network. And in the case of HTTP protocols, all traffic was routed through HAProxy since that’s how we use it in production environment. Since MML supports client-side load balancing, a client can connect directly to server, without going through any intermediary.
We had the client side of the benchmark spin up a certain number of clients and each client would repeatedly make a remote call with the payload of certain size and then wait for the server to echo the same payload back. The payload was a simple list of objects with size being the number of objects in list.
Number of clients and payload size was parameterized to cover communication patterns that are typical for our services. We measured with 1, 32, 64 and 128 concurrent clients and with payload size varying between 0 (empty list), 10 and 100 elements.
After complete run JHM will output tabular data with measurements for each combination of given parameters. We visualized those results as simple bar charts for easier comparison.
First, when measuring communication over localhost we get something like this:
For small and medium sized payloads the MML protocol is the obvious winner regardless of the number of concurrent clients. But that changed when we switched to a large payload size, when HTTP based protocols became almost equal to MML in terms of throughput. Another interesting observation is that using Java serialized objects is slower than using JSON. It looks like Jackson, which is JSON library we’re using, is better optimized for our use case.
For second run done over a real network, results look like this:
MML still leads for small and medium sized payloads, but with a smaller margin. And for the large payloads, the tables had turned. Transferring plain JSON over HTTP actually outperforms MML by a significant margin. Typed JSON is probably the root cause for this outcome – all those type tags that it adds noticeably increase the number of bytes that must be sent over the wire, causing throughput to suffer.
So, what have we learned from this experiment? Our in-house protocol, while great on small to medium payloads, is no longer such a good choice when services are sending large requests and responses. On the other hand, using JSON over HTTP has some decent performance and can be used by services written in any language which is definitely a plus. Lastly, using Java serialization over HTTP was inferior in all tested scenarios, so this is one protocol that we will definitely retire.
In the end, we still don’t have a clear winner. We’d like to investigate optimizing JSON over HTTP – or maybe prototype a HTTP/2 based solution and see how it fares. There is still much work to do. Stay tuned!
By Hrvoje Ban, Software Engineer