A while back, one of my colleagues was hanging out in an online developer forum and some people there were starting up a good old-fashioned language war (the type of exchange where one person familiar with language A will announce its superiority over language B, with which the other person isn’t familiar – not really a productive use of time when you think about it, but a popular pastime nonetheless).
During this debate, one developer confidently proclaims that ‘surely nobody would ever write a web service in C++,’ to which my colleague responds, ‘well that’s exactly what we did here at Keepit.’ This prompted some questions, and this piece is an attempt to explain why we did what we did and to explain how this choice has been working out for us, given that this code base started life about 10 years ago.
To put things into perspective, it will be necessary to start with some minimal background information about this service we set out to build.
What is Keepit?
Keepit is a backup service in the cloud. We will store backup copies of your (cloud) data so that if—or when—your primary data is compromised for one reason or another (but most likely because of ransomware or account takeover via phishing), then you will still have complete daily copies of your data going back as many years as you want.
Years of data. This should make you think.
Several years ago, Microsoft claimed to have 350 million seats on their 365 platform, which is one of the cloud platforms that we protect. Let’s say we get just 10% of that market (we should get much more because we are by far the best solution out there, but let’s be conservative for the sake of argument); that means we need to store all data for 35 million people (and that’s just on one of these platforms – we protect several other platforms as well).
It doesn’t end there: being backup, we copy all your changes, and we hold your old data, and that means when you clean up your primary storage and delete old documents, we keep the copy. Many customers want a year or three of retention, but we have customers who pay for 100 years of retention.
One hundred years. That means our great grandchildren will be fixing the bugs we put in our code today. This should make you think too.
Knowing the very high-level goals of our service, let’s talk about requirements for such a thing.
Core system: storage
We knew from the get-go that we would be implementing a storage solution which would need to store a very large amount of data (as everything is moving to the cloud, let’s say a few percent of all the world’s data) for a very long period of time (say a hundred years).
Now everyone in the storage business will talk about SSDs, NVMe, and other high-performance data storage technologies. None of this is relevant for large scale, affordable storage, however. Spinning disks is the name of the game and probably will be for at least another decade.
SSDs are getting the density, sure, but they are still not close to viable from a cost perspective. This means we will be writing all data to rotating magnetic media. When you write data to magnetic media, over the years, your media will demagnetize. That means, if we store a backup on a hard drive today, we probably can’t read it back just ten years from now.
That means we need to regularly move all this data from system to system to keep the data ‘fresh.’ Talking about performance, large capacity hard drives today rotate at 7200rpm, exactly the same speed as back in 1992. Access time is dominated by the rotational latency, which means that this is really an aspect of computers that has been almost at a standstill for 30 years while everything else has become faster in every way. We knew we had to deal with this.
I should probably note here that yes, we are talking about running our software on actual physical computers – no public cloud for us. If you want to go big, don’t do what the big players say you should do, do what the big players do. If public cloud was so great, Microsoft wouldn’t have built their own to run 365 – they would have run it on AWS which was very well established long before Microsoft thought about building 365. This doesn’t mean you can’t prototype on public cloud of course.
To solve our core storage need, we designed a filesystem—basically, an object storage system optimized for storing very large-scale backup data. Clearly, we expect the implementation of this storage system to have a significant lifespan.
We may want to create a better implementation one day in the future when hardware has evolved far beyond what we can imagine today, but it is worth pointing out that the storage systems we use today are very similar in architecture to what they looked like 30 years ago and, I would assume, to what they will look like 30 years from today. Clearly, the core code that manages all of your data is not something you want to re-write every few weeks.
So, to implement this system, we went out looking for which new experimental languages had been invented in the six months leading up to implementation start. No wait, we didn’t.
What we need from a language
There are really two types of languages:
1: Systems programming languages – those that have practically no runtime, where you can look at the code and have a high degree of confidence in understanding exactly what that leads to on your processor, the type of language you would write an operating system kernel in. This would be languages like C, C++, and who knows – maybe Rust or something else.
2: The higher-level languages, which often have significant runtimes. The good ones of these offer benefits that you cannot get in a language without a significant runtime. This would be a language like Common Lisp, but people more commonly talk about C# and Java even though I will argue they only do so because nobody taught them Lisp.
So, to be a little more concrete, what we really need from a language is:
2: When we write code today, the tool chain in 20 years’ time must still support our code with little or no changes. Because we simply can’t rewrite our code every few years. We will get nowhere if that’s what we do. That’s why large, important software systems today are still often written in C – because they started out life in the 80s or 90s and they are still the most significant operating system kernels or database management systems that exist to this day.
Steady evolution is the recipe, not rewrite from scratch every three years. This basically rules out any language that hasn’t been standardized and widely used for at least 10 years before we start the project. Meaning, since we started in 2012, that rules out any language that came out after 2002, so Go, Rust, and many other languages would have been out of the picture. C and C++ would work though.
3: We run on Linux. If you do anything significant with computers on a network, you probably run on Linux, too. We don’t want a language that is ‘ported’ to Linux as a curiosity – like C#. We need a language that is native on Linux with a significant and mature toolchain that is certain to receive significant investment for decades to come. Again, that’s really C and C++.
4: You need to design for failure. Everything from writing to a disk, to allocating the smallest piece of memory, can and will eventually fail. Relying on the developer to check error codes or return values at every single call to a nontrivial function (and too many trivial functions too) is rough. Yes, it can be done and there are impressive examples of this.
I am humbled by software such as the Postgres database or the Linux kernel: very reliable pieces of software written in C, a language that requires such tedious checking. C++, in my experience, with RAII and exceptions, offers a much safer alternative. It is not free, of course – it avoids one set of problems and introduces another. In my experience however, it is less difficult to write reliable software using RAII and exceptions than to rely on developers never missing a single potential error return or the correct recovery and cleanup that goes with it. For this reason, I will prefer C++ over C and over both Rust and Go.
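To make the RAII argument concrete, here is a minimal sketch. The names `Resource`, `risky_step`, `failure_is_cleaned_up`, and the `live_resources` counter are hypothetical illustrations, not code from our code base. The point it tries to show is that cleanup is tied to scope exit, so an exception thrown half-way through an operation still releases what was acquired, without a single error-code check at the call site:

```cpp
#include <stdexcept>

// Hypothetical counter standing in for a real resource (file, lock, buffer).
static int live_resources = 0;

// RAII wrapper: acquire in the constructor, release in the destructor.
class Resource {
public:
    Resource() { ++live_resources; }
    ~Resource() { --live_resources; }
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
};

// Hypothetical operation that may fail part-way through.
void risky_step(bool fail) {
    Resource r;  // acquired here
    if (fail) {
        // Stack unwinding runs ~Resource() before the exception leaves.
        throw std::runtime_error("disk write failed");
    }
    // On the normal path, r is released when the function returns.
}

// Returns true when the failure was reported and nothing leaked.
bool failure_is_cleaned_up() {
    try {
        risky_step(true);
    } catch (const std::runtime_error&) {
        return live_resources == 0;
    }
    return false;  // no exception was thrown, which is not expected here
}
```

Calling `failure_is_cleaned_up()` exercises the failure path. The equivalent C code would need an explicit release on every early return, which is exactly the kind of thing that gets missed.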
5: Obviously the language must offer sufficiently powerful functionality to make the implementation of a larger application bearable and maybe even enjoyable. In reality, however, if your language has functions, you can accomplish a lot; Fortran got functions in 1958 and since then most languages have had them.
Yes, generic programming is nice in C++. A real programmable language like Common Lisp would be preferable of course. Any other modern programming language will surely have some other feature which was added because it is potentially nice and potentially justifies the existence of the language.
But in reality, the hard part is getting your data structures right. Getting your algorithms right. Knowing what you’re trying to build and then building exactly that, nothing more and nothing less.
If we are honest, most languages would work. However, C++ is a nice compromise: it has some generic programming, the STL is incredibly useful, it offers basic OO concepts, and RAII (and structured error handling).
If we look at the criteria here, there really aren’t that many candidate languages to choose from, even if we compromise a bit here and there. Therefore, the question really isn’t ‘why’ we would write a web service in C++, the question really is ‘why wouldn’t we’ write a web service in C++. Realistically, what else would you use, given the scope of what we’re solving here?
Performance matters. Don’t let anyone tell you otherwise. Anyone who says that ‘memory is cheap’ and uses that as an excuse should not be building your large-scale storage systems (or application servers or anything else that does interesting work on large amounts of data).
Donald Knuth said, ‘Premature optimization is the root of all evil’ and I absolutely believe that. However, ‘no optimization and elastic scaling is the root of all public cloud revenue’ is probably also true. Don’t go to extremes – don’t put yourself in a situation where you cannot, at the appropriate time, optimize your solution to be frugal with its resource use. When your solution is ‘elastically scaling’ for you in some public cloud on a credit card subscription, it is very hard to go back and fix your unit economics. Chances are you never will.
The typical computer configuration for a storage server in Keepit is 168 hard drives of 18 TB each attached to a single-socket 32-core 3.4GHz 64-bit processor and 1TiB of RAM. It’s really important to note here that we use only one TiB of RAM for three PiB of raw disk: this is a 3000:1 ratio. It is not uncommon to see general purpose storage systems recommend a 30:1 ratio of disk to RAM (which would require us to run with 100TiB of RAM, at which point memory most certainly isn’t cheap anymore). Through the magic of our storage software, this gives us about 2PiB of customer-usable storage in only 11U of rack space. This means we can provide a total of 8PiB of usable storage in a single 44U rack of systems, consuming less than 10kW of power. This matters.
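The arithmetic behind these figures can be written out as a small sketch. All inputs are the numbers quoted above; the constant names are made up for illustration, and the raw capacity is rounded to 3 PiB as in the text (168 drives of 18 TB is 3024 decimal TB):

```cpp
#include <cstdint>

// Inputs quoted in the text; the constant names are hypothetical.
constexpr std::uint64_t kDrives = 168;
constexpr std::uint64_t kTbPerDrive = 18;
constexpr std::uint64_t kRawTb = kDrives * kTbPerDrive;  // 3024 TB of raw disk

// The text rounds raw capacity to 3 PiB; expressed in TiB for the ratio.
constexpr std::uint64_t kRawTibRounded = 3 * 1024;
constexpr std::uint64_t kRamTib = 1;
constexpr std::uint64_t kDiskToRam = kRawTibRounded / kRamTib;  // 3072, roughly 3000:1

// RAM that a 30:1 disk-to-RAM recommendation would call for, in TiB.
constexpr std::uint64_t kRamTibAt30To1 = kRawTibRounded / 30;  // 102, roughly 100 TiB

// Rack-level capacity: 11U systems in a 44U rack, 2 PiB usable each.
constexpr std::uint64_t kSystemsPerRack = 44 / 11;
constexpr std::uint64_t kUsablePibPerRack = kSystemsPerRack * 2;

static_assert(kRawTb == 3024, "168 drives x 18 TB");
static_assert(kDiskToRam == 3072, "about 3000:1 disk to RAM");
static_assert(kRamTibAt30To1 == 102, "about 100 TiB of RAM at 30:1");
static_assert(kUsablePibPerRack == 8, "four systems per rack");
```

The `static_assert` lines make the compiler check each step, so the sketch fails to build if any of the figures stop adding up.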
If you run a business, you want to be able to make a profit. Your customers will want you to make a profit, especially if they bet on you having their data 100 years from now. If you want to grow your business with investments, your investors will think this matters. In Keepit, we have amazing unit economics – we got the largest Series A round of investment for an IT company in the history of Denmark – and part of the reason for that was our unit economics. Basically, our storage technology, not least the implementation of it, enabled this.
The choice of C++ has allowed us to reliably implement a CPU- and memory-efficient storage system that uses the available hardware resources to their fullest extent. This ranges from careful layout of data structures in memory to an efficient HTTP stack that exposes the functionality and moves more than a GiB of data per second per server over a friendly RESTful HTTP API on the network. C++ enables and supports every layer of this software, and that is quite a feat.
Let me briefly digress with another note on versatility. I have this personal hobby project where I am developing a lab power supply for my basement lab (because every basement needs a lab). In order to adjust current and voltage limits, I want to use rotary encoders rather than potentiometers.
A rotary encoder is basically an axle that activates two small switches in a specific sequence and by looking at the sequence you can detect if the user is turning the axle in one direction or the other. The encoder signal gets fed to a 1MHz 8-bit processor with 1 kB of RAM and 8 kB of flash for my code.
To implement the code that detects the turning of these encoders, it makes sense to use a textbook, object-oriented approach. Create a class for an encoder. Define a couple of methods for reading the switches and for reading out the final turn data. Declare a bit of local state. Beautifully encapsulated in pure OO style. The main logic can then instantiate the two encoders and call the methods on these objects. I am implementing the software for this project in C++ as well – try to think about that for a moment: The same language that allows us to efficiently and fully utilize a 32-core 3.4GHz 64-bit processor with 1TiB of RAM and 3PiB of raw disk works ‘just as well’ on a 1-core 1MHz 8-bit processor with 1 kB of RAM and 8 kB of flash storage – and the code looks basically the same.
There are not many languages that can stretch this wide and not show the slightest sign of being close to their limits. This is truly something to behold.
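For the curious, an encoder class along the lines described above could look roughly like this. It is a sketch and not the code from my power supply: the class name, the method names, and the lookup-table approach are assumptions made for illustration, reading the actual pins is left to the caller, and which turning direction counts as positive depends on how the switches are wired:

```cpp
#include <cstdint>

// Hypothetical quadrature decoder for a two-switch rotary encoder.
class RotaryEncoder {
public:
    // Feed the current levels of the two switches, A and B.
    void update(bool a, bool b) {
        const std::uint8_t curr =
            static_cast<std::uint8_t>((a ? 2 : 0) | (b ? 1 : 0));
        // Index: previous state in the upper two bits, current state in the
        // lower two. Valid single steps give +1 or -1; no change and
        // impossible double transitions give 0.
        static constexpr std::int8_t kStep[16] = {
             0, +1, -1,  0,
            -1,  0,  0, +1,
            +1,  0,  0, -1,
             0, -1, +1,  0};
        steps_ = static_cast<std::int16_t>(steps_ + kStep[(prev_ << 2) | curr]);
        prev_ = curr;
    }

    // Return the steps accumulated since the last call and reset the count.
    std::int16_t take_steps() {
        const std::int16_t result = steps_;
        steps_ = 0;
        return result;
    }

private:
    std::uint8_t prev_ = 0;   // last seen switch state, bits A and B
    std::int16_t steps_ = 0;  // signed step count since the last read
};
```

The main loop would create one object per encoder, call `update` with the sampled switch levels, and read `take_steps` whenever it wants to adjust a limit. The whole object needs three bytes of state, which fits comfortably in 1 kB of RAM.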
The rest of the stack
The storage service exposes a simple RESTful API over HTTP using an HTTP stack we implemented from scratch in C++. Instantiating a web server in C++ is a single line of code – processing requests is as trivial as one could wish for.
I’ve heard plenty of arguments that doing HTTP or XML or other ‘web’ technology work would be simpler in Java or C# or other newer languages, but really, if you write your code well, why would this be difficult? Why would you spend more than a line of code to instantiate a web server? Why would parsing an XML document be difficult?
For XML, we implemented a validating parser using C++ metaprogramming; I have to be honest and say this was not fun all the way through and I couldn’t sit down and write another today without reading up on this significantly first. C++ metaprogramming is nothing like a proper macro system – but it can absolutely solve a lot of problems, including giving us an RNC-like schema syntax for declaring a validating XML parser and generating efficient code for exactly that parser.
This also means when we parse an XML document and we declare that one of the elements is an integer, then either it parses an integer successfully or it throws. If we declare a string, we get the string properly decoded so that we always work on the native data – we cannot ever forget to validate a value and we cannot ever forget to escape or un-escape data. By creating a proper XML parser using the language well, we have not only made our life simpler, we have also made it safer.
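The ‘parse or throw’ behaviour can be illustrated with a small sketch. This is not our metaprogrammed parser; `parse_int_or_throw` and `decode_text` are hypothetical helpers, written assuming C++17, that show the contract for a single element value: the caller either gets native, fully decoded data or an exception, never a half-validated string.

```cpp
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: the whole text must be a valid int, otherwise throw.
int parse_int_or_throw(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const std::from_chars_result res = std::from_chars(first, last, value);
    // Reject conversion errors, overflow, empty input and trailing garbage.
    if (res.ec != std::errc{} || res.ptr != last) {
        throw std::invalid_argument("element is not a valid integer");
    }
    return value;
}

// Hypothetical helper: decode the five predefined XML entities.
// Numeric character references are not handled in this sketch and throw.
std::string decode_text(std::string_view text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size();) {
        if (text[i] != '&') {
            out += text[i++];
            continue;
        }
        const std::size_t end = text.find(';', i);
        if (end == std::string_view::npos) {
            throw std::invalid_argument("unterminated entity");
        }
        const std::string_view name = text.substr(i + 1, end - i - 1);
        if (name == "amp") out += '&';
        else if (name == "lt") out += '<';
        else if (name == "gt") out += '>';
        else if (name == "quot") out += '"';
        else if (name == "apos") out += '\'';
        else throw std::invalid_argument("unsupported entity");
        i = end + 1;
    }
    return out;
}
```

Because the conversion happens in one place and failure is an exception, code further down never sees raw, still-escaped text.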
The entire software ecosystem at Keepit may revolve around our storage systems, but we have several other supporting systems that use our shared components for the HTTP and XML stack.
One other notable C++ system is our search engine. Like so many other companies, we found ourselves needing a search engine to help us give end users an amazing experience when browsing their datasets. And like so many others we fired up a cluster of Elasticsearch servers and went to work.
Very quickly we got hit by this basic fact that Elastic is great at queries and not very good at updates – and we have many more updates than we have queries. We simply couldn’t get this to scale like we’re used to. What to do?
While struggling with Elastic, we started the ‘Plan-B’ project to create a simple search engine from scratch – this engine has been our only search engine for years now and to this day, the process is still called ‘bsearch.’
Our search engine offers Google-like matching so that you can find your documents even if you misspell your search terms, and it is a piece of technology that we are quite actively developing both to improve matching capabilities across languages and to allow for more efficient processing of much larger datasets, which will open up other uses in the future.
Of our backend code base, about 81% of our code is C++. Another 16% is Common Lisp. The remaining 3% is Java.
We use Common Lisp in two major areas: For ‘general purpose’ business functions such as messaging, resource accounting, billing, statistical data processing, etc. And we use it for backup dataset processing. These are two very different uses.
The first is a more classical application of the language where performance is maybe less of a concern but where the unparalleled power of the language allows for beautiful implementations of otherwise tedious programs.
The second use is a less traditional use case where enormous datasets are processed and where the majority of the memory is actually allocated and managed outside of the garbage collector – it is truly a high-performance Lisp system where we benefit from the power of the language to do interesting and efficient extractions of certain key data from the hundreds of petabytes of customer data that pass through our systems.
Many people don’t know Common Lisp and may propose that ‘Surely nobody would write a web service in Common Lisp.’ Well, as with all other languages you need to understand the language to offer useful criticism; and the really groundbreaking feature of Common Lisp is its macro system. It is what makes Common Lisp by far the most powerful language in existence.
This is nothing like C pre-processor macros; the Common Lisp macro system allows you to use the full power of the language to generate code for the compiler. Effectively, this means the language is fully programmable. This is not something that is simple to understand since there is no meaningful way to do this using C-like language syntax, which is also why the Lisp dialects have a syntax that is fundamentally different from other languages.
In other words, if you do not understand the Lisp syntax, you are not equipped to comprehend what the macro system allows. This is not simple to wrap your head around, but, for example, I can mention that Common Lisp became the first ANSI-standardized object-oriented programming language, and its object system started out not as a change to the language and the compiler, but as a library that contained some macros. Imagine that.
Fortran allows you to implicitly declare the type of variables by using certain letters in the first character of the variable name – just for fun, I implemented that with a macro for Common Lisp. If I wanted to do that with C or C++ or any other language, I would need to extend the compiler.
The idea of using the first character in the name of the variable to implicitly declare its type is of course ridiculous, but there are many little syntactical shortcuts or constructs that can help you in daily life that you may wish were present in your language of choice and which you can only hope the language steering committee may one day add to the standard.
With Common Lisp, this is everyday stuff – no need to wait. If you want a new type of control structure or declaration mechanism, just go ahead and build it. The power of this cannot be overstated. C++ metaprogramming (and Go generics and everything else) pales in comparison, useful as it is.
First of all, it really sucks to have multiple languages; you can’t expect everyone to be an expert in all, so by having more than one language, you decimate the effective size of your team. However, we picked Common Lisp to replace a sprawling forest of little scripts done in more languages than I could shake a stick at—meaning we are fortunate to have only two languages on our backend.
C++ and Common Lisp are so different that they complement each other well. Yes, we could have done everything in C++, but there are problems we solve in Common Lisp which would have been much less enjoyable to solve in C++. Now on the downside, we have two HTTP stacks, two XML stacks, two database libraries, two connection pools, and so on and so forth. There is no simple perfect solution here; the compromise we have arrived at is indeed working out very well for us.
The best you can do is go to a conference, talk about your tech, and hope that some developers show up at your booth to talk about employment.
Old grumpy man’s advice for youngsters considering a career in software engineering
First of all, seriously consider a computer science education. There exist amazingly qualified people who do not have this and some of them work for us, but in my experience most really good developers have this. It certainly helps to get a foundation of mathematics, logic, and basic computer science. Knowing why things work will make learning new things infinitely simpler.
Learn multiple, properly different programming languages and write actual code in them. You need to experience (by failing) how functions are useful as abstractions and how terrible it is to work with ill-designed abstractions. You need to fail and spend serious time failing.
Make sure one of those languages is a compiled language with little or no runtime: C, C++, Rust, or even Fortran for that matter (not sure Fortran has much long-term perspective left in it though – it’s probably time to say goodbye). Now challenge yourself to write the most efficient implementation of some simple problem – maybe a matrix multiplication for example.
Disassemble the code and look at it. At least get some understanding of the processor instructions and why they are generated from the code you wrote. Learn how cache lines matter. Time your code and find out why your solution isn’t faster than it is. Then make it faster until you can prove to yourself that your instructions pipeline as much as they can, your cache misses are minimal, you don’t wait on register write delays and so on and so forth.
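As a starting point for that exercise, here is a minimal sketch of two matrix multiplications that do the same arithmetic in a different loop order. The function names and the flat row-major layout are assumptions made for illustration. In the first version the inner loop jumps a whole row ahead in `b` on every iteration; in the second it walks `b` and `c` contiguously, which tends to be kinder to cache lines. Time both on your own machine rather than taking my word for it:

```cpp
#include <cstddef>
#include <vector>

// Square n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Textbook i-j-k order: the inner loop strides through b column-wise.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// i-k-j order: the inner loop walks one row of b and one row of c in order.
Matrix multiply_ikj(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Both functions return the same product for the same inputs; only the memory access pattern differs, and that is the part worth looking at in the disassembly and the timings.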
Also, make sure that one of those languages is Common Lisp. It should be a criminal offence for a university to not teach Common Lisp in their computer science curriculum. Read ‘Structure and Interpretation of Computer Programs’ (SICP) too. Even if you will never use Lisp again, knowing it will make you a better developer in any other language.