
“When it comes to ransomware attacks, it’s a matter of when, not if.”

Ransomware attacks are on the rise — in the first half of 2021, the average amount paid by organizations to the perpetrators was $570,000, an increase of 171% over the previous year. (1)

Last year also saw a 93% increase in the overall number of ransomware attacks (2) – a trend that is only likely to continue. While such attacks were once limited to outlandish movie plots, they’ve become an all-too-real problem for organizations of all sizes. In fact, when it comes to ransomware attacks, it’s more likely to be a question of when, not if.

Our concern at Keepit is that the regularity of ransomware attacks may lead to them eventually being dismissed as just a cost of doing business. But by choosing to pay the ransoms demanded, companies are powering a vicious cycle where the proceeds fuel increased cybercrime. (And paying a ransom does not guarantee getting your data back, as documented in the report ‘The Long Road Ahead to Ransomware Preparedness’ from ESG.)

It’s vital for the sake of commerce – and for society – that companies, governments, and law enforcement agencies come together to find long-term solutions to ransomware attacks.

In the short term, we encourage companies to invest in a third-party backup and recovery service to minimize the threat posed by data-encrypting malware. The more secure your data is—and the quicker you’re able to recover it—the less worried you need to be about ransomware attacks.

At best, an attack won’t affect business continuity – it’ll just be a nuisance rather than a crisis. If you know your data is safe, you don’t have to pay the bad guys’ ransom. Problem solved.

Summing Up 

The disruptive power of ransomware attacks in 2022

An increasingly common threat, ransomware attacks are forecast to cost victims around $265 billion annually by 2031. (3) With conventional data recovery often taking weeks or even months, the disruption can be financially catastrophic for your business. But the damage goes beyond the bottom line. Additional impacts of ransomware attacks in 2022 are likely to include:

  • Intellectual property cost – temporary or permanent loss of sensitive or proprietary information can be enormously damaging.
  • Business continuity cost – disruption is frustrating and costly as companies struggle to restore data and operations.
  • Reputational cost – a ransomware attack can damage customer perception of the company and erode digital trust.

Why Keepit is the answer

Keepit backs up to an independent cloud, separate from your SaaS vendor’s environment, which means your data can be accessed completely independently of SaaS application availability. True backup—immutable and tamperproof, on a separate logical infrastructure—is your answer to ransomware attacks.

 

For more details about Keepit’s dedicated SaaS data protection, read about our security on our website.

References

  1. Research from Palo Alto suggests the average ransom in the first half of 2021 is $570,000 USD, an increase of 171% over the year prior; see Average Ransomware Payment Hits $570,000 in H1 2021 [Dark Reading] 
  2. Research from Check Point reports that ransomware incidents increased 93% year over year; see Ransomware attacks increase dramatically during 2021 [Computer Weekly] 
  3. Cybersecurity Ventures predicts global ransomware damage costs will reach $265 billion annually by 2031; see https://cybersecurityventures.com/global-ransomware-damage-costs-predicted-to-reach-250-billion-usd-by-2031/ [Cybersecurity Ventures]

About Version 2 Digital

Version 2 Digital is one of the most dynamic IT companies in Asia. The company distributes a wide range of IT products across various areas including cyber security, cloud, data protection, end points, infrastructures, system monitoring, storage, networking, business productivity and communication products.

Through an extensive network of channels, point of sales, resellers, and partnership companies, Version 2 offers quality products and services which are highly acclaimed in the market. Its customers cover a wide spectrum which include Global 1000 enterprises, regional listed companies, different vertical industries, public utilities, Government, a vast number of successful SMEs, and consumers in various Asian cities.

About Keepit
At Keepit, we believe in a digital future where all software is delivered as a service. Keepit’s mission is to protect data in the cloud. Keepit is a software company specializing in cloud-to-cloud data backup and recovery. Drawing on 20+ years of experience building best-in-class data protection and hosting services, Keepit is pioneering the way to secure and protect cloud data at scale.

Why You Need Backup for Google Workspace

The top 3 misconceptions made by Google Workspace admins

If you’re wondering, “Is my data truly protected if I rely only on Google Workspace’s default backup and recovery solution?” then you’re in the right place. Cloud applications like Google Workspace are an integral part of our daily lives – we push data to the cloud constantly. I do it when I send an email on Gmail, share a document with coworkers via Drive, or add my mother-in-law’s birthday to my Google Calendar (better not forget it again!).

But is relying on Google’s default data protection enough? What are the main misconceptions when it comes to Google Workspace data backup and recovery?

Misconception #1: Relying on Google Workspace’s default data protection is enough

If you think Google Workspace is a secure platform, you’re right: Google’s platform is a secure, resilient, and reliable solution, and protecting data is a top priority for Google.

While Google will likely never lose the data you store on its platform, it does not cover you if the data loss happens on your side. Google’s default data protection does not protect you against human error, malicious actions, ransomware and hackers, or synchronization errors. You are responsible for ensuring the necessary protection of your data.

Based on an Enterprise Strategy Group (ESG) survey, only 13% of the businesses surveyed understood that protecting their SaaS data is their responsibility, not the responsibility of the SaaS vendor.

According to ESG SaaS data protection research, 45% of organizations using SaaS attribute data losses they’ve experienced to deletion, whether accidental or malicious. When this happens with Google Workspace, Google cannot tell whether the deletion was intentional. The data is deleted and totally unrecoverable once it passes the Google Workspace trash bin’s retention period – a mere 30 days.

You need a solid backup and recovery solution for your Google Workspace.

Misconception #2: I don’t need a third-party backup and recovery solution, I have Google Vault

As a subscribed user to certain editions of Google Workspace, you have access to Google’s retention and eDiscovery tool: Google Vault. With Vault, you can retain, hold, search, and export some users’ Google Workspace data.

Yet, Google Vault is not a backup tool. To this frequently asked question, “Is Vault a data backup or archive tool?” Google itself answers, “No. Vault isn’t designed to be a backup or archive tool.”

Based on Google’s own support website, here are reasons why you shouldn’t use Google Vault for backups:

  • Vault exports aren’t designed for large-scale or high-volume data backups. You can export data for a limited number of accounts and only for one Google service at a time. Vault also doesn’t allow many parallel exports or scheduling automatic exports.
  • Vault exports are prepared for legal discovery purposes, not efficient data processing. Vault can’t create differential backups or deduplicate data. For example, a Drive export includes all items the searched account has access to. When many accounts have access to the same items, they’re exported for each account, resulting in lots of duplicated data.
  • Vault doesn’t support all Google services. Vault can export data only from supported Google services; it doesn’t support services such as Calendar.
  • Restoring data from Vault export files is hard. Vault doesn’t have any automated recovery tools.

Google Vault is not designed to recover lost or corrupted data, and it offers no automated restore capability, which is a critical feature of any third-party backup and recovery tool.

Additionally, Google Vault does not keep ex-users’ data. For example, if an employee departs from your company and, as the admin, you delete their Google Workspace user account, all of their data held in Vault will also be deleted. To save that data, you would need to transfer all of the employee’s data out of Vault before deleting the account.

Misconception #3: A third-party tool can only help with backup data

By now, you know that backing up your Google Workspace data is your responsibility, not Google’s. It’s a common misconception that third-party backup solutions are a cost center purely performing secure backup and allowing for data recovery. These are the fundamentals, but there’s much more to it:

Benefit #1 – Cost savings

Budget constraints are making it harder than ever for IT managers to implement new IT initiatives – they need to do more with less and maximize available resources.

Of course, deploying a backup and recovery solution for your SaaS applications comes with a cost, yet there are important (and substantial) cost-savings opportunities.

The first is through reduced SaaS licensing fees. Based on a recent Total Economic Impact (TEI) report by Forrester, companies save months of SaaS licensing fees for employees who leave the organization – around 10% of the workforce per year. This number can be much higher if organizations use a lot of temporary staff or contractors. Having all historical data available simplifies data management and employee onboarding and offboarding.

The second is reduced auditing and legal costs. In the same TEI report, one of the organizations surveyed shared that seven days of auditor and lawyer costs are avoided each year by having SaaS data availability.

Benefit #2 – Regulatory compliance

Organizations handling sensitive data are subject to stringent record-retention and data-reproduction requirements for public records. With a proper backup and recovery solution, you can expect:

  • Fast information discovery.
  • Easy retention policy management.
  • Additional rights to ensure compliance with applicable outsourcing regulatory requirements (e.g., extended audit rights, chain-sourcing approval rights).

In addition, the data center facilities used to store the data should have high physical security standards and certifications (ISO 27001, SOC 2, ISAE 3402, PCI DSS, HIPAA). It is important to ask your vendor what they offer regarding regulatory compliance and data center certifications when investigating which tool to deploy.

Benefit #3 – Real disaster recovery

Third-party backup and recovery solutions must (not should) allow you to perform disaster recovery. Here’s a shortlist of important points to look for when selecting your solution:

  • Data availability: Get access to all your data, at any time, from anywhere. A proper backup solution provides you with unlimited storage, is cloud-based so you can always access your data, and it should reside on its own cloud for enhanced security and control.
  • Hot storage of data: Get your data on demand.
  • Quick restore options: Restore fast, whether it’s a single email or an entire point-in-time backup for your organization.
  • On-the-go backup status monitoring: Stay updated with a mobile admin app.

Keepit Backup and Recovery for Google Workspace

Keepit for Google Workspace is the world’s only independent cloud dedicated to backup and recovery. It is easy to use and keeps your Google Workspace data highly secure, always available, and instantly restorable.

Keep your data available 24/7 with automatic backup and unlimited storage
Quickly find and restore data, whether you want to restore one single email or an entire snapshot for your organization.

Easy to set up, easy to use, easy to scale
Keepit is a set-and-forget installation that is easy to use: No training needed. You can integrate it with your existing system thanks to our API-first approach. No hidden fees, no surprises, and 24/7 support.

Choose the world’s only independent cloud for immutable data
Security is in our DNA. Once your data is backed up with Keepit, it is made immutable and undeletable thanks to blockchain-verified technology. It is a priority for us to provide you with excellent reliability, great backup and restore performance, instant access to individual files, multi-factor authentication, and data encryption at rest and in transit.

Learn more on our Google Workspace backup and recovery


Fast and Simple eDiscovery with Backup and Recovery

What is eDiscovery?

Electronic discovery (sometimes known as eDiscovery, e-discovery) is one of those terms that means slightly different things in different contexts. 

For example, in legal spheres, eDiscovery involves identifying, preserving, collecting, processing, reviewing, and analyzing electronically stored information (ESI). The term also shows up in digital forensics, which focuses on identifying, preserving, collecting, analyzing, and reporting on digital information—clearly very similar, but not quite equivalent. 

In general, eDiscovery is the electronic aspect of identifying, collecting, and producing electronically stored information, such as emails, documents, databases, audio, and video files, and also includes metadata such as time-date stamps, file properties, and author and recipient information. In other words—regardless of the specific driving need—eDiscovery refers to finding and retrieving electronically stored ‘stuff’. 

Sounds easy enough, right? But as anyone who’s performed eDiscovery knows, today’s information-enabled organizations produce an awful lot of that stuff. In fact, the tendency for every single action we take to produce a digital trail led public-interest technologist Bruce Schneier to observe that “data is the exhaust of the information age” [Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World, pg. 4].

Consequently, the sheer volume of electronically stored information makes eDiscovery a logistical challenge. Now, add in the time-specific nature of many requests—as in, needing to retrieve a file or record as it existed at a certain time on a certain date, a certain number of years ago—and the challenge becomes even greater. 

Beyond backup: enabling quick and simple eDiscovery

While the retention utilities included with software-as-a-service (SaaS) applications and cloud services may be adequate for retrieving something that’s a few weeks old, they certainly aren’t intended to provide—nor are they capable of providing—a substitute for long-term backup and the use cases it enables, including disaster recovery and eDiscovery.

To be resilient in the face of outages, compromises, and misconfigurations (or simply to find a crucial piece of information), your organization needs to be able to search and access SaaS and cloud data quickly and easily. Imagine the difference between a recovery mission aided by coordinates and a map versus a vague notion that someone is somewhere. 

Fortunately, with the right backup solution in place, eDiscovery really can be a breeze. Let’s look at a real-world example. 

ALPLA’s experience

With around 22,000 employees across 45 countries, ALPLA is one of the world’s leading manufacturers of high-quality plastic packaging.

The company’s rapid global expansion and cloud migration required an agile Microsoft 365 backup and recovery solution that could meet ALPLA’s need for 10-year data retention, and Keepit is proud to fulfill this need.

With other solutions, finding the right data to restore can be a tedious task, especially when very little information is provided by users—but Keepit’s unique and intelligent search features make it easy. In the words of Stefan Toefferl, Senior Data Center Engineer at ALPLA: “Keepit provides search filters that make eDiscovery simple, allowing us to quickly find and restore an exact file.”

One of the features most valued by ALPLA is the option to share a secure link to download a file, quickly getting the data back to the users. It’s features like these Public Links (40-second demo video) that make Keepit more than just an ordinary backup and help our customers become more efficient in their daily IT operations. Read more about the ALPLA customer case here.

Risk management in the digital age

The nature of backup and restoration is that you often don’t know when something might be needed: unexpected audits, legal discovery, cybersecurity incidents, or even an employee needing to recover something that they deleted years ago—these can all happen at any time.

That’s why truly managing risk requires a third-party backup solution that: 

  • Protects users and groups by providing snapshot-based restoration and timeline-based comparative analysis 
  • Preserves roles and permissions, with change tracking and straightforward comparisons 
  • Enables compliance and eDiscovery, for instance by capturing audit and sign-in logs, supporting log analysis, ensuring long-term retention, and enabling restoration to another site 
  • Accommodates growth into policies and devices by preserving device information and conditional access policies 

To help enterprises avoid disruption due to lost or inaccessible SaaS data, Keepit has architected a dedicated, vendor-neutral SaaS data backup solution that is resilient, secure, and easy to use.

You can see Keepit in action on our YouTube channel, or head to our services page to learn more about what we offer.  


Is Litigation Hold a Reasonable Replacement for Backup in Microsoft 365?

We get asked this question often, and at face value, it’s easy to see how one could equate litigation hold with backup – both have something to do with ‘preserving’ data. However, the reality is that backup and litigation hold differ on many points, and any company that fails to understand the differences between them (and the utility of each) will eventually learn the repercussions the hard way. Let’s explore the key differences between litigation hold and backup.

What Is Litigation Hold?

The term ‘litigation hold’ comes from US case law (Zubulake v. UBS Warburg, 2003), where the judge ruled: “once a party reasonably anticipates litigation, it must suspend its routine document retention/destruction policy and put in place a ‘litigation hold’ to ensure the preservation of relevant documents.”

In 2010, Microsoft introduced a litigation hold (sometimes referred to as legal hold) retention feature for Microsoft Exchange to support eDiscovery. The feature was intended primarily as a way of preserving data should there be a legal need to preserve it for access and viewing during a litigation. Think of it as being for documentation purposes, not as a way to restore data back in place to operating platforms like Microsoft 365.

Microsoft later added the ability to create what they call in-place holds, which are holds based on a query (such as “find all messages containing the phrase ‘Project Starburst’”). The back-end implementations of litigation holds and in-place holds differ slightly; you can see more details in Microsoft’s documentation.

Let me say it again, slightly differently: Litigation hold wasn’t designed to serve as a backup service. Yet some still try to rely on it as a backup solution, particularly as a stopgap in the absence of a dedicated data security plan (including a third-party backup solution), with the reasoning that “some sort of data preservation is better than none, right?”

However, there are many drawbacks and substantial risks associated with these types of setups that lead to a risky, false sense of data security. Some of the shortcomings and risks of relying on litigation hold as a backup are:

  • Data storage quotas capped at only 110 GB
  • Some eDiscovery features require additional-cost licenses; if you don’t buy the licenses, you can’t use the features
  • User mailbox data is only kept while an Exchange Online license is assigned to the user. When a user leaves or becomes inactive, removing the license will eventually remove the data.
  • Recovering data requires an administrator and is a time-consuming process
  • The held data is not physically separate from the original copy

The bottom line is that you can’t depend on litigation hold or in-place holds as mechanisms for general-purpose recovery from mistakes or disasters. That’s not what they’re meant for, and you run the risk of losing data if you try to use them for that purpose.

What Is Backup?

Backup, by definition, provides one or more additional copies of your data, stored in a location physically separate from that of your primary dataset. Physical separation is a fundamental facet of backup, since storing your backup data in the same location as the primary data represents a single point of failure.  Effectively, there is no data redundancy in these types of setups.

With traditional on-premises backup, the physical separation rule meant keeping an off-premises backup in another building, so that a disaster – e.g., a fire in one building – would not destroy all your data. For cloud backup, it’s fair to ask, ‘What cloud does my backup data go to?’ The answer is usually either ‘Microsoft Azure’ or ‘Amazon Web Services.’ Ideally, you want that data going to a cloud not operated by your SaaS application vendor (so it wouldn’t be wise to put your Microsoft 365 backup into Azure); otherwise, you’re violating the physical-separation rule.

Any service that is not providing this separation of copies is not—and should not be—considered a true backup.

At Keepit, we talk a lot about the ‘3 Ms’ that can cause data loss: mistakes made by people; mishaps at the SaaS application vendor; and malicious actions from inside or outside the organization.

Following data protection best practices, a properly executed backup scheme protects against all three Ms if anything happens to the primary (original) dataset: malicious action, such as a ransomware attack or a disgruntled employee; mistakes, where someone with legitimate access accidentally deletes important data (or needs to back out changes they didn’t want to keep); and mishaps, where the service provider has an outage or data loss. Litigation holds can’t protect you against all three Ms: there’s no physical separation, limited ability to do large-scale restores, and no real concept of version control.

What to Look for In a Cloud SaaS Backup Solution

Besides the must-have features of data redundancy and availability, a worthy backup solution will offer a multitude of convenience and productivity-boosting tools and services, further distancing it from litigation hold. The first thing to look for is a solution that’s purpose-built for the cloud – a good, dedicated third-party backup solution – not a refurbished or reskinned on-premises product.

Here are some of the key benefits to look for in a dedicated third-party backup solution:

  • Simple, quick restoration of the data you need, when and where you need it, in the format you need it
  • Direct restore from live storage, with no waiting for offline or near-line storage
  • An intuitive interface for quickly and easily finding and previewing specific files or messages before restoring them
  • Secure, immutable storage in an independent cloud
  • Flexible geographic storage options to cover your data sovereignty requirements
  • A predictable and transparent cost model, with no hidden surprise charges for data ingress, egress, or storage

For more insight into data protection in the cloud era, get an in-depth look via the e-guide on Leading SaaS Data Security. Or, if you’d like to learn more about Keepit backup and recovery services for Microsoft 365, Salesforce, Google Workspace, and others, visit this page.


Surely nobody would write a web service in C++

A while back, one of my colleagues was hanging out in an online developer forum and some people there were starting up a good old-fashioned language war (the type of exchange where one person familiar with language A will announce its superiority over language B, with which the other person isn’t familiar – not really a productive use of time when you think about it, but a popular pastime nonetheless).

During this debate, one developer confidently proclaimed that ‘surely nobody would ever write a web service in C++,’ to which my colleague responded, ‘well, that’s exactly what we did here at Keepit.’ This prompted some questions, and this piece is an attempt to explain why we did what we did and how this choice has been working out for us, given that this code base started life about 10 years ago.

To put things into perspective, it will be necessary to start with some minimal background information about this service we set out to build.

What is Keepit?

Keepit is a backup service in the cloud. We will store backup copies of your (cloud) data so that if—or when—your primary data is compromised for one reason or another (but most likely because of ransomware or account takeover via phishing), then you will still have complete daily copies of your data going back as many years as you want.

Years of data. This should make you think.

Several years ago, Microsoft claimed to have 350 million seats on its 365 platform, which is one of the cloud platforms we protect. Let’s say we get just 10% of that market (we should get much more because we are by far the best solution out there, but let’s be conservative for the sake of argument) – that means we need to store all the data for 35 million people (and that’s just one of these platforms; we protect several others as well).

It doesn’t end there: being backup, we copy all your changes, and we hold your old data, and that means when you clean up your primary storage and delete old documents, we keep the copy. Many customers want a year or three of retention, but we have customers who pay for 100 years of retention.

One hundred years. That means our great grandchildren will be fixing the bugs we put in our code today. This should make you think too.

Knowing the very high-level goals of our service, let’s talk about requirements for such a thing.

Core system: storage

We knew from the get-go that we would be implementing a storage solution which would need to store a very large amount of data (as everything is moving to the cloud, let’s say a few percent of all the world’s data) for a very long period of time (say a hundred years).

Now, everyone in the storage business will talk about SSDs, NVMe, and other high-performance data storage technologies. None of this is relevant for large-scale, affordable storage, however. Spinning disks are the name of the game and probably will be for at least another decade.

SSDs are getting the density, sure, but they are still not close to viable from a cost perspective. This means we will be writing all data to rotating magnetic media. When you write data to magnetic media, over the years, your media will demagnetize. That means, if we store a backup on a hard drive today, we probably can’t read it back just ten years from now.

That means we need to regularly move all this data from system to system to keep the data ‘fresh.’ Talking about performance: large-capacity hard drives today rotate at 7,200 rpm, exactly the same speed as back in 1992. Access time is dominated by rotational latency, which means this is one aspect of computers that has been almost at a standstill for 30 years while everything else has become faster in every way. We knew we had to deal with this.

I should probably note here that yes, we are talking about running our software on actual physical computers – no public cloud for us. If you want to go big, don’t do what the big players say you should do, do what the big players do. If public cloud was so great, Microsoft wouldn’t have built their own to run 365 – they would have run it on AWS which was very well established long before Microsoft thought about building 365. This doesn’t mean you can’t prototype on public cloud of course.

To solve our core storage need, we designed a filesystem—basically, an object storage system optimized for storing very large-scale backup data. Clearly, we expect the implementation of this storage system to have a significant lifespan.

We may want to create a better implementation one day in the future when hardware has evolved far beyond what we can imagine today, but it is worth pointing out that the storage systems we use today are very similar in architecture to what they looked like 30 years ago, and, I would assume, to what they will look like 30 years from now. Clearly, the core code that manages all of your data is not something you want to re-write every few weeks.

So, to implement this system, we went out looking for which new experimental languages had been invented in the six months leading up to implementation start. No wait, we didn’t.

What we need from a language

There are really two types of languages:

1: Systems programming languages – those that have practically no runtime, where you can look at the code and have a high degree of confidence in understanding exactly what that leads to on your processor, the type of language you would write an operating system kernel in. This would be languages like C, C++, and who knows – maybe Rust or something else.

2: The higher-level languages, which often have significant runtimes. The good ones of these offer benefits that you cannot get in a language without a significant runtime. This would be a language like Common Lisp, but people more commonly talk about C# and Java even though I will argue they only do so because nobody taught them Lisp.

And then you have the other languages that fit various niche use cases. This could be Python, Haskell, JavaScript, and so forth. I don’t mean to talk them down, but they are not reasonable languages for software development of the type we are talking about here. And since what we’re talking about here isn’t actually so special, you could take my argument to mean that they are just not very reasonable languages for software development outside of niche uses; that would be a fair interpretation of my opinion.

So, to be a little more concrete, what it is that we really need from a language is:

1: It must support the efficient implementation of algorithms and data structures; meaning we must have tight control over memory when we need it, our language must support the actual hardware data types like 64-bit integers on modern processors, etc. So, this rules out Python (not compiled), Ruby (not compiled) and JavaScript (JIT but doesn’t have integers or arrays).

2: When we write code today, the tool chain in 20 years’ time must still support our code with little or no changes. Because we simply can’t rewrite our code every few years. We will get nowhere if that’s what we do. That’s why large, important software systems today are still often written in C – because they started out life in the 80s or 90s and they are still the most significant operating system kernels or database management systems that exist to this day.

Steady evolution is the recipe, not rewrite from scratch every three years. This basically rules out any language that hasn’t been standardized and widely used for at least 10 years before we start the project. Meaning, since we started in 2012, that rules out any language that came out after 2002, so Go, Rust, and many other languages would have been out of the picture. C and C++ would work though.

3: We run on Linux. If you do anything significant with computers on a network, you probably run on Linux, too. We don’t want a language that is ‘ported’ to Linux as a curiosity – like C#. We need a language that is native on Linux with a significant and mature toolchain that is certain to receive significant investment for decades to come. Again, that’s really C and C++.

4: You need to design for failure. Everything from writing to a disk, to allocating the smallest piece of memory, can and will eventually fail. Relying on the developer to check error codes or return values at every single call to a nontrivial function (and too many trivial functions too) is rough. Yes, it can be done and there are impressive examples of this.

I am humbled by software such as the Postgres database or the Linux kernel which are very reliable pieces of software written in C which require such tedious checking. C++, in my experience, with RAII and exceptions, offers a much safer alternative. It is not free, of course – it avoids one set of problems and introduces another. In my experience however, it is less difficult to write reliable software using RAII and exceptions than to rely on developers not missing a single potential error return and correct recovery and cleanup. For this reason, I will prefer C++ over C and over both Rust and Go.

5: Obviously the language must offer sufficiently powerful functionality to make the implementation of a larger application bearable and maybe even enjoyable. In reality, however, if your language has functions, you can accomplish a lot; Fortran got functions in 1958 and since then most languages have had them.

Yes, generic programming is nice in C++. A real programmable language like Common Lisp would be preferable of course. Any other modern programming language will surely have some other feature which was added because it is potentially nice and potentially justifies the existence of the language.

But in reality, the hard part is getting your data structures right. Getting your algorithms right. Knowing what you’re trying to build and then building exactly that, nothing more and nothing less.

If we are honest, most languages would work. However, C++ is a nice compromise: it has some generic programming, the STL is incredibly useful, it offers basic OO concepts, and RAII (and structured error handling).

If we look at the criteria here, there really aren’t that many candidate languages to choose from, even if we compromise a bit here and there. Therefore, the question really isn’t ‘why’ we would write a web service in C++, the question really is ‘why wouldn’t we’ write a web service in C++. Realistically, what else would you use, given the scope of what we’re solving here?

Versatility

Performance matters. Don’t let anyone tell you otherwise. Anyone who says that ‘memory is cheap’ and uses that as an excuse should not be building your large-scale storage systems (or application servers or anything else that does interesting work on large amounts of data).

Donald Knuth said, ‘Premature optimization is the root of all evil’ and I absolutely believe that. However, ‘no optimization and elastic scaling is the root of all public cloud revenue’ is probably also true. Don’t go to extremes – don’t put yourself in a situation where you cannot, at the appropriate time, optimize your solution to be frugal with its resource use. When your solution is ‘elastically scaling’ for you in some public cloud on a credit card subscription, it is very hard to go back and fix your unit economics. Chances are you never will.

The typical computer configuration for a storage server in Keepit is 168 × 18 TB hard drives attached to a single-socket 32-core 3.4GHz 64-bit processor and 1TiB of RAM. It’s really important to note here that we use only one TiB of RAM for three PiB of raw disk: this is a 3000:1 ratio. It is not uncommon to see general-purpose storage systems recommend a 30:1 ratio of disk to RAM, which would require us to run with 100TiB of RAM, at which point memory most certainly isn’t cheap anymore. Through the magic of our storage software, this gives us about 2PiB of customer-usable storage in only 11U of rack space. This means we can provide a total of 8PiB of usable storage in a single 44U rack of systems, consuming less than 10kW of power. This matters.

If you run a business, you want to be able to make a profit. Your customers will want you to make a profit, especially if they are betting on you still having their data 100 years from now. If you want to grow your business with investments, your investors will think this matters. At Keepit, we have amazing unit economics: we raised the largest series A investment round for an IT company in the history of Denmark, and part of the reason for that was our unit economics. Basically, our storage technology, and not least the implementation of it, enabled this.

The choice of C++ has allowed us to reliably implement a CPU- and memory-efficient storage system that uses the available hardware resources to their fullest extent. This ranges from the careful layout of data structures in memory to an efficient HTTP stack that exposes the functionality and moves more than a GiB of data per second per server over a friendly RESTful HTTP API on the network. C++ enables and supports every layer of this software, and that is quite a feat.

Let me briefly digress with another note on versatility. I have this personal hobby project where I am developing a lab power supply for my basement lab (because every basement needs a lab). In order to adjust current and voltage limits, I want to use rotary encoders rather than potentiometers.

A rotary encoder is basically an axle that activates two small switches in a specific sequence and by looking at the sequence you can detect if the user is turning the axle in one direction or the other. The encoder signal gets fed to a 1MHz 8-bit processor with 1 kB of RAM and 8 kB of flash for my code.

To implement the code that detects the turning of these encoders, it makes sense to use a textbook, object-oriented approach. Create a class for an encoder. Define a couple of methods for reading the switches and for reading out the final turn data. Declare a bit of local state. Beautifully encapsulated in pure OO style. The main logic can then instantiate the two encoders and call the methods on these objects. I am implementing the software for this project in C++ as well – try to think about that for a moment: The same language that allows us to efficiently and fully utilize a 32-core 3.4GHz 64-bit processor with 1TiB of RAM and 3PiB of raw disk works ‘just as well’ on a 1-core 1MHz 8-bit processor with 1kiB of RAM and 8kiB of flash storage – and the code looks basically the same.

There are not many languages that can stretch this wide without showing the slightest sign of being close to their limits. This is truly something to behold.

The rest of the stack

The storage service exposes a simple RESTful API over HTTP using an HTTP stack we implemented from scratch in C++. Instantiating a web server in C++ is a single line of code – processing requests is as trivial as one could wish for.

I’ve heard plenty of arguments that doing HTTP or XML or other ‘web’ technology work would be simpler in Java or C# or other newer languages, but really, if you write your code well, why would this be difficult? Why would you spend more than a line of code to instantiate a web server? Why would parsing an XML document be difficult?

For XML, we implemented a validating parser using C++ metaprogramming; I have to be honest and say this was not fun all the way through and I couldn’t sit down and write another today without reading up on this significantly first. C++ metaprogramming is nothing like a proper macro system – but it can absolutely solve a lot of problems, including giving us an RNC-like schema syntax for declaring a validating XML parser and generating efficient code for exactly that parser.

This also means when we parse an XML document and we declare that one of the elements is an integer, then either it parses an integer successfully or it throws. If we declare a string, we get the string properly decoded so that we always work on the native data – we cannot ever forget to validate a value and we cannot ever forget to escape or un-escape data. By creating a proper XML parser using the language well, we have not only made our life simpler, we have also made it safer.

The entire software ecosystem at Keepit may revolve around our storage systems, but we have several other supporting systems that use our shared components for the HTTP and XML stack.

One other notable C++ system is our search engine. Like so many other companies, we found ourselves needing a search engine to give end users an amazing experience when browsing their datasets. And like so many others, we fired up a cluster of Elasticsearch servers and went to work.

Very quickly we ran into the basic fact that Elasticsearch is great at queries but not very good at updates, and we have many more updates than we have queries. We simply couldn’t get it to scale the way we’re used to. What to do?

While struggling with Elastic, we started the ‘Plan-B’ project to create a simple search engine from scratch – this engine has been our only search engine for years now and to this day, the process is still called ‘bsearch.’

Our search engine offers Google-like matching so that you can find your documents even if you misspell them, and it is a piece of technology that we are quite actively developing, both to improve matching capabilities across languages and to allow for more efficient processing of much larger datasets, which will open up other uses in the future.

Of our backend code base, about 81% of our code is C++. Another 16% is Common Lisp. The remaining 3% is Java.

We use Common Lisp in two major areas: For ‘general purpose’ business functions such as messaging, resource accounting, billing, statistical data processing, etc. And we use it for backup dataset processing. These are two very different uses.

The first is a more classical application of the language where performance is maybe less of a concern but where the unparalleled power of the language allows for beautiful implementations of otherwise tedious programs.

The second use is a less traditional use case where enormous datasets are processed and where the majority of the memory is actually allocated and managed outside of the garbage collector – it is truly a high-performance Lisp system where we benefit from the power of the language to do interesting and efficient extractions of certain key data from the hundreds of petabytes of customer data that pass through our systems.

Many people don’t know Common Lisp and may propose that ‘surely nobody would write a web service in Common Lisp.’ Well, as with all other languages, you need to understand the language to offer useful criticism; and the really groundbreaking feature of Common Lisp is its macro system. It is what makes Common Lisp by far the most powerful language in existence.

This is nothing like C pre-processor macros; the Common Lisp macro system allows you to use the full power of the language to generate code for the compiler. Effectively, this means the language is fully programmable. This is not something that is simple to understand since there is no meaningful way to do this using C-like language syntax, which is also why the Lisp dialects have a syntax that is fundamentally different from other languages.

In other words, if you do not understand the Lisp syntax, you are not equipped to comprehend what the macro system allows. This is not simple to wrap your head around, but, for example, I can mention that Common Lisp was the first general purpose programming language to get Object Orientation added to it, and this was done not with a change to the language and the compiler, but with a library that contained some macros. Imagine that.

Fortran allows you to implicitly declare the type of variables by using certain letters in the first character of the variable name – just for fun, I implemented that with a macro for Common Lisp. If I wanted to do that with C or C++ or any other language, I would need to extend the compiler.

The idea of using the first character of a variable’s name to implicitly declare its type is of course ridiculous, but there are many little syntactical shortcuts or constructs that could help you in daily life, which you may wish were present in your language of choice, and which you can only hope the language steering committee may one day add to the standard.

With Common Lisp, this is everyday stuff; no need to wait. If you want a new type of control structure or declaration mechanism, just go ahead and build it. The power of this cannot be overstated. C++ metaprogramming (and Go generics and everything else), useful as it is, pales in comparison.

Lessons learned

First of all, it really sucks to have multiple languages; you can’t expect everyone to be an expert in all, so by having more than one language, you decimate the effective size of your team. However, we picked Common Lisp to replace a sprawling forest of little scripts done in more languages than I could shake a stick at—meaning we are fortunate to have only two languages on our backend.

C++ and Common Lisp are so different that they complement each other well. Yes, we could have done everything in C++, but there are problems we solve in Common Lisp which would have been much less enjoyable to solve in C++. Now on the downside, we have two HTTP stacks, two XML stacks, two database libraries, two connection pools, and so on and so forth. There is no simple perfect solution here; the compromise we have arrived at is indeed working out very well for us.

We’ve been told many times that recruiting for C++ is hard because recruiting for ‘web technologies’ is so much simpler. Well guess what, finding good JavaScript developers is just as hard as finding good C++ developers in my experience. With Common Lisp it’s different again: it’s harder to find people, but the percentage of the candidates that are actually qualified is higher, so all in all, it’s actually fine. Recruitment is difficult across languages, period.

The best you can do is go to a conference, talk about your tech, and hope that some developers show up at your booth to talk about employment.

Old grumpy man’s advice for youngsters considering a career in software engineering

First of all, seriously consider a computer science education. There exist amazingly qualified people who do not have this and some of them work for us, but in my experience most really good developers have this. It certainly helps to get a foundation of mathematics, logic, and basic computer science. Knowing why things work will make learning new things infinitely simpler.

Learn multiple, properly different programming languages and write actual code in them. You need to experience (by failing) how functions are useful as abstractions and how terrible it is to work with ill-designed abstractions. You need to fail and spend serious time failing.

Make sure one of those languages is a compiled language with little or no runtime: C, C++, Rust, or even Fortran for that matter (not sure Fortran has much long-term perspective left in it though – it’s probably time to say goodbye). Now challenge yourself to write the most efficient implementation of some simple problem – maybe a matrix multiplication for example.

Disassemble the code and look at it. At least get some understanding of the processor instructions and why they are generated from the code you wrote. Learn how cache lines matter. Time your code and find out why your solution isn’t faster than it is. Then make it faster until you can prove to yourself that your instructions pipeline as much as they can, your cache misses are minimal, you don’t wait on register write delays and so on and so forth.

Also, make sure that one of those languages is Common Lisp. It should be a criminal offence for a university not to teach Common Lisp in its computer science curriculum. Read ‘Structure and Interpretation of Computer Programs’ (SICP) too. Even if you never use Lisp again, knowing it will make you a better developer in any other language.

And finally, as much as I dislike JavaScript, you should learn that, too. The most beautiful backend code will too easily be ignored if you cannot beautifully present its results – and today this means doing something with JavaScript.

Aside from my previous criticisms, you can make working with JavaScript more bearable, for example, by creating your own framework rather than relying on the constantly changing megabyte sized atrocities that your common web projects rely on. However, this is probably a topic for future discussion.


About Keepit
At Keepit, we believe in a digital future where all software is delivered as a service, and Keepit’s mission is to protect data in the cloud. Keepit is a software company specializing in cloud-to-cloud data backup and recovery. Drawing on more than 20 years of experience building best-in-class data protection and hosting services, Keepit is pioneering the way to secure and protect cloud data at scale.
