
Why does business continuity belong in the cloud?

Resilience in today’s liquid business environment demands flexibility. The term “observability” replaces monitoring, reflecting the need to adapt and be agile in the face of challenges. The key is to dissolve operations into the cloud, integrating tools and operational expertise for effective resilience.

I remember that when I started my professional career (in a bank), one of the first tasks I was handed was to secure an email server exposed to the internet. Coffee-break conversations revolved around new trends that seemed suicidal: they wanted to move service operations out to servers on the internet!

There wasn’t even talk of the cloud at that time. The first steps of software as a service were already being taken, but in short, everything was on-premise, and infrastructure was the queen of computing, because without a data center, there was no business.

Two decades have gone by, and the same thing that happened to mainframes has happened to data centers. They are remnants of the past: something necessary, but outside of our lives. No one builds business continuity around the concept of the data center anymore. Who cares about data centers these days?

The number of devices worldwide that are connected to each other to collect and analyze data and perform tasks autonomously is projected to grow more than fourfold, from 7 billion in 2020 to over 29.4 billion in 2030.

How many of those devices are located in a known data center? Furthermore, does it really matter where they are?

In many cases we don’t know who they are, who maintains them, or even what country they are in. No matter how much data protection laws insist, technology evolves much faster than legislation.

The most important challenge is ensuring business continuity, and that task is at the very least difficult when it is increasingly harder to know how to manage a business’ critical infrastructure, because the concept of infrastructure itself is changing.

What does IT infrastructure mean?

The suite of applications that manages the data needed to run your business. Below those applications is “everything else”: databases, engines, libraries, full technology stacks, operating systems, hardware, and the hundreds of people in charge of each piece of that great Tower of Babel.

What does business continuity mean?

According to ISO 22301, business continuity is defined as the “ability of an organization to continue the delivery of products and services in acceptable timeframes at a predefined capacity during an interruption.”

In practice, we talk about disaster recovery and incident management as part of a comprehensive approach: a series of activities that an organization can initiate to respond to an incident, recover from the situation, and resume business operations at an acceptable level. Generally, these actions have to do with infrastructure in one way or another.

Business continuity today

IT used to be simpler: infrastructure was located in one or more data centers.

Now we don’t even know where it is, beyond a series of intentionally fuzzy concepts. What we do know is that neither the hardware, nor the technology, nor the technicians, nor the networks are ours. Only the data (supposedly).

What does business resilience mean?

It is funny that this term has become trendy, when resilience was the basic concept behind the creation of the Internet. It means nothing more and nothing less than this: it is not about hitting a wall and getting up, but about accepting mistakes and moving forward. In other words, being a little more elegant and flexible when facing adversity.

Resilience and business continuity

In these liquid times, where everything flows, you have to be flexible and change the paradigm. That is why we no longer talk about monitoring but about observability: the idea of an all-seeing eye is a bit illusory, because there is too much to see. Old models don’t work.

It’s not a scalability problem (or at least it’s not just a scalability problem), it’s a paradigm shift problem.

Let’s solve the problem using the problem

Today all organizations are somehow dissolved in the cloud. They mix their own infrastructure with the cloud, they mix their own technology with the cloud, they mix their own data with the cloud. Why not mix observability with cloud?

I’m not talking about using a SaaS monitoring tool; that would be continuing the previous paradigm. I’m talking about our tool dissolving in the cloud, our operational knowledge dissolving in the cloud, and the resilience of our organization being based on exactly that: being in the cloud.

As in the beginnings of the internet, you may cut off a hydra’s head, but the rest keeps biting, and soon, it will grow back.

Being able to do something like this is not about purchasing one or more tools, or hiring one or more services. No, that would be business as usual.

Tip: the F in FMS, as in Pandora FMS, stands for Flexible. Find out why.

Resilience, business continuity and cloud

The first step should be to accept that you cannot be in control of everything. Your business is alive, do not try to corset it, manage each element as living parts of a whole. Different clouds, different applications, different work teams, a single technology to unite them all? Isn’t it tempting?

Talk to your teams; they probably have their own opinion on the subject. Why not integrate their expertise into a joint solution? The key is not to choose a solution, but a solution of solutions: something that allows you to integrate different needs, something flexible that you do not need to control completely. Just take a look, just keep a complete map, so that whatever happens, you can move forward. That’s what continuity is all about.

Some tips on business continuity, resilience and cloud

Why scale a service instead of managing on-demand items?

A service is useful insofar as it provides customers with the benefits they need from it. It is therefore essential to guarantee its operation and operability.

Sizing a service is important to ensure its profitability and quality. When sizing a service, the amount of resources needed, such as personnel, equipment, and technology, can be determined to meet the demand efficiently and effectively. That way, you will avoid problems such as long waiting times, overwork for staff, low quality of service or loss of customers due to poor attention.

In addition, sizing a service will allow you to anticipate possible peaks in demand and adapt the capacity appropriately to respond satisfactorily to the needs of customers and contribute to their satisfaction. Likewise, it also helps you optimize operating costs and maximize service profitability.

Why find the perfect tool if you already have it in-house?

Integrate your internal solution with other external tools that can enhance its functionality. Before embarking on a never-ending quest, consider what you already have at home. If you have an internal solution that works well for your business, why not make the most of it by integrating it with other external tools?

For example, imagine that you already have an internal customer management (CRM) system that adapts to the specific needs of your company. Have you thought about integrating it with digital marketing tools like HubSpot or Salesforce Marketing Cloud? This integration could take your marketing strategies to the next level, automating processes and optimizing your campaigns in a way you never imagined before.

And if you’re using an internal project management system to keep everything in order, why not consider incorporating online collaboration tools like Trello or Asana? These platforms can complement your existing system with additional features, such as Kanban boards and task tracking, making your team’s life easier and more efficient.

Also, let’s not forget IT service management. If you already have an internal ITSM (IT Service Management) solution, such as Pandora ITSM, why not integrate it with other external tools that can enhance its functionality? Integrating Pandora ITSM with monitoring tools like Pandora FMS can provide a more complete and proactive view of your IT infrastructure, allowing you to identify and solve issues before they impact your services and users.

The key is to make the most of what you already have and further enhance it by integrating it with other tools that can complement it. Have you tried this strategy before? It could be the key to streamlining your operations and taking your business to the next level.

Why force your team to work in a specific way?

Incorporate other teams and integrate them into your own (it may be easier than you imagine, and much cheaper).

The imposition of a single work method can limit the creativity and productivity of the team. Instead, consider incorporating new teams and work methods, seamlessly integrating them into your organization. Not only can this encourage innovation and collaboration, but it can also result in greater efficiency and cost reduction. Have you explored the option of incorporating new teams and work methods into your organization? Integrating diverse perspectives can be a powerful driver for business growth and success.

Why choose a single cloud if you can integrate several?

The supposed simplicity can be a prison with very high walls: never bet everything on a single supplier, or you will depend on it. Use European alternatives to protect yourself from future legal and political changes.

Choosing a single cloud provider can offer simplicity in management, but it also carries significant risks, such as over-reliance and vulnerability to legal or political changes. Instead, integrating multiple cloud providers can provide greater flexibility and resilience, thereby reducing the risks associated with relying on a single provider.

Have you considered diversifying your cloud providers to protect your business from potential contingencies? Integrating European alternatives can provide an additional layer of protection and stability in an increasingly complex and changing business environment.

Why choose high availability?

Pandora FMS offers HA on servers, agents and its console for demanding environments ensuring their continuity.

High availability (HA) is a critical component in any company’s infrastructure, especially in environments where service continuity is key. With Pandora FMS, you have the ability to deploy HA to servers, agents, and the console itself, ensuring your systems are always online even in high demand or critical environments.

Imagine a scenario where your system experiences a significant load. In such circumstances, equitable load distribution among several servers becomes crucial. Pandora FMS allows you to make this distribution, which ensures that, in the event of a component failure, the system remains operational without interruptions.

In addition, Pandora FMS’s modular architecture allows components to work in synergy, taking over the load of those that may fail. This contributes to a fault-resistant infrastructure, where system stability is maintained even in the face of unforeseen setbacks.
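The failover principle behind HA can be sketched in a few lines. The following is a minimal, hypothetical Python illustration of choosing the first healthy server from a redundant pool; the server names and health check are invented for the example, and this is not Pandora FMS’s actual implementation:

```python
def first_healthy(endpoints, is_healthy):
    """Return the first endpoint that passes its health check, or None.

    This mirrors the basic HA idea: if the primary fails,
    work moves to the next available replica.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None

# Hypothetical pool; only one standby is "up" in this example.
servers = ["primary.example.com", "standby-1.example.com", "standby-2.example.com"]
up = {"standby-1.example.com"}

print(first_healthy(servers, lambda s: s in up))  # standby-1.example.com
```

A real setup would replace the lambda with an actual probe (ping, TCP connect, or an agent heartbeat), but the selection logic stays the same.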

Why centralize if you can distribute?

Choose a flexible tool, such as Pandora FMS.

Centralizing resources may seem like a logical strategy to simplify management, but it can limit the flexibility and resilience of your infrastructure. Instead of locking your assets into a single point of failure, consider distributing your resources strategically to optimize performance and availability across your network.

With Pandora FMS, you have the ability to implement distributed monitoring that adapts to the specific needs of your business. This solution allows you to deploy monitoring agents across multiple locations, providing you with full visibility into your infrastructure in real time, no matter how dispersed it is.

By decentralizing monitoring with Pandora FMS, you may proactively identify and solve issues, thus minimizing downtime and maximizing operational efficiency. Have you considered how distributed monitoring with Pandora FMS can improve the management and control of your infrastructure more effectively and efficiently? Its flexibility and adaptability can offer you a strong and customized solution for your IT monitoring needs.

Contact our sales team, ask for a quote, or solve your doubts about our licenses. Pandora FMS, the integral solution for monitoring and observability.

About Version 2
Version 2 is one of the most dynamic IT companies in Asia. The company develops and distributes IT products for Internet and IP-based networks, including communication systems, Internet software, security, network, and media products. Through an extensive network of channels, point of sales, resellers, and partnership companies, Version 2 offers quality products and services which are highly acclaimed in the market. Its customers cover a wide spectrum which include Global 1000 enterprises, regional listed companies, public utilities, Government, a vast number of successful SMEs, and consumers in various Asian cities.

About PandoraFMS
Pandora FMS is a flexible monitoring system, capable of monitoring devices, infrastructures, applications, services and business processes.
Of course, one of the things that Pandora FMS can control is the hard disks of your computers.

How to update your PC BIOS

Every computer has its BIOS (short for Basic Input/Output System), firmware installed on the PC’s motherboard. Through the BIOS you may initialize and configure hardware components (CPU, RAM, hard disk, etc.). Let’s say it’s a kind of translator or bridge between computer hardware and software. Its main functions are:

  • Initialize the hardware.
  • Detect and load the bootloader and operating system.
  • Configure multiple parameters of your PC such as boot sequence, time and date, RAM times and CPU voltage.
  • Set up security mechanisms like a password to restrict access to your PC.

Importance of understanding how to access and update the BIOS

Since its main function is to initialize and check that all the hardware components of your PC are working properly, once everything checks out, the BIOS looks for the operating system on the hard drive or other boot device connected to your PC. However, accessing the BIOS may be an unfamiliar process for many users, which prevents updates that help guarantee the performance and security of the equipment. Later in this blog we will explain how to access the BIOS.

Clarification on the non-routine nature of BIOS updates

It is recommended to update the BIOS to maintain the performance, stability, and security of your computer. Your PC manufacturer may release BIOS updates to add features or fix bugs. The process is overall simple, but it must be done with great care to avoid irreversible damage. Also, avoid turning off the computer or cutting power in the middle of an update, as this can have serious consequences for the equipment.

Accessing the BIOS from Windows

To access the BIOS, press one of the following keys during startup, depending on the brand of your computer:

  • Dell: F2 or F12
  • HP: F10
  • Lenovo: F2, Fn + F2, F1, or Enter followed by F1
  • Asus: F9, F10 or Delete
  • Acer: F2 or Delete
  • Microsoft Surface: Press and hold the volume-up button
  • Samsung/Toshiba/Intel/ASRock/Origin PC: F2
  • MSI/Gigabyte/EVGA/Zotac/BIOStar: Delete

Instructions for accessing the BIOS from Windows 10 or 11 through Settings and the Advanced Start option

Just follow these instructions:

  • Restart your computer and wait for the manufacturer’s logo to appear.
  • Press one of the keys mentioned above when the home screen appears to access the BIOS settings.
  • Once in the BIOS, you may navigate through the different options using the arrow keys on your keyboard.

You may also follow this process in Windows 11:

  • On the login or lock screen, hold down the Shift key on your keyboard and click the power button (or the power option at the bottom right of the login screen). Then choose the Restart option from the menu.
  • When Windows 11 restarts, you will be shown the advanced startup screen (choose an option).
  • Then scroll to Troubleshoot > Advanced Options > UEFI Firmware Settings and click Restart.

Since BIOS configuration can have an impact on the operation of your PC, it is recommended to seek help from a professional.

Alternatives to using the Windows 10 and 11 method if the operating system loads too fast to access BIOS.

An alternative way to reach the BIOS configuration in Windows 11 is from the Settings application. Just follow these steps:

  • Open Windows 11 Settings.
  • Navigate to System > Recovery.
  • Save your work, then click Restart now.
  • Next, go to Troubleshoot > Advanced Options > UEFI Firmware Settings and click Restart. (We will talk about UEFI later in this blog post.)

Another alternative is to use the Windows Run command:

  • Open up the Run box (by pressing the Windows + R keys).
  • Then type shutdown /r /o and press Enter. A shortcut is to type shutdown /r /o /f /t 00 and click OK.
  • Then select Troubleshoot > Advanced Options > UEFI Firmware Settings and click Restart to boot into the system BIOS settings.

By the command line, also:

  • Open CMD, PowerShell or Terminal.
  • Type shutdown /r /o /f /t 00 or shutdown /r /o and press Enter.
  • Then go to Troubleshoot > Advanced Options > UEFI Firmware Settings and click Restart to get to the Windows 11 BIOS/UEFI configuration.

A more customized option is by shortcut:

  • Right-click on the Windows 11 desktop and select New > Shortcut.
  • In the Create Shortcut window, enter shutdown /r /o /f /t 00 or shutdown /r /o as the item’s location.
  • Follow the instructions to create a BIOS shortcut.

Once the BIOS configuration shortcut is created, just double-click it, then choose Troubleshoot > Advanced Options > UEFI Firmware Settings and click Restart to boot your PC into the BIOS environment.

What does UEFI stand for?

UEFI (Unified Extensible Firmware Interface) has emerged as the most modern and flexible firmware with new features that go hand in hand with today’s needs for more volume and more speed. UEFI supports larger hard drives and faster boot times.

UEFI advantages:

  • Easy to program since it uses the C programming language. With this programming language you may initialize several devices at once and have much faster booting times.
  • More security, based on Secure Boot mode.
  • Faster, as it can run in 32-bit or 64-bit mode and has more addressable memory space than BIOS, resulting in a faster booting process.
  • Makes remote support easier. It allows booting over the network and may also carry different interfaces in the same firmware. A PC that cannot boot into the operating system can still be accessed remotely for troubleshooting and maintenance.
  • Safe booting, as you may check the validity of the operating system to prevent or check if any malware tampered with the booting process.
  • More features and ability to add programs. You may also associate drivers (you would no longer have to load them into the operating system), which is a major advantage in agility.
  • Modular, since modifications can be made in parts without affecting the rest.
  • CPU microcode independence.
  • Support for larger storage drives, with up to 128 partitions.

Additionally, UEFI can emulate old BIOSes in case you need to install on old operating systems.

Continued use of the “BIOS” term to refer to UEFI for simplicity

BIOS is still used to initialize and check the hardware components of a computer to ensure proper operation. Also, as we have seen, it allows you to customize PC behavior (which boots first, for example). So BIOS is still helpful in troubleshooting issues that prevent the PC from booting properly.

When should you update your BIOS?

Reasons to perform a BIOS update

Updating the BIOS (or UEFI), as we mentioned before, helps the system work with better performance, in addition to checking and adjusting the installed hardware, which in turn ultimately impacts software operation. It is recommended to update BIOS only if there is a necessary improvement in the new version.
Sometimes, it is necessary to update BIOS so that the motherboard supports the use of a new generation processor or other type of hardware.

Warning about the potential risks of a BIOS update

The recommendation to update the BIOS only when necessary stems partly from the possibility that the update process fails, leaving your computer inoperable (!). Another risk is data loss if something fails during the upgrade (a connection outage, a power cut, an incomplete process). Keep in mind that unexpected errors may directly impact the operation of your computer. That is why it is recommended to seek professional support.

How to update your BIOS

Although each manufacturer recommends its own process and tools for updating the BIOS, it is safe to say that the first step is always to back up the most critical data on your computer, in case something goes wrong in the process (hopefully not!). To do so, the following is recommended:

Identification of the motherboard model and BIOS using Windows system information

The BIOS update is based on the model data of the motherboard or computer. To find it out, press the Windows key on your PC and type System Information. A window will open listing the details of your system. Look for System Model and BIOS Version/Date, which show the BIOS manufacturer’s name, BIOS version, and release date. With this data you will know which version of the BIOS to download (it must be newer than the one installed).
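Because the downloaded BIOS must be newer than the installed one, a quick version comparison is a useful sanity check. The sketch below is an illustrative Python example that assumes a dotted numeric version scheme; real BIOS version formats vary by manufacturer, so adapt it to what System Information actually shows:

```python
def is_newer_bios(installed: str, candidate: str) -> bool:
    """Return True if `candidate` is a strictly newer dotted numeric
    version than `installed` (e.g. "1.14.0" vs "1.9.2").

    Comparing lists of integers avoids the classic string-comparison
    trap where "1.9" would sort after "1.14".
    """
    def parse(version: str) -> list:
        return [int(part) for part in version.split(".")]
    return parse(candidate) > parse(installed)

print(is_newer_bios("1.9.2", "1.14.0"))   # True
print(is_newer_bios("1.14.0", "1.14.0"))  # False
```

If the manufacturer publishes dates rather than version numbers, compare the release dates instead; the principle is the same.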

However, the most common method of updating the BIOS is through an update wizard program, which takes you by the hand throughout the update process and runs from the operating system. You only have to indicate where the BIOS update file is located and restart the PC.

Steps to download and install the BIOS update according to the manufacturer’s instructions.

Generally, the manufacturer of your PC’s motherboard provides both an update wizard program and the BIOS update file, and you may download both from the support page of the manufacturer of your computer or motherboard.
Once you obtain the BIOS installation wizard and the latest version of the BIOS, download them to your computer. It is important to mention that it is not recommended to use beta versions of BIOS updates. It is preferable to stay on the latest stable version, even if it is older.
Let the update wizard take you by the hand and point it at the BIOS update file to indicate that this is the new firmware to be installed. If the downloaded update file is invalid or not newer than the version you already have installed, the wizard will detect it and will not perform the update.
Once this is done, restart your PC. We recommend that you check the main settings: that the date and time are correct, that the boot order is correct (i.e. which hard drive is checked first for a Windows installation), and that everything else is in order.
Now, you may continue working with the new BIOS version.

BIOS Update Considerations

Before making any BIOS update, it is always recommended to back up your data so that the update does not become your nightmare. Keep these points in mind:

  • Updating the BIOS generally does not improve performance, so it should be done only if necessary.
  • As we have seen, there are several methods for updating the BIOS, increasingly intuitive such as those in which the manufacturer itself offers an update wizard program that takes you by the hand throughout the process. It is important to follow the instructions that the manufacturer of your equipment indicates to prevent it from becoming unusable.
  • Always investigate BIOS corruption recovery options and have that information handy. That is: get ready for any contingency. Many times, despite precautionary measures, the update may fail, either due to incompatibility issues or an unfortunate blackout or power outage. Should that happen, and if the PC is still working, do not turn off the computer. Close the flash update tool and restart the update process to see if it works. If you made a BIOS backup, try selecting that file to recover it.

Also, some motherboards have a backup BIOS that helps restore the main one. Alternatively, some manufacturers sell replacement BIOS chips in their online stores at a reasonable price.
Finally, we would like to repeat once again the recommendation that you rely on an expert to update the BIOS.

Olivia Diaz

Market analyst and writer with more than 30 years in the IT market, working in demand generation, positioning, and relationships with end customers, as well as corporate communication and industry analysis.



XZ Vulnerability

You drink tap water every day, right? Do you know who invented the filtering mechanism that makes water pure and clean?… Well, do you actually care?

Do you know that this mechanism is exactly the same in all the taps of all the houses of any country? Do you know that this specialized piece is the work of an engineer who does it just because? Can you imagine what could happen if this person had a bad day?

Let’s talk about the XZ Utils library, its latest developer, Jia Tan, and why it is not a good idea to depend on a single supplier and make them angry.

Yes, open source software can offer a series of benefits in terms of prices (because it is actually “free”), transparency, collaboration and adaptability, but it also entails risks regarding the security and excessive trust that we place as users.

What happened?

On March 29, Red Hat, Inc. disclosed the vulnerability CVE-2024-3094, with a score of 10 on the Common Vulnerability Scoring System scale, and, therefore, a critical vulnerability, which compromised the affected SSH servers.

This vulnerability affected the XZ Utils package, which is a set of software tools that provide file compression and decompression using the LZMA/LZMA2 algorithm, and is included in major Linux distributions. Had it not been discovered, it could have been very serious, since it was a malicious backdoor code, which would grant unauthorized remote access to the affected systems through SSH.

The vulnerability began in version 5.6.0 of XZ, and would also affect version 5.6.1.
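If you want to check whether a system is exposed, compare the version reported by `xz --version` against the two affected releases. A minimal Python sketch (the helper name is ours, not part of any official tooling):

```python
# Backdoored releases named in CVE-2024-3094.
AFFECTED_XZ_VERSIONS = {"5.6.0", "5.6.1"}

def xz_is_affected(version: str) -> bool:
    """Return True if this xz/liblzma version is one of the
    backdoored releases (CVE-2024-3094)."""
    return version.strip() in AFFECTED_XZ_VERSIONS

print(xz_is_affected("5.6.0"))  # True
print(xz_is_affected("5.4.6"))  # False
```

In practice you would feed it the version reported by your package manager; distributions that briefly shipped the affected builds published fixed or downgraded packages.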

During the liblzma build process, a camouflaged test file already present in the source code would be retrieved and later used to modify specific functions in the liblzma code. The result is a modified liblzma library that can be used by any software linked against it, intercepting and modifying that software’s data interaction with the library.

This implementation of a backdoor in XZ was the final stage of a campaign extending over two years of operations, mainly of the HUMINT (human intelligence) type, by the user Jia Tan.

The user Jia Tan created their GitHub account in 2021 and made their first commit to the XZ repository on February 6, 2022. More recently, on February 16, 2024, a malicious file named “build-to-host.m4” was added via .gitignore and later incorporated into the release of the package; finally, on March 9, 2024, the hidden backdoor was incorporated into two test files:

  • tests/files/bad-3-corrupt_lzma2.xz
  • tests/files/good-large_compressed.lzma

How was it detected?

The main person responsible for locating this issue is Andres Freund, a software engineer at Microsoft, who was performing micro-benchmarking tasks. During testing, they noticed that sshd processes were using an unusual amount of CPU even though the sessions were not established.

After profiling sshd, they saw a lot of CPU time in the liblzma library. This in turn reminded them of a recent bizarre complaint from Valgrind about automated testing in PostgreSQL. This behavior could have been overlooked and not discovered, leading to a large security breach on Debian/Ubuntu SSH servers.

As Andres Freund himself says, a series of coincidences were required to find this vulnerability; it was a matter of luck that it was found at all.

What set off Freund’s alarms was a small delay of only 0.5 seconds in SSH connections, which, although it seems like very little, is what led him to investigate further and find the problem, along with the potential chaos it could have generated.

This underscores the importance of monitoring software engineering and security practices. The good news is that the vulnerability was found in very early releases of the software, so in the real world it had virtually no effect, thanks to the quick detection of this malicious code. But it makes us think about what could have happened had it not been detected in time. It is not the first such case, nor will it be the last. The advantage of open source is that all of this was made public and the impact can be evaluated; in other cases, where there is no such transparency, the impact, and therefore the remediation, can be much harder to assess.

Reflection

After what happened, we are in the right position to highlight both positive and negative points related to the use of open source.

Among the positive points we find transparency and collaboration between developers from all over the world: a watchful community in charge of detecting and reporting possible security threats, plus flexibility and adaptability, since the nature of open source allows the software to be adapted and modified according to specific needs.

As for the disadvantages, we find vulnerability to malicious attacks, as in the case of developers with malicious intentions. Users trust that the software does not contain malicious code, which can lead to a false sense of security. In addition, given the number of existing contributions and the complexity of the software itself, exhaustively verifying the code is very difficult.

If we add to all of that the existence of libraries maintained by one person or a very small group of people, the risk of a single point of failure grows. In this case, the very need (or benefit) of having more people contributing is what opened the door to the problem.

In conclusion, while open source software can offer us a number of benefits in terms of transparency, collaboration and adaptability, it can also present disadvantages or challenges in terms of the security and trust we place in it as users.

About Version 2
Version 2 is one of the most dynamic IT companies in Asia. The company develops and distributes IT products for Internet and IP-based networks, including communication systems, Internet software, security, network, and media products. Through an extensive network of channels, point of sales, resellers, and partnership companies, Version 2 offers quality products and services which are highly acclaimed in the market. Its customers cover a wide spectrum which include Global 1000 enterprises, regional listed companies, public utilities, Government, a vast number of successful SMEs, and consumers in various Asian cities.

About PandoraFMS
Pandora FMS is a flexible monitoring system, capable of monitoring devices, infrastructures, applications, services and business processes.
Of course, one of the things that Pandora FMS can control is the hard disks of your computers.

What is alert fatigue and its effect on IT monitoring?

Talking about too many cybersecurity alerts is not retelling the story of Peter and the Wolf, where people end up ignoring false warnings; it is about their great impact on security strategies and, above all, on the stress they cause to IT teams, which we know are increasingly reduced and must juggle multiple tasks every day.

Alert Fatigue is a phenomenon in which excessive alerts desensitize the people in charge of responding to them, leading to missed or ignored alerts or, worse, delayed responses. IT security operations professionals are prone to this fatigue because systems are overloaded with data and may not classify alerts accurately.

1. Definition of Alert Fatigue and its impact on the organization's security

Alert fatigue, in addition to generating overwhelming amounts of data to interpret, diverts attention from what is really important. To put it into perspective, deception is one of the oldest war tactics, going back to the ancient Greeks: by giving the impression that an attack was taking place in one spot, the enemy was made to concentrate its resources there, leaving a different front open to the real attack. Bringing this into an organization, cybercriminals can deliberately cause, and then leverage, IT staff fatigue to find security breaches. The cost in business continuity and resource consumption (technology, time and human resources) can become considerable, as indicated by a Security Magazine article on a survey of 800 IT professionals:

  • 85% of information technology (IT) professionals say more than 20% of their cloud security alerts are false positives. The more alerts, the harder it becomes to identify which ones are important and which ones are not.
  • 59% of respondents receive more than 500 public cloud security alerts per day. Having to filter alerts wastes valuable time that could be used to fix or even prevent issues.
  • More than 50% of respondents spend more than 20% of their time deciding which alerts need to be addressed first. Alert overload and false positive rates contribute not only to staff turnover, but also to the loss of critical alerts: 55% say their team has overlooked critical alerts in the past, often weekly and even daily, due to ineffective prioritization.

What happens is that the team in charge of reviewing the alerts becomes desensitized. By human nature, when we get a warning of every little thing, we get used to alerts being unimportant, so it is given less and less importance. This means finding the balance: we need to be aware of the state of our environment, but too many alerts can cause more damage than actually help, because they make it difficult to prioritize problems.

2. Causes of Alert Fatigue

Alert Fatigue is due to one or more of these causes:

2.1. False positives

These are situations where a security system mistakenly identifies a benign action or event as a threat or risk. They may be due to several factors, such as outdated threat signatures, poor (or overzealous) security settings, or limitations in detection algorithms.

2.2. Lack of context

Alerts must be interpreted, so if alert notifications do not have the proper context, it can be confusing and difficult to determine the severity of an alert. This leads to delayed responses.

2.3. Several security systems

Consolidation and correlation of alerts are difficult if there are several security systems working at the same time… and this gets worse when the volume of alerts with different levels of complexity grows.

2.4. Lack of filters and customization of cybersecurity alerts

If they are not defined and filtered, it may cause endless non-threatening or irrelevant notifications.

2.5. Unclear security policies and procedures

Poorly defined procedures become very problematic because they contribute to aggravating the problem.

2.6. Shortage of resources

It is not easy to have security professionals who know how to interpret and also manage a high volume of alerts, which leads to late responses.

The above tells us that correct management and alert policies are required, along with the appropriate monitoring tools to support IT staff.

3. Most common false positives

According to the Institute of Data, false positives faced by IT and security teams are:

3.1. False positives about network anomalies

These take place when network monitoring tools identify normal or harmless network activities as suspicious or malicious, such as false alerts for network scans, legitimate file sharing, or background system activities.

3.2. False malware positives

Antivirus software often identifies benign files or applications as potentially malicious. This can happen when a file shares similarities with known malware signatures or displays suspicious behavior. A cybersecurity false positive in this context can result in the blocking or quarantine of legitimate software, causing disruptions to normal operations.

3.3. False positives about user behavior

Security systems that monitor user activities can generate a cybersecurity false positive when an individual’s actions are flagged as abnormal or potentially malicious. Example: an employee who accesses confidential documents after working hours, generating a false positive in cybersecurity, even though it may be legitimate.

False positives can also be found in email security systems. For example, spam filters can misclassify legitimate emails as spam, causing important messages to end up in the spam folder. Can you imagine the impact of a vitally important email ending up in the Spam folder?

4. Consequences of Alert Fatigue

Alert Fatigue has consequences not only on the IT staff themselves but also on the organization:

4.1. False sense of security

Too many alerts can lead the IT team to assume they are false positives, dismissing actions that should have been taken.

4.2. Late Response

Too many alerts overwhelm IT teams, preventing them from reacting in time to real and critical risks. This, in turn, causes costly remediation and even the need to allocate more staff to solve the problem that could have been avoided.

4.3. Regulatory non-compliance

Security breaches can lead to fines and penalties for the organization.

4.4. Reputational damage to the organization

A breach of the company’s security gets disclosed (and we’ve seen headlines in the news) and impacts its reputation. This can lead to loss of customer trust… and consequently less revenue.

4.5. IT staff work overload

If the staff in charge of monitoring alerts feel overwhelmed with notifications, they may experience increased job stress. This has been one of the causes of lower productivity and high staff turnover in the IT area.

4.6. Deterioration of morale

Team demotivation can cause them to disengage and become less productive.

5. How to avoid these Alert Fatigue problems?

If alerts are designed before they are implemented, they become useful and efficient, save a lot of time and, consequently, reduce alert fatigue.

5.1. Prioritize

The best way to get an effective alert is to use the “less is more” strategy. You have to think about the absolutely essential things first.

  • What equipment is absolutely essential? Hardly anyone needs alerts on test equipment.
  • What is the severity if a certain service does not work properly? High impact services should have the most aggressive alert (level 1, for example).
  • What is the minimum that is needed to determine that a computer, process, or service is not working properly?
    Sometimes it is enough to monitor the connectivity of the device, some other times something more specific is needed, such as the status of a service.

Answering these questions will help us find out what the most important alerts are that we need to act on immediately.

5.2. Avoiding false positives

Sometimes it can be tricky to get alerts to only go off when there really is a problem. Setting thresholds correctly is a big part of the job, but more options are available. Pandora FMS has several tools to help avoid false positives:

Dynamic thresholds

They are very useful for adjusting the thresholds to the actual data. When you enable this feature in a module, Pandora FMS analyzes its data history, and automatically modifies the thresholds to capture data that is out of the ordinary.
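As an illustration of the idea (this is not Pandora FMS's actual algorithm, just a common statistical approach), dynamic thresholds can be thought of as bounds derived from a module's data history, for instance the mean plus a few standard deviations:

```python
from statistics import mean, stdev

def dynamic_thresholds(history, k=2.0):
    """Derive warning/critical bounds from a module's data history.

    Values beyond k standard deviations from the historical mean are
    treated as out of the ordinary (illustrative heuristic only).
    """
    mu, sigma = mean(history), stdev(history)
    return {"warning": mu + k * sigma, "critical": mu + (k + 1) * sigma}

# Example: CPU load samples collected over the last day
samples = [12, 14, 13, 15, 11, 14, 13, 12, 15, 14]
bounds = dynamic_thresholds(samples)
```

Re-running this periodically over a sliding window is what lets the thresholds "follow" the data instead of staying fixed.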

  • FF Thresholds: Sometimes the problem is not that you defined the alerts or thresholds incorrectly, but that the metrics you use are not entirely reliable. Let’s say you are monitoring the availability of a device, but the network it sits on is unstable (for example, a very saturated wireless network). Data packets may be lost, and a ping may occasionally fail even though the device is active and performing its function correctly. For those cases, Pandora FMS has the FF Threshold. With this option you may configure some “tolerance” for the module before it changes state: for example, the agent must report two consecutive critical values before the module switches to critical status.
  • Use maintenance windows: Pandora FMS allows you to temporarily disable alerting and even event generation of a specific module or agent with the Quiet mode. With maintenance windows (Scheduled downtimes), this can be scheduled so that, for example, alerts do not trigger during X service updates in the early hours of Saturdays.
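The FF Threshold's tolerance behavior boils down to counting consecutive bad readings; this small sketch mimics the idea (class and method names are illustrative, not Pandora FMS internals):

```python
class FFThreshold:
    """Require `tolerance` consecutive out-of-range readings before
    changing state, so that one lost ping does not trigger an alert."""

    def __init__(self, tolerance=2):
        self.tolerance = tolerance
        self.state = "normal"
        self._streak = 0  # consecutive failed readings

    def feed(self, reading_ok):
        if reading_ok:
            self._streak = 0
            self.state = "normal"
        else:
            self._streak += 1
            if self._streak >= self.tolerance:
                self.state = "critical"
        return self.state

ff = FFThreshold(tolerance=2)
# One isolated failure is absorbed; two consecutive failures change state.
states = [ff.feed(ok) for ok in [True, False, True, False, False]]
```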

5.3. Improving alert processes

Once you have made sure that the alerts being triggered are the necessary ones, and that they only trigger when something really happens, you may greatly improve the process as follows:

  • Automation: Alerting is not only used to send notifications; it can also be used to automate actions. Let’s imagine that you are monitoring an old service that sometimes becomes saturated, and when that happens, the way to recover it is to just restart it. With Pandora FMS you may configure the alert that monitors that service to try to restart it automatically. To do this, you just need to configure an alert command that, for example, makes an API call to the manager of said service to restart it.
  • Alert escalation: Continuing with the previous example, with alert escalation you may make the first action performed by Pandora FMS, when the alert is triggered, to be the restart of the service. If in the next agent run, the module is still in critical state, you may configure the alert so that, for example, a ticket is created in Pandora ITSM.
  • Alert thresholds: Alerts have an internal counter that indicates when configured actions should be triggered. Just by modifying the threshold of an alert you may go from having several emails a day warning you of the same problem to receiving one every two or three days.

This alert (executed daily) has four actions: first, it tries to restart the service. If at the next alert execution the module has not recovered, an email is sent to the administrator; if the problem is still not solved, a ticket is created in Pandora ITSM. If the alert remains triggered on the fourth run, a daily message is sent through Slack to the group of operators.
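Escalation like this amounts to choosing an action from the number of consecutive runs the alert has stayed triggered. A minimal sketch (the action names are hypothetical placeholders, not Pandora FMS commands):

```python
def escalation_action(consecutive_failures):
    """Pick an action based on how many consecutive runs the alert
    has remained triggered: restart first, then email, then a ticket,
    then recurring operator notifications."""
    if consecutive_failures == 1:
        return "restart_service"
    if consecutive_failures == 2:
        return "email_admin"
    if consecutive_failures == 3:
        return "create_itsm_ticket"
    return "notify_operators_chat"

# Runs 1..5 walk through the escalation ladder:
plan = [escalation_action(n) for n in range(1, 6)]
```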

5.4. Other ways to reduce the number of alerts

  • Cascade Protection is an invaluable tool in setting up efficient alerting, by skipping triggering alerts from devices dependent on a parent device. With basic alerting, if you are monitoring a network that you access through a specific switch and this device has a problem, you will start receiving alerts for each computer on that network that you can no longer access. On the other hand, if you activate cascade protection on the agents of that network (indicating whether they depend on the switch), Pandora FMS will detect that the main equipment is down, and will skip the alert of all dependent equipment until the switch is operational again.
  • Using services can help you not only reduce the number of alerts triggered, but also the number of alerts configured. If you have a cluster of 10 machines, it may not be very efficient to have an alert for each of them. Pandora FMS allows you to group agents and modules into Services, along with hierarchical structures in which you may decide the weight of each element and alert based on the general status.

5.5. Implement an Incident Response Plan

Incident response is the process of preparing for cybersecurity threats, detecting them as they arise, and responding to contain or mitigate them. Organizations can manage threat intelligence and mitigation through incident response planning. It should be remembered that any organization is at risk of losing money, data, and reputation due to cybersecurity threats.

Incident response requires assembling a team of people from different departments within an organization, including organizational leaders, IT staff, and other areas involved in data control and compliance. The following is recommended:

  • Plan how to analyze data and networks for potential threats and suspicious activity.
  • Decide which incidents should be responded to first.
  • Have a plan for data loss and finances.
  • Comply with all applicable laws.
  • Be prepared to submit data and documentation to the authorities after a breach.

Finally, a timely reminder: incident response became especially important with the GDPR and its extremely strict breach-reporting rules. If a breach needs to be reported, the company must notify the appropriate authorities within 72 hours of becoming aware of it. A report of what happened must also be provided, together with an active plan to mitigate the damage. A company without a predefined incident response plan will not be ready to produce such a report.

The GDPR also requires organizations to have adequate security measures in place. Companies can be heavily penalized if, when scrutinized after a breach, officials find that adequate security was lacking.

Conclusion

The high cost to both IT staff (constant turnover, burnout, stress, late decisions, etc.) and the organization (disruption of operations, security breaches, quite onerous penalties) is clear. While there is no one-size-fits-all solution to prevent over-alerting, we do recommend prioritizing alerts, avoiding false positives (dynamic and FF thresholds, maintenance windows), improving alerting processes, and having an incident response plan, along with clear policies and procedures for responding to incidents, to find the right balance for your organization.

NoSQL Databases: The ultimate Guide

Today, many companies generate and store huge amounts of data. To give you an idea, decades ago, the size of the Internet was measured in Terabytes (TB) and now it is measured in Zettabytes (ZB). 

Relational databases were designed to meet the storage and information management needs of the time. Today we have a new scenario where social networks, IoT devices and Edge Computing generate millions of unstructured and highly variable data. Many modern applications require high performance to provide quick responses to user queries.

In relational DBMSs, an increase in data volume must be accompanied by improvements in hardware capacity. This technological challenge forced companies to look for more flexible and scalable solutions.

NoSQL databases have a distributed architecture that allows them to scale horizontally and handle continuous and fast data flows. This makes them a viable option in high-demand environments such as streaming platforms where data processing takes place in real time.

Given the interest in NoSQL databases in the current context, we believe it is essential to develop a user guide that helps developers understand and effectively use this technology. In this article we aim to clarify some basics about NoSQL, giving practical examples and providing recommendations on implementation and optimization to make the most of its advantages.

NoSQL data modeling

One of the biggest differences between relational and non-relational databases lies in the approach taken to data modeling.

NoSQL databases do not follow a rigid and predefined scheme. This allows developers to freely choose the data model based on the features of the project.

The fundamental goal is to improve query performance, getting rid of the need to structure information in complex tables. Thus, NoSQL supports a wide variety of denormalized data such as JSON documents, key values, columns, and graph relationships.

Each NoSQL database type is optimized for easy access, query, and modification of a specific class of data. The main ones are:

  • Key-value: Redis, Riak or DynamoDB. These are the simplest NoSQL databases. They store information as if it were a dictionary based on key-value pairs, where each value is associated with a unique key. They were designed to scale quickly while ensuring system performance and data availability.
  • Document-oriented: MongoDB, Couchbase. Data is stored in documents such as JSON, BSON or XML. Some consider them an evolution of key-value systems, since they allow encapsulating key-value pairs in more complex structures that support advanced queries.
  • Column-oriented: BigTable, Cassandra, HBase. Instead of storing data in rows like relational databases do, they do it in columns. These in turn are organized into logically ordered column families in the database. The system is optimized to work with large datasets and distributed workloads.
  • Graph-oriented: Neo4J, InfiniteGraph. They save data as entities and relationships between entities. The entities are called “nodes” and the relationships that bind the nodes are the “edges”. They are perfect for managing data with complex relationships, such as social networks or applications with geospatial location.
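To make the first two models concrete, here is how the same record might look in a key-value store versus a document store, sketched with in-memory Python structures (the data is invented for illustration):

```python
# Key-value: an opaque value looked up only by its unique key.
kv_store = {
    "user:1001": '{"name": "Ada", "plan": "pro"}',
}

# Document-oriented: a nested, queryable structure (MongoDB-style).
doc_store = [
    {"_id": 1001, "name": "Ada", "plan": "pro",
     "logins": [{"ts": "2024-01-02", "ip": "10.0.0.5"}]},
]

# A key-value store can only fetch by key...
value = kv_store["user:1001"]

# ...while a document store can filter on inner fields.
pro_users = [d for d in doc_store if d["plan"] == "pro"]
```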

NoSQL data storage and partitioning

Instead of making use of a monolithic and expensive architecture where all data is stored on a single server, NoSQL distributes the information across different servers known as “nodes” that join in a network called a “cluster”.
This feature allows NoSQL DBMSs to scale horizontally and manage large volumes of data using partitioning techniques.

What is NoSQL database partitioning?

It is a process of breaking up a large database into smaller, easier-to-manage chunks.

It is necessary to clarify that data partitioning is not exclusive to NoSQL. SQL databases also support partitioning, but NoSQL systems have a native function called “auto-sharding” that automatically splits data, balancing the load between servers.
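The simplest way to picture sharding is hashing each key and taking the result modulo the number of shards. The sketch below also hints at the weakness of this naive scheme, which the Consistent Hashing technique described later addresses:

```python
import hashlib

def shard_for(key, num_shards):
    """Assign a key to a shard with a stable hash (illustrative only).

    Caveat: with modulo placement, changing num_shards remaps almost
    every key -- the redistribution problem consistent hashing solves.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Where would these records live in a 4-shard cluster?
assignment = {k: shard_for(k, 4) for k in ["user:1", "user:2", "user:3"]}
```

Auto-sharding automates exactly this kind of placement decision, plus the rebalancing that naive hashing makes painful.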

When to partition a NoSQL database?

There are several situations in which it is necessary to partition a NoSQL database:

  • When the server is at the limit of its storage capacity or RAM.
  • When you need to reduce latency. In this case you get to balance the workload on different cluster nodes to improve performance.
  • When you wish to ensure data availability by initiating a replication procedure.

Although partitioning is used in large databases, you should not wait for the data volume to become excessive because in that case it could cause system overload.
Many programmers use AWS or Azure to simplify the process. These platforms offer a wide variety of cloud services that allow developers to skip the tasks related to database administration and focus on writing the code of their applications.

Partitioning techniques

There are different techniques for partitioning a distributed architecture database.

  • Clustering
    It consists of grouping several servers so that they work together as if they were one. In a clustering environment, all nodes in the cluster share the workload to increase system throughput and fault tolerance.
  • Separation of Reads and Writes
    It consists of directing read and write operations to different nodes in the cluster. For example, read operations can be directed to replica servers acting as children to ease the load on the parent node.
  • Sharding
    Data is divided horizontally into smaller chunks called “shards” and distributed across different nodes in the cluster.
    It is the most widely used partitioning technique in databases with distributed architecture due to its scalability and ability to self-balance the system load, avoiding bottlenecks.
  • Consistent Hashing
    It is an algorithm that is used to efficiently allocate data to nodes in a distributed environment.
    The idea of consistent hashes was introduced by David Karger in a research paper published in 1997 and entitled “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”.
    In this academic work, the “Consistent Hashing” algorithm was proposed for the first time as a solution to balance the workload of servers with distributed databases.
    It is a technique that is used in both partitioning and data replication, since it allows to solve problems common to both processes such as the redistribution of keys and resources when adding or removing nodes in a cluster.

    Nodes are represented in a circular ring and each data is assigned to a node using a hash function. When a new node is added to the system, the data is redistributed between the existing nodes and the new node.
    The hash works as a unique identifier so that when you make a query, you just have to locate that point on the ring.
    An example of a NoSQL database that uses “Consistent Hashing” is DynamoDB, since one of its strengths is incremental scaling, and to achieve this it needs a procedure capable of fractionating data dynamically.
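A minimal consistent-hash ring can be sketched as follows (virtual nodes, which real systems such as DynamoDB add for better balance, are omitted for clarity):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: nodes and keys share one hash
    space, and each key belongs to the next node clockwise."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (hash, node) positions
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        bisect.insort(self._ring, (self._hash(node), node))

    def get_node(self, key):
        """Walk clockwise from the key's position to the next node."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner_before = ring.get_node("user:42")
ring.add_node("node-d")  # only keys near node-d's position move
```

When `node-d` joins, only the keys falling between its predecessor and its ring position are reassigned; with naive modulo hashing, almost all keys would move.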

Replication in NoSQL databases

It consists of creating copies of the data on multiple machines. This process seeks to improve database performance by distributing queries among different nodes. At the same time, it ensures that the information will continue to be available, even if the hardware fails.
The two main ways to perform data replication (in addition to the Consistent Hashing that we already mentioned in the previous section) are:

Master-slave server

Writing is made to the primary node and from there data is replicated to secondary nodes.

Peer to peer

All nodes in the cluster have the same hierarchical level and can accept writing. When data is written to one node it spreads to all the others. This ensures availability, but can also lead to inconsistencies if conflict resolution mechanisms are not implemented (for example, if two nodes try to write to the same location at the same time).
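One common (if lossy) conflict-resolution mechanism for this scenario is last-write-wins: each write carries a timestamp and every replica keeps the newest value. A sketch of the merge step (names are illustrative):

```python
def merge_lww(local, incoming):
    """Merge replicated writes into a replica's store.

    Entries are (value, timestamp) tuples keyed by record id; the
    newest timestamp wins. Note that concurrent writes can silently
    drop data -- exactly the inconsistency risk mentioned above.
    """
    for key, (value, ts) in incoming.items():
        if key not in local or ts > local[key][1]:
            local[key] = (value, ts)
    return local

replica_a = {"x": ("old", 1)}
replica_b = {"x": ("new", 2)}
merge_lww(replica_a, replica_b)  # replica_a now holds the newer write
```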

CAP theorem and consistency of NoSQL databases

The CAP theorem was introduced by Professor Eric Brewer, of the University of California, Berkeley, in the year 2000. It states that a distributed database can guarantee at most two of these three qualities at the same time:

  • Consistency: All requests after the writing operation get the same value, regardless of where the queries are made.
  • Availability: The database always responds to requests, even if a failure takes place.
  • Partition Tolerance: The system continues to operate even if communication between some nodes is interrupted.

Under this scheme we could choose a DBMS that is consistent and partition tolerant (MongoDB, HBase), available and partition tolerant (DynamoDB, Cassandra), or consistent and available (MySQL), but all three features cannot be preserved at once.
Each development has its own requirements, and the CAP theorem helps find the DBMS that best suits its needs. Sometimes it is imperative for data to be consistent at all times (for example, in a stock control system). In these cases, we usually work with a relational database. In NoSQL databases, consistency is not one hundred percent guaranteed, since changes must propagate between all nodes in the cluster.

BASE and the eventual consistency model in NoSQL

BASE is a concept opposed to the ACID properties (atomicity, consistency, isolation, durability) of relational databases. In this approach, we prioritize data availability over immediate consistency, which is especially important in applications that process data in real time.

The BASE acronym means:

  • Basically Available: The database always sends a response, even if reads hit nodes that have not yet received the last write and the answer is therefore stale or inconsistent.
  • Soft state: The database may be in an inconsistent state when reading takes place, so you may get different results on different readings.
  • Eventually Consistent: Database consistency is reached once the information has been propagated to all nodes. Until that point, we speak of eventual consistency.

Even though the BASE approach arose in response to ACID, they are not exclusionary options. In fact, some NoSQL databases like MongoDB offer configurable consistency.

Tree indexing in NoSQL databases. What are the best-known structures?

So far we have seen how data is distributed and replicated in a NoSQL database, but we need to explain how it is structured efficiently to make its search and retrieval easier.
Trees are the most commonly used data structures. They organize nodes hierarchically starting from a root node, which is the first tree node; parent nodes, which are all those nodes that have at least one child; and child nodes, which complete the tree.
The number of levels of a tree determines its height. It is important to consider the final size of the tree and the number of nodes it contains, as this can influence query performance and data recovery time.
There are different tree indexes that you may use in NoSQL databases.

B Trees

They are balanced trees, well suited to distributed systems thanks to their ability to maintain index consistency, and they are also widely used in relational databases.
The main feature of B trees is that they can have several child nodes for each parent node, but they always keep their height balanced. This means that they have an identical or very similar number of levels in each tree branch, a particularity that makes it possible to handle insertions and deletions efficiently.
They are widely used in filing systems, where large data sets need to be accessed quickly.

T Trees

They are also balanced trees that can have a maximum of two or three child nodes.
Unlike B-trees, which are designed to make searches on large volumes of data easier, T-trees work best in applications where quick access to sorted data is needed.

AVL Trees

They are binary trees, which means that each parent node can have a maximum of two child nodes.
Another outstanding feature of AVL trees is that they are balanced in height. The self-balancing system serves to ensure that the tree does not grow in an uncontrolled manner, something that could harm the database performance.
They are a good choice for developing applications that require quick queries and logarithmic time insertion and deletion operations.

KD Trees

They are binary, balanced trees that organize data into multiple dimensions. A specific dimension is created at each tree level.
They are used in applications that work with geospatial data or scientific data.

Merkle Trees

They represent a special case of data structures in distributed systems. They are known for their utility in Blockchain to efficiently and securely encrypt data.
A Merkle tree is a type of binary tree that offers a first-rate solution to the data verification problem. It was created in 1979 by the American computer scientist and cryptographer Ralph Merkle.
Merkle trees have a mathematical structure made up by hashes of several blocks of data that summarize all transactions in a block.

Data is grouped into larger datasets and related to the main nodes until all the data within the system is gathered. As a result, the Merkle Root is obtained.

How is the Merkle Root calculated?

1. The data is divided into blocks of a fixed size.

2. Each data block is subjected to a cryptographic hash function.

3. Hashes are grouped into pairs and a function is again applied to these pairs to generate their corresponding parent hashes until only one hash remains, which is the Merkle root.

The Merkle root is at the top of the tree and is the value that securely represents data integrity. This is because it is strongly related to all datasets and the hash that identifies each of them. Any changes to the original data will alter the Merkle Root. That way, you can make sure that the data has not been modified at any point.
This is why Merkle trees are frequently employed to verify the integrity of data blocks in Blockchain transactions.
NoSQL databases like Cassandra draw on these structures to validate data without sacrificing speed and performance.
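The three-step calculation above can be sketched directly with a cryptographic hash function (SHA-256 here; duplicating the last hash on odd-sized levels is a common convention, used for example in Bitcoin, though not the only one):

```python
import hashlib

def merkle_root(blocks):
    """Compute the Merkle root of a list of data blocks:
    hash each block, then repeatedly hash pairs of hashes
    until a single root remains."""
    level = [hashlib.sha256(b).hexdigest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate last hash
            level.append(level[-1])
        level = [hashlib.sha256((a + b).encode()).hexdigest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]

root = merkle_root([b"tx1", b"tx2", b"tx3"])
```

Because every block's hash feeds into the root, tampering with any single block yields a different root, which is what makes cheap integrity checks possible.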

Comparison between NoSQL database management systems

From what we have seen so far, NoSQL DBMSs are extraordinarily complex and varied. Each of them can adopt a different data model and present unique storage, consultation and scalability features. This range of options allows developers to select the most appropriate database for their project needs.
Below, we will give as an example two of the most widely used NoSQL DBMSs for the development of scalable and high-performance applications: MongoDB and Apache Cassandra.

MongoDB

It is a document-oriented DBMS developed by 10gen in 2007. It is open source and was created in programming languages such as C++, C and JavaScript.

MongoDB is one of the most popular systems for distributed databases. Social networks such as LinkedIn, telecommunications companies such as Telefónica or news media such as the Washington Post use MongoDB.
Here are some of its main features.

  • Database storage with MongoDB: MongoDB stores data in BSON files (binary JSON). Each database consists of a collection of documents. Once MongoDB is installed and Shell is running, you may create the DB just by indicating the name you wish to use. If the database does not already exist, MongoDB will automatically create it when adding the first collection. Similarly, a collection is created automatically when you store a file in it. You just have to add the first document and execute the “insert” statement and MongoDB will create an ID field assigning it an ObjectID value that is unique for each machine at the time the operation is executed.
  • DB Partitioning with MongoDB: MongoDB makes it easy to distribute data across multiple servers using its automatic sharding feature. Data fragmentation takes place at the collection level, distributing documents among the different cluster nodes. To carry out this distribution, a “shard key” is used: a field present in all the documents of the collection. Data is split into “chunks”, which have a default size of 64 MB and are stored in different shards within the cluster. MongoDB continuously monitors chunk distribution among the shard nodes and, if necessary, performs automatic rebalancing to ensure that the workload is evenly spread across them.
  • DB Replication with MongoDB: MongoDB replicates data through replica sets, a primary-secondary architecture. The primary node accepts both write and read operations, while secondary nodes only serve reads. Updates are propagated to the secondary nodes via an operation log called the oplog.
  • Database Queries with MongoDB: MongoDB has a powerful API that allows you to access and analyze data in real time, as well as perform ad-hoc queries, that is, direct queries on a database that are not predefined. This gives users the ability to perform custom searches, filter documents, and sort results by specific fields. To carry out these queries, MongoDB uses the “find” method on the desired collection or “findAndModify” to query and update the values of one or more fields simultaneously.
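The filter-and-sort semantics just described can be illustrated without a running MongoDB server. This toy sketch mimics the exact-match subset of `find()` over plain Python dicts (the collection contents and field names are invented for the example):

```python
# In-memory stand-in for a collection: a list of documents (dicts).
collection = [
    {"_id": 1, "name": "Ana",  "age": 34, "city": "Madrid"},
    {"_id": 2, "name": "Luis", "age": 28, "city": "Lima"},
    {"_id": 3, "name": "Sara", "age": 41, "city": "Madrid"},
]

def find(docs, query):
    """Return documents whose fields match every key-value pair in
    `query`, mimicking the exact-match behavior of MongoDB's find()."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

# Filter by a field, then sort the results by another field,
# much like find({"city": "Madrid"}).sort("age") would.
madrid = find(collection, {"city": "Madrid"})
madrid_by_age = sorted(madrid, key=lambda d: d["age"])
```

Real MongoDB queries are far richer (operators such as `$gt`, projections, cursors), but the mental model of matching documents against a query document is the same.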
  • DB Consistency with MongoDB: Since version 4.0 (the most recent one is 6.0), MongoDB supports multi-document ACID transactions. The “snapshot isolation” level provides a consistent view of the data and allows atomic operations to be performed on multiple documents within a single transaction. This feature is especially relevant for NoSQL databases, as it solves several consistency-related issues, such as concurrent writes or queries that return outdated versions of a document. In this respect, MongoDB comes very close to the guarantees of RDBMSs.
  • Database indexing with MongoDB: MongoDB uses B-trees to index the data stored in its collections, a structure whose internal nodes contain keys and pointers to other nodes. Each index stores the values of a specific field in order, allowing data retrieval and deletion operations to be more efficient.
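Why an ordered index speeds up lookups can be shown with a simplified sketch: a sorted list of `(value, doc_id)` pairs stands in for the B-tree's ordered leaf entries, and binary search replaces tree traversal (class and method names here are invented for illustration):

```python
import bisect

class FieldIndex:
    """Toy ordered index on one document field. Sorted (value, doc_id)
    pairs play the role of the ordered entries a B-tree maintains."""
    def __init__(self):
        self._entries = []  # kept sorted at all times

    def insert(self, value, doc_id):
        bisect.insort(self._entries, (value, doc_id))

    def range(self, lo, hi):
        """Return doc ids with lo <= value <= hi using two binary
        searches instead of scanning every document."""
        left = bisect.bisect_left(self._entries, (lo,))
        right = bisect.bisect_right(self._entries, (hi, float("inf")))
        return [doc_id for _, doc_id in self._entries[left:right]]

idx = FieldIndex()
for doc_id, age in [(1, 34), (2, 28), (3, 41)]:
    idx.insert(age, doc_id)
```

A query such as `idx.range(28, 34)` touches only the matching slice of the index, which is the same reason an indexed MongoDB query avoids a full collection scan.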
  • DB Security with MongoDB: MongoDB has a high level of security to ensure the confidentiality of stored data. It has several authentication mechanisms, role-based access configuration, data encryption at rest and the possibility of restricting access to certain IP addresses. In addition, it allows you to audit the activity of the system and keep a record of the operations carried out in the database.

Apache Cassandra

It is a column-oriented DBMS originally developed at Facebook to optimize searches within its platform. One of Cassandra's creators is computer scientist Avinash Lakshman, who previously worked at Amazon as part of the team of engineers that developed Dynamo. It is therefore no surprise that Cassandra shares some features with that system.
In 2008 it was launched as an open source project, and in 2010 it became a top-level project of the Apache Foundation. Since then, Cassandra continued to grow to become one of the most popular NoSQL DBMSs.
Although Meta uses other technologies today, Cassandra is still part of its data infrastructure. Other companies that use it include Netflix, Apple and eBay. In terms of scalability, it is considered one of the best NoSQL databases.

Let’s take a look at some of its key properties:

  • Database storage with Apache Cassandra: Cassandra uses a “column family” data model, which resembles the tables of relational databases but is more flexible. A column family is not a hierarchical structure of columns containing other columns, but a collection of key-value pairs, where the key identifies a row and the value is a set of columns. It is designed to store large amounts of data and perform writing and reading operations efficiently.
  • DB Partitioning with Apache Cassandra: For data distribution, Cassandra uses a partitioner that assigns data to the different cluster nodes. The partitioner applies a consistent hashing algorithm to the partition key of each row, so rows sharing the same partition key stay together on the same nodes. Cassandra also supports virtual nodes (vnodes), which means that the same physical node may own multiple token ranges.
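A minimal consistent-hashing ring, in the spirit of (but much simpler than) Cassandra's partitioner, can be sketched as follows. Each node owns the arc of the ring up to its token, and a row's partition key is hashed to find the owning node (node names and the MD5 hash are illustrative choices, not Cassandra's actual Murmur3 partitioner):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hashing ring: each node owns the arc of the
    ring that ends at its token."""
    def __init__(self, nodes):
        # One token per node; real Cassandra assigns many vnode tokens.
        self.tokens = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, partition_key: str) -> str:
        """Hash the partition key and walk clockwise to the next token."""
        token = self._token(partition_key)
        keys = [t for t, _ in self.tokens]
        i = bisect.bisect(keys, token) % len(self.tokens)
        return self.tokens[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
# Rows sharing a partition key always hash to the same node.
owner = ring.node_for("user:42")
```

The appeal of this scheme is that adding or removing a node only moves the keys on the neighboring arc, rather than reshuffling the whole dataset.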
  • DB Replication with Apache Cassandra: Cassandra proposes a peer-to-peer replication model in which all cluster nodes accept reads and writes. Since no master node is required to process requests, the chance of a bottleneck is minimal. Nodes communicate with each other and share state using a gossip protocol.
  • DB Queries with Apache Cassandra: Like MongoDB, Cassandra also supports ad-hoc queries, although these tend to be efficient only when based on the primary key. In addition, it has its own query language, CQL (Cassandra Query Language), with a syntax similar to SQL's, but instead of joins it relies on data denormalization.
  • DB Indexing with Apache Cassandra: Cassandra uses secondary indexes to allow efficient queries on columns that are not part of the primary key. These indexes may cover individual columns or, in the case of SASI (SSTable Attached Secondary Index), support more complex range, prefix or text-search queries across a large number of columns.
  • DB Consistency with Apache Cassandra: Due to its peer-to-peer architecture, Cassandra adopts eventual consistency: data is propagated asynchronously across multiple nodes, so for a short period of time there may be discrepancies between replicas. However, Cassandra also provides mechanisms for tuning the consistency level per operation. When a conflict takes place (for example, when replicas hold different versions of the same row), it uses the write timestamp to keep the most recent version. In addition, it performs automatic repairs to maintain data consistency and integrity when hardware failures or other events cause replicas to diverge.
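The timestamp-based reconciliation just described, often called “last write wins”, can be sketched in a few lines (the `Replica` type and helper names are invented for the illustration; real Cassandra resolves conflicts per column, not per row):

```python
from dataclasses import dataclass

@dataclass
class Replica:
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since epoch

def resolve(replicas):
    """Last-write-wins: the version with the highest timestamp is
    considered authoritative, as in Cassandra's reconciliation."""
    return max(replicas, key=lambda r: r.timestamp)

def repair(replicas):
    """Sketch of a repair pass: overwrite stale replicas with the
    winning version so that all copies converge."""
    winner = resolve(replicas)
    return [Replica(winner.value, winner.timestamp) for _ in replicas]

# Three replicas of the same row have temporarily diverged.
replicas = [Replica("v1", 100), Replica("v2", 250), Replica("v1", 100)]
winner = resolve(replicas)
```

After `repair(replicas)`, every copy holds the most recent version, which is exactly the state eventual consistency promises the cluster will converge to.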
  • DB Security with Apache Cassandra: To run Cassandra in a safe environment, some configuration is required, since many security options are disabled by default: for example, enabling the authentication system and setting permissions for each user role. In addition, it is critical to encrypt data both in transit and at rest; communication between nodes and with clients can be encrypted using SSL/TLS.

Challenges in managing NoSQL databases. How does Pandora FMS help?

NoSQL DBMSs offer developers the ability to manage large volumes of data and scale horizontally by adding multiple nodes to a cluster.
To manage these distributed infrastructures, it is necessary to master different data partitioning and replication techniques (for example, we have seen that MongoDB uses a primary-secondary replica set architecture, while Cassandra prioritizes availability with its peer-to-peer model).
Unlike RDBMSs, which share many similarities with one another, NoSQL databases follow no common paradigm: each system has its own APIs, languages and implementation, so getting used to working with each of them can be a real challenge.
Considering that monitoring is a fundamental component for managing any database, we must be pragmatic and rely on those resources that make our lives easier.
Both MongoDB and Apache Cassandra have commands that return system status information and allow problems to be diagnosed before they become critical failures. Another possibility is to use Pandora FMS software to simplify the whole process.

How to do so?

If your database runs on MongoDB, download the Pandora FMS plugin for MongoDB. This plugin uses the mongostat command to collect basic information about system performance. Once the relevant metrics are obtained, they are sent to the Pandora FMS data server for analysis.
On the other hand, if the database works with Apache Cassandra, download the corresponding plugin for this system. This plugin obtains its information by internally running nodetool, a tool included in the standard Cassandra installation that offers a wide range of commands to monitor server status. The plugin then structures the collected data in XML format and sends it to the Pandora FMS server for further analysis and display.
For these plugins to work properly, copy the files to the plugin directory of the Pandora FMS agent, edit the configuration file and, finally, restart the agent (the linked articles explain the procedure in detail).
Once the plugins are active, you will be able to monitor the activity of the cluster nodes in a graph view and receive alerts should any failures take place. These and other automation options help us save considerable time and resources in maintaining NoSQL databases.

Create a free account and discover all Pandora FMS utilities to boost your digital project!

And if you have doubts about the difference between NoSQL and SQL, you can consult our post “NoSQL vs SQL: main differences and when to choose each of them”.

About Version 2
Version 2 is one of the most dynamic IT companies in Asia. The company develops and distributes IT products for Internet and IP-based networks, including communication systems, Internet software, security, network, and media products. Through an extensive network of channels, point of sales, resellers, and partnership companies, Version 2 offers quality products and services which are highly acclaimed in the market. Its customers cover a wide spectrum which include Global 1000 enterprises, regional listed companies, public utilities, Government, a vast number of successful SMEs, and consumers in various Asian cities.

About PandoraFMS
Pandora FMS is a flexible monitoring system, capable of monitoring devices, infrastructures, applications, services and business processes.
Of course, one of the things that Pandora FMS can control is the hard disks of your computers.
