Scalability - The Basics
Modern applications experience fluctuating demands based on a variety of factors.
Scalability describes the flexibility of a system in the face of ever-changing demand.
In this article, I will unpack some of the core concepts of scalability to give us a starting point
for some of the more nitty-gritty details of scaling.
Vertical scaling
When we talk about scaling, there are two main methodologies:
Vertical scaling - Also referred to as scaling up/down.
This is the action of increasing the computing power of an existing instance of a workload.
With this, we can add more processing capacity to our application in the form of CPUs, memory, or disk space.
Horizontal scaling
Horizontal scaling - Also referred to as scaling out/in. This is the action of increasing the number of instances of a workload. This allows us to handle more traffic with additional instances of our application.
Often you might hear people talk about scaling up when they are actually describing the act of horizontal scaling. As always, it is important to set clear definitions to avoid confusion!
Comparing scaling methods
There are pros and cons to either approach.
In general, vertical scaling is easier to implement when compared to horizontal scaling.
But the effectiveness of vertical scaling can be limited in comparison.
| | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Hardware specification | Limited to the specifications of the available hardware. | Duplication of the main workload means that hardware specification requirements are simple. |
| Elasticity | The amount of scaling is limited by the largest-specification machine available to us. | The upper boundary is much greater, and performance can be improved by increasing the number of workloads. |
| Fault tolerance | Exposed to a single point of failure, as one workload is responsible for a larger portion of computation. | Increased resilience, with multiple workloads able to absorb demand. |
| System complexity | Simple to implement; no additional components are required. | Load balancers will be required, and our application must be designed to handle multiple instances of the workload. |
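To make the load balancer requirement concrete, here is a minimal sketch of the simplest balancing strategy, round-robin, which hands each incoming request to the next instance in turn. The instance names are hypothetical placeholders, not anything from a real deployment.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes requests evenly across a fixed set of instances."""

    def __init__(self, instances):
        self._cycle = cycle(list(instances))

    def next_instance(self):
        # Each call advances to the next instance, wrapping around at the end.
        return next(self._cycle)

balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
targets = [balancer.next_instance() for _ in range(6)]
print(targets)  # each instance receives an equal share of the 6 requests
```

Production balancers add health checks, weighting, and session affinity on top of this core idea, but the even distribution of demand is the same.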
One thing we must consider with vertical scaling is diminishing marginal returns: once a workload passes an optimal capacity, adding more computing power no longer provides significant performance gains. How quickly we reach that threshold depends heavily on the workload itself, for example, whether code is being executed serially rather than asynchronously.
We will also want to know the profile of the kind of work that our application is performing.
If our benchmarks tell us that time to request completion is high,
but the ratio of resources used to resources available is low,
then this probably indicates that our application has blocking operations.
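One simple way to spot that gap is to compare wall-clock time against CPU time for the same operation. This is a minimal sketch using only the standard library; the sleep stands in for a hypothetical blocking call.

```python
import time

def profile(func):
    """Run func and return (wall_time, cpu_time) for the call.

    A wall time far larger than the CPU time suggests the function spends
    most of its time blocked (e.g. waiting on I/O) rather than computing.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

# Simulated blocking operation: sleeping consumes wall time but almost no CPU.
wall, cpu = profile(lambda: time.sleep(0.2))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

Here the resources-used-to-available ratio is low while completion time is high, which is the signature of blocking described above.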
I/O bound
When an application is making requests and waiting for responses, we would categorise this as an I/O bound
process.
In this case, an application spends more time waiting for inputs and outputs from external components instead of
actually performing computation itself.
In the example above, we spend a small proportion of our time performing some form of computation (the green blocks),
whilst the majority of our time is spent waiting for a response (the grey block).
In this scenario, adding more computing power will only impact the green blocks.
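Because the waiting dominates, I/O-bound work responds far better to overlapping the waits than to faster hardware. A minimal sketch, using a sleep to simulate a hypothetical network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(_):
    # Simulated I/O wait; a real call would block on a network response.
    time.sleep(0.1)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # Five 0.1s waits overlap, so total time is close to 0.1s, not 0.5s.
    list(pool.map(fetch, range(5)))
elapsed = time.perf_counter() - start
print(f"5 overlapped waits finished in {elapsed:.2f}s")
```

The grey blocks overlap almost for free, which is why concurrency (or horizontal scaling) tends to help I/O-bound workloads more than a bigger machine.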
CPU bound
When an application spends more time performing calculations, we would categorise this as a CPU bound
process.
Our bottleneck here is the computing power available to our workload.
In the example above, we spend a much larger proportion of our time performing computation (the green block). In this scenario, adding more computing power will only impact the green block, which is where the largest portion of our execution time resides.
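A CPU-bound workload has the opposite profiling signature: CPU time tracks wall-clock time closely, because the processor is busy for almost the entire run. A small sketch with a purely computational loop:

```python
import time

def busy(n):
    # Pure computation: the CPU is the bottleneck, not any external resource.
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()
cpu_start = time.process_time()
busy(2_000_000)
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")  # the two stay close for CPU-bound work
```

When the two timings are close like this, extra computing power (vertical scaling) is actually doing useful work for us.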
You might have guessed where I’m going with this by now!
If our application is spending a large portion of its time performing I/O bound
actions,
then adding more computational power will be incredibly wasteful.
Therefore it is important for us to assess the kind of work we expect our workloads will be doing.