Afaan Ashiq

Software Engineering

Drawing Test Boundaries

2022-12-11 12 min read

For us to be able to write small focused unit tests, we need to be able to draw the right boundaries. In this post, I will go over what this looks like conceptually, and then I will walk through a more concrete but somewhat imperfect example.




Defining a unit test

Unit tests should target small pieces of functionality which can be logically isolated from the surrounding system. They form the largest chunk of tests that we tend to write. Unit tests should be cheap and quick to run. This is the key factor. They allow us to write the sort of tests which give us near-instant feedback, which in turn help drive development.

As a rule of thumb, if our unit tests take on the order of hundreds of milliseconds each, then that is a sign that too much setup is happening, probably due to some I/O-based interactions with external APIs or infrastructure. Ideally, unit tests should take no more than a few milliseconds each.

If any of the following happens without relevant mocking, then we should not classify our tests as unit tests:

  • Making calls from the application layer to database/data access components.
  • Interacting with other services.
  • Making requests to external APIs. This includes making calls to clients for external cloud providers (e.g. boto3 for AWS).

As a rule of thumb, calls made over the network should be avoided in unit tests.


When should we write unit tests?

A common mistake many engineers make (myself included, earlier in my career) is to write the production code first and the corresponding unit tests afterwards. Spoiler alert: we are going to make this mistake in this post.

Inherently what this means is we will be more likely to test implementation instead of behaviour. This often produces brittle tests that offer little value.
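As a hypothetical illustration (the PriceCalculator class below is invented purely for this example), compare a test that pins down internals with one that pins down observable behaviour:

```python
class PriceCalculator:
    """Toy class used only to illustrate the point."""

    def total(self, prices: list) -> int:
        self._running_sum = sum(prices)  # private working state
        return self._running_sum


def test_total_implementation():
    # Brittle: asserts on *how* the result was produced, so renaming
    # the private attribute breaks the test with no behaviour change.
    calculator = PriceCalculator()
    calculator.total([10, 20])
    assert calculator._running_sum == 30


def test_total_behaviour():
    # Robust: asserts only on *what* the caller observes.
    calculator = PriceCalculator()
    assert calculator.total([10, 20]) == 30
```

The first test has to change whenever the internals do; the second survives any refactor that keeps the behaviour intact.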

In my opinion, we gain the most from unit tests when we write them before the actual code à la TDD.

If we consider unit tests as mini-specifications for our code, our tests then drive the behaviour of our code.

We can then write our tests to lock-in the various behaviours that we expect to see.

Our tests can drive the design of the interfaces to our code. And crucially, our tests become focused on what the outcome is, not how we brought about that outcome.
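To make that concrete with a tiny, hypothetical example, a test written first reads as a mini-specification of the outcome:

```python
# Written *before* the implementation exists: this test specifies
# the outcome we want, not the mechanism that produces it.
def test_apply_discount_reduces_price():
    # Given a price and a discount percentage
    price = 100.0

    # When the discount is applied
    discounted_price = apply_discount(price=price, percentage=50)

    # Then the returned price reflects the discount
    assert discounted_price == 50.0


# The implementation then follows, shaped by the test above.
def apply_discount(price: float, percentage: float) -> float:
    return price * (1 - percentage / 100)
```

The test only cares about the result, so the implementation is free to change underneath it.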


The testing pyramid

Changing gears a little, the testing pyramid is the idea that the number of tests at each level of our testing strategy should be inversely proportional to their feedback time and the effort they take. If you’ve seen some variation of the test pyramid before, then you will probably know that unit tests form the basis of our testing strategies.

[Figure: the testing pyramid]

Note that the exact terminology is up for debate; our industry hasn’t quite settled on it (and probably never will).

Although they cannot be relied upon for the entirety of our testing strategy, unit tests do allow us to write tests which should give us quick feedback. This key property of unit tests is what drives us to use them so liberally.

To be able to restrict the scope of our unit tests and keep them focused, we must be able to identify clear boundaries around the unit that we are testing. Once we can clearly define our boundaries, we can draw those lines by adding abstractions and mocking them accordingly in our unit tests if we need.


Depending on abstractions not implementations

Defining those boundaries cannot be achieved if we do not have abstractions in the correct places.

[Figure: Component A depends directly on Component B]

If we had to make a change to Component B, then it is likely we would be forced to change Component A accordingly. These components are said to be highly coupled, as Component A depends on the implementation details of Component B.

[Figure: Component A depends on an abstraction]

Abstractions allow us to hide the implementation details of one component from another. This way, Component A does not need to worry about the nitty-gritty details of Component B.

In this scenario, if Component B changes the way it implements something then Component A does not need to know. The unit tests that we can write can be much more focused.

It feels more natural to visualise abstractions when our code conforms to the Single Responsibility Principle. By identifying the primary responsibility of the code we write, we can more easily define where those boundaries should be and delegate secondary responsibilities to other components.
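In Python, one lightweight way to sketch such a boundary is with a Protocol (the names below are illustrative, not taken from the example later in this post):

```python
from typing import Dict, Protocol


class ItemStore(Protocol):
    """The abstraction that Component A depends on."""

    def get_stock_quantity(self, item_id: str) -> int:
        ...


class InMemoryItemStore:
    """One implementation of the abstraction; a database- or
    HTTP-backed store could be swapped in without touching callers."""

    def __init__(self, quantities: Dict[str, int]):
        self._quantities = quantities

    def get_stock_quantity(self, item_id: str) -> int:
        return self._quantities.get(item_id, 0)


def is_in_stock(store: ItemStore, item_id: str) -> bool:
    # "Component A": aware only of the abstraction, not the implementation.
    return store.get_stock_quantity(item_id) > 0
```

is_in_stock() can now be unit tested against any in-memory implementation, with no patching required.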


How would we unit test this?

Let’s commit the aforementioned TDD-sin and take a look at the following function first:

import logging
from typing import Dict, Union

import datadog
import requests

logger = logging.getLogger(__name__)

def check_item_stock(item_id: str) -> Dict[str, Union[str, int]]:
    response = requests.get(url="https://www.someinternalservice/items", json={"item_id": item_id})
    response_data = response.json()
    stock_quantity = response_data["stock_quantity"]
    is_item_in_stock = stock_quantity > 0

    logger.info(f"Item id: {item_id} stock check requested")

    datadog.statsd.increment(metric="ItemStockCheck", value=is_item_in_stock, tags=[f"item: {item_id}"])

    return {"item_id": item_id, "stock_quantity": stock_quantity}

The reader has a lot to unpack in this function. They have to go through the function line by line and decide for themselves where the various responsibilities are. Primarily, this is because we have not included the right abstractions for them. We are exposing too much to both the reader and to the check_item_stock() function.

But if we squint we can see the following things happening:

  1. Send a GET request to "https://www.someinternalservice/items" with the ID of the item.
  2. Log out the fact that we’ve requested a stock check.
  3. Call out to Datadog to increment the counter associated with the metric "ItemStockCheck".
  4. Return a dict containing the relevant information about the item.

This is difficult to read, and we are imposing quite a cognitive load on our readers. On top of that, the function has some pretty significant side effects, with I/O-bound actions being performed. The caller of our function is likely to be completely unaware of these!


Writing our first test

Let’s start with writing a test, which we will use to guide us to our destination.

def test_check_item_stock_returns_correct_info():
    """
    Given an ID for an item
    When `check_item_stock()` is called
    Then a dict is returned containing information about the item's current stock quantity
    """
    # Given
    item_id = "some-fake-item-id"

    # When
    item_stock_info = check_item_stock(item_id=item_id)

    # Then
    assert item_stock_info["item_id"] == item_id
    assert item_stock_info["stock_quantity"] == 2

If we run this, then we will make a real request out to "https://www.someinternalservice/items" as well as to Datadog. Note that the last assertion will also fail, as we have not stubbed out the fetching of stock_quantity.

[Figure: check_item_stock() depends on implementations]

Another big flaw with our original implementation is that our check_item_stock() function has to know about the details of how we fetch the item information.

If we decided that we needed to change how this is done, then the blast radius would be quite big, since check_item_stock() must also be changed. Having an abstraction in place allows us to draw a boundary between how we fetch the item information and the check_item_stock() function. If the implementation of get_item_info() changed to instead pull that information from a database, then we could protect ourselves (and our unit tests).


Drawing the first boundary

Let’s take a step forward:

def check_item_stock(item_id: str) -> Dict[str, Union[str, int]]:
    item_info = get_item_info(item_id=item_id)

    logger.info(f"Item id: {item_id} stock check requested")

    stock_quantity = item_info["stock_quantity"]
    is_item_in_stock = stock_quantity > 0
    datadog.statsd.increment(metric="ItemStockCheck", value=is_item_in_stock, tags=[f"item: {item_id}"])

    return {"item_id": item_id, "stock_quantity": stock_quantity}

It’s not perfect, but we’ve done some interesting things here.

[Figure: the get_item_info() abstraction]

We’ve moved the responsibility of knowing how to fetch the item details down into get_item_info.
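As a sketch, get_item_info() might look something like the following, assuming the same URL and request shape as the original function:

```python
import requests

ITEMS_URL = "https://www.someinternalservice/items"


def get_item_info(item_id: str) -> dict:
    """Fetch the item details from the internal items service.

    Callers such as `check_item_stock()` no longer need to know
    that an HTTP request happens here.
    """
    response = requests.get(url=ITEMS_URL, json={"item_id": item_id})
    return response.json()
```

If this moved to a database lookup tomorrow, the function signature (and its callers) could stay exactly the same.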


Yikes, we’ve mocked an implementation!

Let’s get back to our test:

from unittest import mock

@mock.patch("some_module.datadog")
@mock.patch("some_module.get_item_info")
def test_check_item_stock_returns_correct_info(
    mocked_get_item_info: mock.MagicMock,
    mocked_datadog: mock.MagicMock,
):
    """
    Given an ID for an item
    When `check_item_stock()` is called
    Then a dict is returned containing information about the item's current stock quantity
    """
    # Given
    item_id = "some-fake-item-id"
    stock_quantity = 261
    mocked_get_item_info.return_value = {"stock_quantity": stock_quantity}

    # When
    item_stock_info = check_item_stock(item_id=item_id)

    # Then
    assert item_stock_info["item_id"] == item_id
    assert item_stock_info["stock_quantity"] == stock_quantity

We shouldn’t be afraid of mocking when needed, but we should always listen to our tests. If we feel as though we are having to mock too many dependencies out, or we are mocking implementation details, then we should heed those warnings!

Here, we have found ourselves mocking code which belongs to others, i.e. the datadog library. This is a good sign that we have not put the correct abstraction in place. check_item_stock() is too aware of the fact that we are calling out to Datadog.

If our product manager told us tomorrow that we were moving from, say, Datadog to Prometheus, then we’ve made things a little more painful for ourselves than needed.


Boundaries between 3rd party library code

Let’s take another look at our function:

def check_item_stock(item_id: str) -> Dict[str, Union[str, int]]:
    item_info = get_item_info(item_id=item_id)

    logger.info(f"Item id: {item_id} stock check requested")

    stock_quantity = item_info["stock_quantity"]
    is_item_in_stock = stock_quantity > 0
    increment_metric(metric="ItemStockCheck", value=is_item_in_stock)

    return {"item_id": item_id, "stock_quantity": stock_quantity}

For now, let’s ignore the horrible hardcoded values and somewhat dubious naming.

Wrapping our call to datadog.statsd.increment within the increment_metric() function helps us tremendously here.

[Figure: the increment_metric() abstraction]

With this in place, check_item_stock() now does not actually know whether we used Datadog, Prometheus or some other method of recording that metric. And this ignorance is exactly what we want!

We’ve delegated the responsibility of recording the metric downstream to increment_metric(). Our functions feel purer; each can be said to be responsible for one thing only (the Single Responsibility Principle).

The other thing we’ve gained is that we have insulated our code from the 3rd party library, datadog. We can now more easily swap Datadog out in favour of say Prometheus and check_item_stock() does not need to know about the change.

And nor should it: that is an observability concern, not a core domain concern.

If an update to the datadog library brought about breaking changes, say increment was renamed to increase, then we would only need to update our new increment_metric() function.

We have protected ourselves with very little effort on our part.

In general, we should always provide abstractions between our code and 3rd party library code.


Mocking abstractions not implementations

With this change, we can modify our test slightly:

from unittest import mock

@mock.patch("some_module.increment_metric")
@mock.patch("some_module.get_item_info")
def test_check_item_stock_returns_correct_info(
    mocked_get_item_info: mock.MagicMock,
    mocked_increment_metric: mock.MagicMock,
):
    """
    Given an ID for an item
    When `check_item_stock()` is called
    Then a dict is returned containing information about the item's current stock quantity
    """
    # Given
    item_id = "some-fake-item-id"
    stock_quantity = 261
    mocked_get_item_info.return_value = {"stock_quantity": stock_quantity}

    # When
    item_stock_info = check_item_stock(item_id=item_id)

    # Then
    assert item_stock_info["item_id"] == item_id
    assert item_stock_info["stock_quantity"] == stock_quantity

This still feels clunky: we had to patch and stub out increment_metric() and get_item_info() to stop any I/O-bound actions from running in our tests.

Our test is telling us that our check_item_stock() function demonstrates a lack of modularity.

We had to do a lot of work (the patching) to wrestle the check_item_stock() function under control and isolate the variables associated with those I/O bound calls.

Being made to use patch feels like an admission of failure. Our test is telling us that we could not easily isolate our system under test.

However, we have bought ourselves additional readability and cohesion when compared to what we had at the start. We have taken an intermediate step in the right direction.


An imperfect solution

Our solution still feels very imperfect, but crucially we have drawn some suitable lines of abstraction which have pushed us in the right direction.

There is a valid counter-argument to be made here. We could, and arguably should, have gone a step further and injected the dependencies for fetching the item and recording metrics, instead of calling them implicitly within the check_item_stock() function.

One could argue that check_item_stock() is doing too much. And its name does not necessarily describe all of the pieces of functionality it performs. As of right now, there are some side effects which occur when the function is called, the primary one being the call to Datadog.

The interaction with Datadog feels unrelated to the thing we actually care about i.e. the checking of item stock. It is not a core domain concern, it feels somewhat ancillary.

If we continued, we might also have been tempted to instead dispatch an event, and have another component subscribe to that event and increment the metric accordingly.

That would truly separate the checking of the item stock from the observability tooling. Even if it would be at the slight cost of immediate clarity.

Either way, refactoring our check_item_stock() to inject those dependencies would have improved our design.

If we take another step forward and apply dependency injection, we will find that our tests become even easier to write.
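As a brief taste of that next step (the argument names here are illustrative), injecting the two collaborators lets a test supply plain fakes instead of patching:

```python
from typing import Callable, Dict, Union


def check_item_stock(
    item_id: str,
    get_item_info: Callable[[str], dict],
    increment_metric: Callable[[str, bool], None],
) -> Dict[str, Union[str, int]]:
    # Collaborators are passed in, so the function no longer hides
    # which I/O-bound actions it triggers.
    item_info = get_item_info(item_id)
    stock_quantity = item_info["stock_quantity"]
    increment_metric("ItemStockCheck", stock_quantity > 0)
    return {"item_id": item_id, "stock_quantity": stock_quantity}


# In a test, simple lambdas stand in for the real dependencies:
item_stock_info = check_item_stock(
    item_id="some-fake-item-id",
    get_item_info=lambda item_id: {"stock_quantity": 2},
    increment_metric=lambda metric, value: None,
)
```

No mock.patch in sight: the seams are now part of the function's signature.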

But the key thing to note is that we have moved in the right direction. And since this post is getting long, we’ll talk about dependency injection separately. We have made some valuable incremental improvements, even if there are still some noticeable flaws with our approach.


Summary

Our first steps have shown us that by listening to our tests and subsequently drawing the right boundaries, we can drive both our production code and our tests in the right direction from where we started.

Note that we are advocating the mocking of domain abstractions and not of implementation details.

These abstractions also lead us to write code which reads more like a story.

A non-technical person can now look at our check_item_stock() function and have a much better idea of what this piece of code is trying to achieve. This is a really useful mindset to have, as it will ultimately guide us to write better code.

“Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control.”
Grady Booch, author of Object-Oriented Analysis and Design with Applications

In general, we should aim to identify the primary responsibility of a given function/class. We should always ask if it is appropriate to delegate its non-primary responsibilities to other components.

If we can create the right abstractions, then it becomes much easier to draw the right boundaries. Unit testing becomes more fluid, whilst the quality of our code improves.



Related articles coming soon:

  • Dependency Injection in Go