Test Doubles

February 25, 2023 10 min read

If you’ve ever written tests or read literature around them then you might have come across terminology such as mocks, fakes, spies, stubs and collaborators.

In this article, we will demystify what these terms mean, along with some concrete examples of how to use them.


The definition dilemma

Like most things in software engineering, the terminology around test doubles is unnecessarily confusing and, frustratingly, there is no universal consensus.

The intention of this article is not to declare those definitions for everyone. I think that ship has long sailed!

Much more well-respected and knowledgeable engineers than myself have tried and failed at that. 😂

The purpose of this article is to outline the definitions and use cases which I have provided for teams in the past. This gave us a shared framework, so that we could progress with a unified picture of what these mechanisms are and how to use them.

These tools allowed us to either focus on collaborators or sketch them out of our tests, where collaborators are the other components which interact with the thing we are trying to test.

Drawing boundaries

Primarily, the reason we need these concepts is so that we can draw boundaries when we are writing tests.

We need this ability so that we can constrain the scope of our tests. We want to isolate different components and freeze them so that we can make assertions.

In one way or another, fakes, mocks, stubs and spies provide different mechanisms so that we can isolate part of our code.

They are also known as test doubles, which is more of an all-encompassing term for these mechanisms.

But there is a considerable amount of confusion out there as to the different scenarios in which we would want to use them.

Example production(ish) code

Let’s say we had an InventoryClient class, which makes calls over the network to some other service that is responsible for keeping track of our inventory:

import requests


class InventoryClient:
    def __init__(self):
        ...

    def get_item_by_id(self, item_id: str) -> dict[str, str]:
        response = requests.get(
            url=self.url,
            params={"item_id": item_id},
            headers={"api_key": self.api_key},
        )

        return response.json()["data"]

Now let’s suppose we have a function which takes an InventoryClient, calls the get_item_by_id() method and performs some calculations on the item information to check whether we can afford to buy the item with the constraints of a given budget:

def calculate_item_affordability(
    item_id: str, 
    inventory_client: InventoryClient, 
    budget: int
) -> bool:
    item: dict[str, str] = inventory_client.get_item_by_id(item_id=item_id)

    return check_item_cost_against_budget(cost=item["cost"], budget=budget)

If we were to draw this out, we would probably come up with something like the following:

[Diagram: prod_calculate_item_affordability]

The key things to home in on are steps 2 & 3. In production, we expect to make network calls out to our InventoryService via the InventoryClient to get the information associated with a particular item. Following that, our calculate_item_affordability() function can use the item information accordingly.

If we are writing tests against calculate_item_affordability(), how do we handle the fact that our production code is making this network call?

We know that what we really care about in this case is the calculation. We don’t really care about how we got the item information.
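The calculation itself lives in check_item_cost_against_budget(), which is not shown in this article. A minimal sketch, assuming the rule is simply that the cost must not exceed the budget (the real implementation may well differ), could look like this:

```python
def check_item_cost_against_budget(cost: int, budget: int) -> bool:
    # Assumption for illustration: an item is affordable when its cost
    # does not exceed the budget. The real rule is not given in the article.
    return cost <= budget
```

Because this is a pure calculation with no I/O, it is exactly the part we want our tests to exercise without the network getting in the way.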

Fake

This is where the use of fakes comes into play for us.

A fake is a component which mimics the production version with a working implementation, but uses some shortcut to simplify the underlying logic or the calls being made.

The fake should have very limited functionality as it should only be used in testing environments.

We can create a fake implementation of the InventoryClient:

from typing import Union


class FakeInventoryClient:
    def __init__(self, fake_items: list[dict[str, Union[str, int]]]):
        self.fake_items = fake_items
        ...

    def get_item_by_id(self, item_id: str) -> dict[str, Union[str, int]]:
        # Look the item up in memory instead of calling the real service
        return next(fake_item for fake_item in self.fake_items if fake_item["id"] == item_id)

With this in play we can remove the I/O bound network call being made from the equation, and simply keep the items in memory.

This gives us a lot more control when it comes to the components involved here.

[Diagram: fake_calculate_item_affordability]

Of course, the caveat here is that we have more boilerplate code to write. We had to write the FakeInventoryClient to mimic the production version. And the initialization steps of the test will require us to create a FakeInventoryClient with fake item information.

def test_calculate_item_affordability_returns_correct_value():
    """
    Given a fake inventory and a budget of 100
    When `calculate_item_affordability()` is called
    Then True is returned
    """
    # Given
    item_id = "abc"
    fake_item = {"id": item_id, "cost": 50}
    fake_inventory_client = FakeInventoryClient(fake_items=[fake_item])
    budget = 100

    # When
    is_item_affordable: bool = calculate_item_affordability(
        item_id=item_id,
        inventory_client=fake_inventory_client,
        budget=budget
    )

    # Then
    assert is_item_affordable is True

This extra boilerplate code can put a lot of people off. But generally, this is not something we should be too worried about doing when writing tests.

In this case the FakeInventoryClient allowed us to exert more control over our test.

We have near total control over our inputs. The time taken for the test to run is negligible. And we have removed additional variables such as the flakiness of the network call, as well as our dependence on the real InventoryService.

Now let’s say our system evolved over time and the check_item_cost_against_budget() function now looked at offers and promotional deals from our item information. To accommodate this, we would simply need to make an adjustment to our fake implementation.
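As a rough sketch of what that adjustment could look like, suppose the item information gained a hypothetical `discount` field. Neither the field name nor the discount rule below comes from a real service, and the evolved signature of check_item_cost_against_budget() is an assumption made purely for illustration. In this simple case, only the fake data needs to change:

```python
from typing import Union

Item = dict[str, Union[str, int]]


class FakeInventoryClient:
    def __init__(self, fake_items: list[Item]):
        self.fake_items = fake_items

    def get_item_by_id(self, item_id: str) -> Item:
        return next(item for item in self.fake_items if item["id"] == item_id)


def check_item_cost_against_budget(item: Item, budget: int) -> bool:
    # Hypothetical evolved rule: subtract a flat discount before comparing.
    discounted_cost = item["cost"] - item.get("discount", 0)
    return discounted_cost <= budget


def test_discounted_item_is_affordable():
    # Given an item which is only affordable once the discount is applied
    fake_item = {"id": "abc", "cost": 120, "discount": 30}
    fake_inventory_client = FakeInventoryClient(fake_items=[fake_item])

    # When
    item = fake_inventory_client.get_item_by_id(item_id="abc")

    # Then
    assert check_item_cost_against_budget(item=item, budget=100) is True
```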

This level of control is what we should be aiming for in our systems.

Fakes help us simplify implementations, and they provide a way for us to remove additional variables from our testing equation.

Other typical use cases of when we might want to use fakes are:

Swapping a database out for an in-memory implementation
Using an in-memory store instead of a cloud provider storage solution (like AWS S3 or Google Cloud Storage)
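To give a feel for the second case, here is a minimal sketch of an in-memory stand-in for an object store. The class and method names (`FakeObjectStore`, `upload()`, `download()`) are made up for illustration and do not mirror any particular cloud SDK:

```python
class FakeObjectStore:
    """In-memory stand-in for a bucket-style storage client (hypothetical interface)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        # Keep the object in a plain dict instead of sending it anywhere
        self._objects[key] = data

    def download(self, key: str) -> bytes:
        # A missing key raises KeyError here; a real client would raise its own error type
        return self._objects[key]


def test_uploaded_object_can_be_downloaded():
    # Given
    store = FakeObjectStore()
    payload = b"id,cost\nabc,50\n"

    # When
    store.upload(key="reports/items.csv", data=payload)

    # Then
    assert store.download(key="reports/items.csv") == payload
```

As with the FakeInventoryClient, the fake has a working implementation, but the expensive part (the network and the remote storage) has been swapped for a dictionary.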

Spy

Spies are objects which can be used to verify how a method or a function is called. We can use them to check the value of the arguments being passed to parameters, and how many times that component was called.

This can be useful when we need to verify the interaction between 2 components.

For example, let’s say we want to check that the get_item_by_id() method is called from the InventoryClient whenever we call calculate_item_affordability(). In essence, this test will check that the input item_id param that we give to the call to calculate_item_affordability() is then passed into get_item_by_id() from the InventoryClient.

In this instance we can pass a spy object where we normally expect an InventoryClient object.

Most mainstream languages provide good utilities for creating spy objects. In Python, the standard library’s unittest.mock module gives us the ever-useful Mock and MagicMock classes:

import unittest.mock


def test_get_item_by_id_is_called_from_inventory_client_with_correct_args(
):
    """
    Given a fake item ID and a spy object to replace the `InventoryClient`
    When `calculate_item_affordability()` is called
    Then `get_item_by_id` is called from the spied `InventoryClient` with the item ID
    """
    # Given
    item_id = "abc"
    inventory_client_spy = unittest.mock.MagicMock() 
    # MagicMock required here due to [] access notation which would otherwise throw a `TypeError`
    
    # When
    calculate_item_affordability(
        item_id=item_id,
        inventory_client=inventory_client_spy,
        budget=100
    )

    # Then
    inventory_client_spy.get_item_by_id.assert_called_once_with(item_id=item_id)

Somewhat confusingly, we are using a mock object as a spy. This is primarily because we have access to a number of helpful spy-like methods including:

Method call	Purpose
`assert_called`	We need to check if the thing was called, irrespective of how many calls.
`assert_called_once`	The same as above, but we are only expecting the 1 call to be made.
`assert_called_with`	We need to check the arguments passed in the most recent call.
`assert_called_once_with`	The same as above, but we are only expecting the 1 call to be made.
`assert_not_called`	We want to check that the thing was not called at all.
`assert_any_call`	Checks if the mock was ever called with a given set of arguments.
`assert_has_calls`	Checks against a given mock call list. This is useful when we expect the thing to be called a few times. We can even use the `any_order` flag to help us here.

We can use these methods to check the contract between 2 components.

In other words, we can use them to ensure that we are passing the expected parameters through to the correct places.

This can be handy when we need to ensure that our components are plumbed together correctly.
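As a small illustration of `assert_has_calls` with the `any_order` flag, the sketch below spies on a collaborator that is called once per item. `fetch_items()` is a hypothetical function written only for this example:

```python
from unittest import mock


def fetch_items(item_ids: list[str], inventory_client) -> list:
    # Hypothetical function under test: looks up each item in turn
    return [inventory_client.get_item_by_id(item_id=item_id) for item_id in item_ids]


def test_get_item_by_id_is_called_for_each_item():
    # Given
    inventory_client_spy = mock.Mock()

    # When
    fetch_items(item_ids=["abc", "def"], inventory_client=inventory_client_spy)

    # Then
    # The expected calls are listed in a different order on purpose
    expected_calls = [mock.call(item_id="def"), mock.call(item_id="abc")]
    inventory_client_spy.get_item_by_id.assert_has_calls(expected_calls, any_order=True)
    assert inventory_client_spy.get_item_by_id.call_count == 2
```

Note that `assert_has_calls` only checks that the expected calls are present. The explicit `call_count` assertion is there to rule out any extra calls.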

Stub

A stub is a test double which provides some pre-defined state in response to calls made during the test. It does not usually respond to anything other than what it has been set up for.

For example, we might want to create a stubbed version of the InventoryClient:

from typing import Union


class StubInventoryClient:
    def __init__(self):
        self.get_item_by_id_was_called: bool = False

    def get_item_by_id(self, item_id: str) -> dict[str, Union[str, int]]:
        self.get_item_by_id_was_called = True
        
        return {"id": item_id, "cost": 50}

The difference here, of course, is that the get_item_by_id() method on the StubInventoryClient class will always return the same pre-defined state.

With a mock object, by contrast, we have more flexibility to change its behaviour on the fly.
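To make the contrast concrete, here is a small sketch of reconfiguring a mock between calls using the `return_value` attribute from unittest.mock. The item payloads are made-up test data:

```python
from unittest import mock

inventory_client_mock = mock.Mock()

# First, configure the mock to hand back a cheap item
inventory_client_mock.get_item_by_id.return_value = {"id": "abc", "cost": 50}
cheap_item = inventory_client_mock.get_item_by_id(item_id="abc")

# Then change its behaviour on the fly, without writing a new class
inventory_client_mock.get_item_by_id.return_value = {"id": "abc", "cost": 500}
expensive_item = inventory_client_mock.get_item_by_id(item_id="abc")
```

Getting the same variation out of a hand-written stub would mean either a second stub class or extra constructor arguments.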

We can now use the stub object to guarantee some input state for us:

def test_calculate_item_affordability_returns_correct_value(
):
    """
    Given a `StubInventoryClient` and a budget of 100
    When `calculate_item_affordability()` is called
    Then True is returned
    """
    # Given
    item_id = "abc"
    inventory_client_stub = StubInventoryClient()
    budget = 100

    # When
    is_item_affordable: bool = calculate_item_affordability(
        item_id=item_id,
        inventory_client=inventory_client_stub,
        budget=budget
    )

    # Then
    assert is_item_affordable is True

Arguably, we could have also used the StubInventoryClient to verify that the call to get_item_by_id() was made. We could do this by following a similar methodology as we did with the spy object:

def test_get_item_by_id_is_called_from_stub_inventory_client(
):
    """
    Given a fake item ID and a `StubInventoryClient`
    When `calculate_item_affordability()` is called
    Then `get_item_by_id` is called from the stubbed `StubInventoryClient`
    """
    # Given
    item_id = "abc"
    inventory_client_stub = StubInventoryClient()
    assert inventory_client_stub.get_item_by_id_was_called is False
    
    # When
    calculate_item_affordability(
        item_id=item_id,
        inventory_client=inventory_client_stub,
        budget=100
    )

    # Then
    assert inventory_client_stub.get_item_by_id_was_called is True

Personally, I think this feels too much like the job of a spy.

In my opinion, stubs should look similar to fakes.

The difference between them should be that stubs provide hard-coded state back to the caller, whereas fakes might contain some underlying logic, albeit simplified and less expensive than the production implementation.

Mock

Mocks provide lots of the same functionality that we have touched upon throughout this article for the other kinds of test doubles.

We can use them to check on outgoing calls and take a look at the behaviour of the system under test (SUT).

Most languages have their own mocking frameworks. For example, Python has unittest.mock, Java has EasyMock and Go has GoMock.

These libraries can help us stand up mock objects easily. They come bundled with the ability to record calls made on the mock object, as well as allowing us to control how that object responds to given calls.
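A brief sketch of both abilities using unittest.mock is shown below. The `side_effect` attribute controls how the mock responds to successive calls, and `call_args_list` exposes the calls it has recorded. The payload and the choice of `ConnectionError` are assumptions made for the example:

```python
from unittest import mock

inventory_client_mock = mock.Mock()

# Control the responses: the first call returns an item, the second raises
inventory_client_mock.get_item_by_id.side_effect = [
    {"id": "abc", "cost": 50},
    ConnectionError("inventory service unavailable"),
]

first_item = inventory_client_mock.get_item_by_id(item_id="abc")

try:
    inventory_client_mock.get_item_by_id(item_id="def")
    second_call_failed = False
except ConnectionError:
    second_call_failed = True

# Inspect the calls which the mock recorded along the way
recorded_calls = inventory_client_mock.get_item_by_id.call_args_list
```

This makes it cheap to test how the system under test behaves when a collaborator fails, which is often awkward to reproduce against a real service.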

Summary

We can use test doubles to help us treat collaborators within our tests however we see fit.

Test double	Purpose
Fake	Working implementation with some shortcuts, often designed to remove heavy calls from being made.
Spy	Used to record how an object was called along with how many times it was called.
Stub	Similar to a fake except it responds with some hard-coded and pre-determined state.
Mock	An amalgamation of some of the above. Often used to verify behaviour of our components.

As mentioned earlier, this is not intended to be the set of definitions that everyone must use. Rather, it is a set of definitions that I have adopted for my teams in the past.

The key thing is to agree a set of definitions with your team for your codebases and move forward from there.