Dependency Injection in Python

January 6, 2023

Like most things in software engineering, the term dependency injection sounds a whole lot fancier than the thing it actually represents.

In this post, I will walk through what this looks like in practice and why it is a useful technique to apply.

Defining dependency injection

Dependency injection can be defined as:

A design pattern whereby external dependencies are explicitly passed to a software component.

There is a slight misconception that this can only be achieved via a framework.

That is not true. There are a few main ways in which this can be done:

By passing dependencies to a method, function or class via their arguments.
With a DI framework like dependency_injector or inject in Python to bootstrap larger bits of configuration.
Alternatively, even with your own bootstrap script.

For the purposes of this post, we are going to focus on the first option, as it is probably the most common approach and one which will cover the bases we need.
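
Before we get to classes, here is a minimal sketch of that first option applied to a plain function. The names build_greeting and utc_now are hypothetical and exist only for illustration. The function never reaches for the system clock itself. Instead, the caller hands the clock over as an argument:

```python
from datetime import datetime, timezone


def build_greeting(name, now):
    """Build a greeting. `now` is a callable supplied (injected) by the caller."""
    hour = now().hour
    period = "morning" if hour < 12 else "afternoon"
    return f"Good {period}, {name}"


def utc_now():
    """The 'real' dependency: reads the system clock."""
    return datetime.now(timezone.utc)


# Production code would pass the real clock:
#     build_greeting("Ada", now=utc_now)
# A test can pass a fixed clock instead:
#     build_greeting("Ada", now=lambda: datetime(2023, 1, 6, 9, 0))
```

Because the clock is an argument, the caller decides which clock is used, and the function stays deterministic for any given input.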

Coupling and cohesion

We can’t go any further without framing what we mean when we talk about coupling and cohesion.

Coupling describes how closely a set of components are dependent on each other and how closely they are connected.

Cohesion, by contrast, relates to the degree to which the parts of a component belong together.

We often talk about coupling as the devil, and something which we should strive to eliminate. And to an extent, that is a valiant cause. But it misses a slight nuance. Without any coupling, our system would be standalone components which do not interact with each other at all. I’m sure you’ll agree, that doesn’t sound all that useful!

We should, however, strive to reduce and mitigate coupling throughout our systems, so that we can develop components independently of each other.

An example of tight coupling (and no DI)

For the purposes of this demo, let’s pretend we have a UserInterface class. It has an implicit dependency on the UserRepository. What kind of database sits behind it is really not important. We can just assume that it talks to a real database. Let us also assume that UserRepository manages connections to that database.

class UserNotAuthenticatedError(Exception):
    ...


class UserInterface:
    def __init__(self):
        self.repository = UserRepository()

    def get_user(self, user_id):
        user = self.repository.get_user(user_id=user_id)

        if not user.is_authenticated:
            raise UserNotAuthenticatedError()

        # do some extra stuff

Note that this is not good code to begin with.

We have no type hints or docstrings, and some imports are missing. But let’s excuse these for now and move on!

The is_authenticated check should also probably have been wrapped into the database query. For example, if we were using the Django ORM, we might apply the following query:

User.objects.filter(is_authenticated=False, pk=user_id).exists()

(The SQLAlchemy version of this would be fairly similar)

Or even the following, by leveraging the User.DoesNotExist exception which would be thrown if there was no result:

User.objects.get(pk=user_id, is_authenticated=False)

But we’re getting off track here, let’s move on!

It can be said that our UserInterface class is tightly coupled to the UserRepository and to the database itself.

It depends on UserRepository to retrieve the User record to fulfill its role.

Starting with a test

To guide our demonstration we will start with writing a unit test. We want a test to ensure that we can capture what happens when the user we retrieve is not authenticated.

In the case of a user not being authenticated when we fetch that user by its ID, we expect a UserNotAuthenticatedError to be raised:

import pytest

def test_get_user_raises_error_when_user_is_not_authenticated():
    """
    Given an ID for a user who is not authenticated
    When `get_user` is called from an instance of the `UserInterface`
    Then a `UserNotAuthenticatedError` is raised
    """
    # Given
    user_id = "fake-user-id"
    user_interface = UserInterface()

    # When / Then
    with pytest.raises(UserNotAuthenticatedError):
        user_interface.get_user(user_id=user_id)

If we were to run this test we’d get an error immediately, but not the error we’re looking for.

The test fails at runtime because the underlying UserRepository object needs a database to connect to, and none is available.

I love to sketch things out, so humour me:

[Diagram: the application coupled with the database]

This is how our code looks in diagram form.

The UserInterface class talks directly to the UserRepository class, both of which sit within our application.

But the key thing to note here is that the components within our application have to talk to the database to do what they need.

Within the __init__ of the UserInterface we instantiate a UserRepository. So before we can do anything, like calling the get_user() method from the UserInterface, we are forced to connect to the database.

Right now, there’s no way around that without patching it, which in itself would be quite the code smell!
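
To make that smell concrete, here is a rough sketch of what the patching route could look like with unittest.mock from the standard library. The UserRepository below is a hypothetical stand-in which simply fails to connect, and get_user_with_patched_repository is a helper invented for this illustration. In a real project the patch target would normally be a dotted import path. Since everything here lives in one file, the sketch swaps the name in the module's globals instead:

```python
from unittest import mock


class UserNotAuthenticatedError(Exception):
    ...


class UserRepository:
    """Hypothetical stand-in: connecting fails because there is no database."""

    def __init__(self):
        raise ConnectionError("no database available")


class UserInterface:
    def __init__(self):
        self.repository = UserRepository()  # hard-wired dependency

    def get_user(self, user_id):
        user = self.repository.get_user(user_id=user_id)
        if not user.is_authenticated:
            raise UserNotAuthenticatedError()
        return user


def get_user_with_patched_repository(user_id):
    # Build a mock class whose instances return an unauthenticated user.
    fake_repository_class = mock.MagicMock()
    fake_user = mock.Mock(is_authenticated=False)
    fake_repository_class.return_value.get_user.return_value = fake_user

    # Temporarily rebind the name `UserRepository` that UserInterface looks up.
    with mock.patch.dict(globals(), {"UserRepository": fake_repository_class}):
        return UserInterface().get_user(user_id=user_id)
```

It works, but the test now has to know where UserInterface looks up its repository and how that name is spelled. That is knowledge of the internals, not of the behaviour.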

We can safely conclude that we don’t have the sort of control over this that we’d like.

What should our test care about?

For our test, we have to ask ourselves: what do we really care about?

Our test is trying to check that this User object is not authenticated by way of the UserNotAuthenticatedError being thrown. Simple enough, right?

But if we continue with our current approach, we are saying that a User is a concept which can only exist if it is backed by a database record.

Our test really only cares about the behaviour around the User object. The database feels like extra baggage that we’re just having to put up with to get what we want.

And that baggage gets pretty heavy as it piles up across a codebase of any size.

If we wanted to write more tests to check for behaviour throughout the get_user() method, we would be further enforcing the need for the database to be available for the test runner.

But our test does not really care about the database. Let’s say we used some fake in-memory repository, or an in-memory MySQL database, or a NoSQL database, or even some other form of persistence.

We would still want our test to pass.

So if we want our test to pass regardless of which form of persistence sits behind our UserRepository class, then we can safely say that we don’t care about the database.

Injecting the components

Right now we’re forcing the UserRepository onto the users of our UserInterface, the main user being our test of course.

So what would this look like if we explicitly gave the repository to the UserInterface:

class UserInterface:
    def __init__(self, repository=UserRepository):
        self.repository = repository()

    def get_user(self, user_id):
        user = self.repository.get_user(user_id=user_id)

        if not user.is_authenticated:
            raise UserNotAuthenticatedError()

        # do some extra stuff

In this case, we are explicitly giving the UserRepository to UserInterface via the arguments to the class. Note that we did not initialize the UserRepository first, like so:

class UserInterface:
    def __init__(self, repository=UserRepository()):
        self.repository = repository

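The reason for this is that Python evaluates default arguments once, when the function is defined. An instance of the UserRepository would therefore be created at the point at which the UserInterface class is defined, and that same instance would be shared by every UserInterface which relies on the default. So we would always pay the penalty that comes with the initialization of UserRepository, even if we never end up using it.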
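
The difference is easy to see in a small experiment. The sketch below uses a hypothetical ExpensiveRepository which records every instance that gets created, plus two throwaway classes (EagerInterface and LazyInterface) which mirror the two styles of default:

```python
created = []


class ExpensiveRepository:
    """Hypothetical stand-in for a repository that is costly to create."""

    def __init__(self):
        created.append(self)


class EagerInterface:
    # The default is evaluated once, when this `def` statement runs.
    def __init__(self, repository=ExpensiveRepository()):
        self.repository = repository


class LazyInterface:
    # Only the class is stored as the default; it is instantiated per call.
    def __init__(self, repository=ExpensiveRepository):
        self.repository = repository()


# One repository already exists, although no interface has been created yet.
count_after_definitions = len(created)

first, second = EagerInterface(), EagerInterface()
# Both interfaces hold the very same repository object.
shares_instance = first.repository is second.repository
```

With the eager default, the cost is paid at import time and the instance is shared. With the class as the default, nothing is created until a UserInterface is actually built.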

Alternatively, and perhaps the pattern I would be more inclined to recommend, we would ask the callers of our UserInterface to explicitly provide it with a repository with no default:

class UserInterface:
    def __init__(self, repository):
        self.repository = repository

    def get_user(self, user_id):
        user = self.repository.get_user(user_id=user_id)

        if not user.is_authenticated:
            raise UserNotAuthenticatedError()

        # do some extra stuff

The caller of our UserInterface would simply need to provide an object which implements a get_user() method, which takes a keyword argument of user_id and returns an object which exposes an is_authenticated attribute.
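
If we wanted to write that contract down, one option is typing.Protocol. This is only a sketch: the protocol names SupportsIsAuthenticated and SupportsGetUser are made up for this post, and returning the user from get_user() is an assumption standing in for the "extra stuff":

```python
from typing import Protocol


class UserNotAuthenticatedError(Exception):
    ...


class SupportsIsAuthenticated(Protocol):
    """Anything with an `is_authenticated` attribute counts as a user."""

    is_authenticated: bool


class SupportsGetUser(Protocol):
    """Anything with a matching `get_user()` method counts as a repository."""

    def get_user(self, user_id: str) -> SupportsIsAuthenticated:
        ...


class UserInterface:
    def __init__(self, repository: SupportsGetUser) -> None:
        self.repository = repository

    def get_user(self, user_id: str) -> SupportsIsAuthenticated:
        user = self.repository.get_user(user_id=user_id)

        if not user.is_authenticated:
            raise UserNotAuthenticatedError()

        # Assumption: hand the user back in place of the "extra stuff".
        return user
```

Nothing has to inherit from these protocols. A static type checker such as mypy should accept any object with the right shape, which matches the duck typing we are already relying on at runtime.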

Either way, we have now added a control point to our UserInterface in the form of the repository argument that we can now manipulate.

[Diagram: injecting the repository]

I’m sure you’ll agree that, in terms of actual code changes, we had very little to do, and yet we have brought about quite a significant improvement to our design.

This control point opens the door for us to have more say on what is being put into the UserInterface.

Getting feedback from our test

Now that we’ve refactored our code, let’s get some feedback from our test:

FAILED [100%]
test_user_interface.py:4 (test_get_user_raises_error_when_user_is_not_authenticated)
def test_get_user_raises_error_when_user_is_not_authenticated():
        """
        Given an ID for a user who is not authenticated
        When `get_user` is called from an instance of the `UserInterface`
        Then a `UserNotAuthenticatedError` is raised
        """
        # Given
        user_id = "fake-user-id"
>       user_interface = UserInterface()
E       TypeError: __init__() missing 1 required positional argument: 'repository'

test_user_interface.py:13: TypeError

As we might have expected, our test is telling us that we need to give something to the repository argument to our UserInterface.

Providing a fake for the dependency

The downside to this approach is that we now have some boilerplate code to write.

We need to provide something else as the repository, something that we know will be fast and reliable and will not interact with the actual database:

class FakeUser:
    def __init__(self, user_id: str, is_authenticated: bool):
        self.user_id = user_id
        self.is_authenticated = is_authenticated


class FakeUserRepository:
    def __init__(self, users: list[FakeUser]):
        self.users = users

    def get_user(self, user_id: str) -> FakeUser:
        return next(user for user in self.users if user.user_id == user_id)

With the FakeUserRepository we had to mimic some of the behaviour of the original UserRepository. But the key thing here is, we want to test the behaviour of the UserInterface only.

The other thing to note here is that the get_user() method on the FakeUserRepository will raise a StopIteration exception if no user is found. We are not testing this for now, so we will choose not to handle it as we don’t have a use case for it yet.
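
Should that use case ever turn up, one possible direction is sketched below. UserNotFoundError and StricterFakeUserRepository are hypothetical names and are not part of the code in this post. The trick is to give next() a default of None so that a missing user can be reported with a more meaningful exception:

```python
class FakeUser:
    def __init__(self, user_id: str, is_authenticated: bool):
        self.user_id = user_id
        self.is_authenticated = is_authenticated


class FakeUserRepository:
    def __init__(self, users: list):
        self.users = users

    def get_user(self, user_id: str) -> FakeUser:
        # Raises StopIteration when no user matches.
        return next(user for user in self.users if user.user_id == user_id)


class UserNotFoundError(Exception):
    """Hypothetical error for a missing user."""


class StricterFakeUserRepository(FakeUserRepository):
    def get_user(self, user_id: str) -> FakeUser:
        # The second argument to next() is returned when nothing matches.
        user = next((u for u in self.users if u.user_id == user_id), None)
        if user is None:
            raise UserNotFoundError(user_id)
        return user
```

Until a test actually needs this, the simpler fake is enough.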

[Diagram: injecting a fake repository]

Now, we can break the implicit dependency on the database by swapping the real UserRepository in favour of the FakeUserRepository. At first, this can seem a little strange. You might be thinking that we are mimicking our production code and testing that instead.

But the key question to ask is, what does our test care about? The unit test we wrote earlier only cares about the behaviour of the UserInterface, not the UserRepository.

If we allow ourselves to exert more control over the UserInterface, it becomes purer.

If we can control our inputs, then we can be confident that we can bring about the conditions that we are trying to test.

If we use our fakes, we can also move the database out of scope for the test entirely.

Using fakes in our test

Pushing our fakes into our test gives us something like the following:

def test_get_user_raises_error_when_user_is_not_authenticated():
    """
    Given an ID for a user who is not authenticated
    When `get_user` is called from an instance of the `UserInterface`
    Then a `UserNotAuthenticatedError` is raised
    """
    # Given
    user_id = "fake-user-id"
    fake_unauthenticated_user = FakeUser(user_id=user_id, is_authenticated=False)
    fake_user_repository = FakeUserRepository(users=[fake_unauthenticated_user])
    
    user_interface = UserInterface(repository=fake_user_repository)

    # When / Then
    with pytest.raises(UserNotAuthenticatedError):
        user_interface.get_user(user_id=user_id)

We could have created some reusable pytest fixtures to reduce the body of the test. But that’s a separate post entirely!

Running our test again:

============================= test session starts ==============================
collecting ... collected 1 item

test_user_interface.py::test_get_user_raises_error_when_user_is_not_authenticated PASSED [100%]

============================== 1 passed in 0.02s ===============================

Process finished with exit code 0

We’ve now managed to verify the correct behaviour, and we have also restricted the scope of our unit test to the specific component we wanted.

That last point is important: we want the scope of our unit tests to be small and focused. The more components our tests bring into play, the more chances there are of the test breaking as a result of changes or errors happening elsewhere.

We managed to do this by letting our test guide us and applying dependency injection to our code.

If we decided tomorrow to change the type of database, then the test we wrote here would not need to change. We have reduced the coupling in this case.
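
As a rough illustration of that claim, the sketch below runs the same check against two interchangeable repositories. Both InMemoryUserRepository and SqliteUserRepository are hypothetical and are not part of the code in this post. The second one uses the standard library sqlite3 module with an in-memory database, purely to stand in for "some other form of persistence". Returning the user from get_user() is again an assumption in place of the "extra stuff":

```python
import sqlite3
from dataclasses import dataclass


class UserNotAuthenticatedError(Exception):
    ...


@dataclass
class User:
    user_id: str
    is_authenticated: bool


class InMemoryUserRepository:
    """Hypothetical repository backed by a plain dict."""

    def __init__(self, users):
        self._users = {user.user_id: user for user in users}

    def get_user(self, user_id):
        return self._users[user_id]


class SqliteUserRepository:
    """Hypothetical repository backed by an in-memory SQLite database."""

    def __init__(self, users):
        self._connection = sqlite3.connect(":memory:")
        self._connection.execute(
            "CREATE TABLE users (user_id TEXT PRIMARY KEY, is_authenticated INTEGER)"
        )
        self._connection.executemany(
            "INSERT INTO users VALUES (?, ?)",
            [(user.user_id, int(user.is_authenticated)) for user in users],
        )

    def get_user(self, user_id):
        row = self._connection.execute(
            "SELECT user_id, is_authenticated FROM users WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        return User(user_id=row[0], is_authenticated=bool(row[1]))


class UserInterface:
    def __init__(self, repository):
        self.repository = repository

    def get_user(self, user_id):
        user = self.repository.get_user(user_id=user_id)
        if not user.is_authenticated:
            raise UserNotAuthenticatedError()
        return user


def raises_not_authenticated(repository, user_id):
    """Return True if UserInterface rejects the user held by `repository`."""
    try:
        UserInterface(repository=repository).get_user(user_id=user_id)
    except UserNotAuthenticatedError:
        return True
    return False
```

The UserInterface, and any check written against it, stays exactly the same whichever repository we hand over.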

This is really important: our unit tests should always be fast, focused and reliable. We want them to test small pieces of behaviour, and we also want the outputs to be deterministic.

If we have changed some component elsewhere (e.g. the database) then we should not be forced to make changes to this test.

Changes to our API

Most people nowadays associate API with an HTTP REST API.

But in this context, we refer to its purest form: the interface which a component exposes to its users.

In this case, we have had to make changes to the API of the UserInterface class.

Because we removed the implicit dependency on the UserRepository, we also changed how the caller creates and interacts with the UserInterface.

There is a valid argument to be made here. What if we had to inject lots of dependencies into our class?

If we find ourselves having to inject numerous dependencies then this indicates we have a design problem.

People tend to argue that dependency injection leads to bloated APIs which expose too many parameters to the user. In this case, we should heed the warnings that our code is trying to give us.

We may not have the right abstractions in place, and that would be the root of the problem.

Summary

Dependency injection is a pattern we can (and arguably should) use when designing our components. It allows us to draw lines between those components and have more fine-grained control over our software.

It is a technique we can quite happily use in Python too, despite the misconception that it’s something relevant only to other languages.

For our tests to be reliable, with a given set of inputs, we expect the same output. Every time. Without fail.

To achieve this, we need the high level of control over our components that dependency injection leads us to.

Without this level of control throughout our code, our systems feel impure. Our confidence in our system erodes over time. It becomes much harder to make changes. This sounds dramatic, but the consequences of this are quite severe.