Dependency Injection in Python
Like most things in software engineering, the term dependency injection sounds a whole lot fancier than the thing it actually represents.
In this post, I will walk through what this looks like in practice and why it is a useful technique to apply.
Table of Contents
Defining dependency injection
Dependency injection can be defined as:
A design pattern whereby external dependencies are explicitly passed to a software component.
There is a slight misconception that this can only be achieved via a framework.
That is not true. There are a few main ways in which this can be done:
- By passing dependencies to a method, function or class via their arguments.
- With a DI framework like
dependency_injector
orinject
in Python to bootstrap larger bits of configuration. - Alternatively, even with your own bootstrap script.
For the purposes of this post, we are going to focus on the first option. As it is probably the most common approach and one which will cover the bases we need.
Coupling and cohesion
We can’t go any further without framing what we mean when we talk about coupling and cohesion.
Coupling describes how closely a set of components are dependent on each other and how closely they are connected.
Whereas cohesion relates to the degree in which components belong together.
We often talk about coupling as the devil, and something which we should strive to eliminate. And to an extent, that is a valiant cause. But it misses a slight nuance. Without any coupling, our system would be standalone components which do not interact with each other at all. I’m sure you’ll agree, that doesn’t sound all that useful!
We should however, strive to reduce and mitigate coupling throughout our systems. So that we can develop components independently of each other.
An example of tight coupling (and no DI)
For the purposes of this demo, let’s pretend we have a UserInterface
class.
It has an implicit dependency on the UserRepository
.
What kind of DB this is, is really not important.
We can just assume that talks to a real database.
Let us also assume that UserRepository
manages connections to that database.
class UserNotAuthenticatedError(Exception):
...
class UserInterface:
def __init__(self):
self.repository = UserRepository()
def get_user(self, user_id):
user = self.repository.get_user(user_id=user_id)
if not user.is_authenticated:
raise UserNotAuthenticatedError()
# do some extra stuff
Note that this is not good code to begin with.
We have no type hints, docstrings and some imports are missing. But let’s excuse these for now and move on!
The is_authenticated
field also should probably have been wrapped into the database query.
For example, if we were using the Django ORM. We might apply the following query:
User.objects.filter(is_authenticated=False, pk=user_id).exists
(The SQLAlchemy version of this would be fairly similar)
Or even the following, by leveraging the User.DoesNotExist
exception which would be thrown if there was no result:
User.objects.get(pk=user_id, is_authenticated=False)
But we’re getting off track here, let’s move on!
It can be said that our
UserInterface
class is tightly coupled to theUserRepository
and to the database itself.
It depends on UserRepository
to retrieve the User
record to fulfill its role.
Starting with a test
To guide our demonstration we will start with writing a unit test. We want a test to ensure that we can capture what happens when the user we retrieve is not authenticated.
In the case of a user not being authenticated when we fetch that user by its ID,
we expect to a UserNotAutenticateError
to be thrown:
import pytest
def test_get_user_raises_error_when_user_is_not_authenticated():
"""
Given an ID for a user who is not authenticated
When `get_user` is called from an instance of the `UserInterface`
Then a `UserNotAuthenticatedError` is raised
"""
# Given
user_id = "fake-user-id"
user_interface = UserInterface()
# When / Then
with pytest.raises(UserNotAuthenticatedError):
user_interface.get_user(user_id=user_id)
If we were to run this test we’d get an error immediately. But not the error we’re looking for.
The compiler will complain that we need a database to be available
for the underlying UserRepository
object to connect to.
I love to sketch things out, so humour me:
This is how our code looks in diagram form.
The UserInterface
class talks directly to the UserRepository
class, both of which sit within our application.
But the key thing to note here is that the components within our application have to talk to the database to do what they need.
Within the __init__
of the UserInterface
we instantiate a UserRepository
.
So before we can do anything, like calling the create_user()
method from the UserInterface
,
we are forced to connect to the database.
Right now, there’s no way around that without patching it. Which in itself would be quite the code smell!
We can safely conclude that we don’t have the sort of control over this as we’d like.
What should our test care about?
For our test, we have to ask ourselves, What do we really care about?
Our test is trying to check that this User
object is not authenticated
by way of the UserNotAuthenticatedError
being thrown.
Simple enough right?
But if we continue with our current approach,
we are saying that a User
is a concept which can only exist if it is backed by a database record.
Our test really only cares about the behaviour around the User
object.
The database feels like extra baggage that we’re just having to put up with to get what we want.
And that baggage will get pretty heavy across the board of any kind of codebase.
If we wanted to write more tests to check for behaviour throughout the create_user()
method,
we would be further enforcing the need for the database to be available for the test runner.
But our test does not really care about the database. Let’s say we used some fake in-memory repository, or an in-memory MySQL database, or a NoSQL database, or even some other form of persistence.
We would still want our test to pass.
So if we want our test to pass, regardless of which form of persistence sits behind our
UserRepository
class. Then we can safely say that we don’t care about the database.
Injecting the components
Right now we’re forcing the UserRepository
onto the users of our UserInterface
.
The main user being our test of course.
So what would this look like if we explicitly gave the repository to the UserInterface
:
class UserInterface:
def __init__(self, repository = UserRepository):
self.repository = repository()
def get_user(self, user_id):
user = self.repository.get_user(user_id=user_id)
if not user.is_authenticated:
raise UserNotAuthenticatedError()
# do some extra stuff
In this case, we are explicitly giving the UserRepository
to UserInterface
via the arguments to the class.
Note that we did not initialize the UserRepository
first, like so:
class UserInterface:
def __init__(self, repository = UserRepository()):
self.repository = repository
The reason for this, is that an instance of the UserRepository
would be created at the point at which the UserInterface
class is defined.
So we would always pay the penalty that comes with the initialization of UserRepository
Alternatively, and perhaps the pattern I would be more inclined to recommend,
we would ask the callers of our UserInterface
to explicitly provide it with a repository
with no default:
class UserInterface:
def __init__(self, repository):
self.repository = repository
def get_user(self, user_id):
user = self.repository.get_user(user_id=user_id)
if not user.is_authenticated:
raise UserNotAuthenticatedError()
# do some extra stuff
The caller of our UserInterface
would simply need to provide an object which implements a get_user()
method,
which takes a key word argument of user_id
and returns an object which implements an is_authenticated
attribute.
Either way, we have now added a control point to our UserInterface
in the form of the repository
argument that we can now manipulate.
I’m sure you’ll agree, that in terms of actual code changes, we had very little to do. And we have brought about quite a significant improvement to our design.
This control point opens the door for us to have more say on what is being put into the UserInterface
.
Getting feedback from our test
Now that we’ve refactored our code, let’s get some feedback from our test:
FAILED [100%]
test_user_interface.py:4 (test_get_user_raises_error_when_user_is_not_authenticated)
def test_get_user_raises_error_when_user_is_not_authenticated():
"""
Given an ID for a user who is not authenticated
When `get_user` is called from an instance of the `UserInterface`
Then a `UserNotAuthenticatedError` is raised
"""
# Given
user_id = "fake-user-id"
> user_interface = UserInterface()
E TypeError: __init__() missing 1 required positional argument: 'repository'
test_user_interface.py:13: TypeError
As we might have expected, our test is telling us that we need to give something to the repository
argument
to our UserInterface
.
Providing a fake for the dependency
The downside to this approach is that we now have some boilerplate code to write.
We need to provide something else to the repository
that we know will be
fast, reliable and will not be interacting with the actual database:
class FakeUser:
def __init__(self, user_id: str, is_authenticated: bool):
self.user_id = user_id
self.is_authenticated = is_authenticated
class FakeUserRepository:
def __init__(self, users: list[FakeUser]):
self.users = users
def get_user(self, user_id: str) -> FakeUser:
return next(user for user in self.users if user.user_id == user_id)
With the FakeUserRepository
we had to mimic some of the behaviour of the original UserRepository
.
But the key thing here is, we want to test the behaviour of the UserInterface
only.
The other thing to note here is the get_user()
method on the FakeUserRepository
will raise a StopIteration
exception if no user
is found.
We are not testing this for now, so we will choose not to implement it as we don’t have a use case for it yet.
Now, we can break the implicit dependency on the database by swapping the real UserRepository
in favour of the FakeUserRepository
.
At first, this can seem a little strange.
You might be thinking that we are mimicking our production code and testing that instead.
But the key question to ask is, what does our test cares about?
The unit test we wrote earlier only cares about the behaviour of the UserInterface
, not the UserRepository
.
If we allow ourselves to exert more control over the UserInterface
, it becomes purer.
If we can control our inputs, then we can be confident that we can bring about the conditions that we are trying to test.
If we use our fakes, we can also move the database out of scope for the test entirely.
Using fakes in our test
Pushing our fakes into our test gives us something like the following:
def test_get_user_raises_error_when_user_is_not_authenticated():
"""
Given an ID for a user who is not authenticated
When `get_user` is called from an instance of the `UserInterface`
Then a `UserNotAuthenticatedError` is raised
"""
# Given
user_id = "fake-user-id"
fake_unauthenticated_user = FakeUser(user_id=user_id, is_authenticated=False)
fake_user_repository = FakeUserRepository(users=[fake_unauthenticated_user])
user_interface = UserInterface(repository=fake_user_repository)
# When / Then
with pytest.raises(UserNotAuthenticatedError):
user_interface.get_user(user_id=user_id)
We could have created some reusable pytest
fixtures to reduce the body of the test.
But that’s a separate post entirely!
Running our test again:
============================= test session starts ==============================
collecting ... collected 1 item
test_user_interface.py::test_get_user_raises_error_when_user_is_not_authenticated PASSED [100%]
============================== 1 passed in 0.02s ===============================
Process finished with exit code 0
We’ve now managed to verify the correct behaviour. And we also restricted the scope of our unit test to the specific component we wanted.
That last point is important, we want the scope of our unit tests to be small and focused. The more components our tests bring into play, the more chances there are of the test breaking as a result of changes or errors happening elsewhere.
We managed to do this by letting our test guide us and applying dependency injection to our code.
If tomorrow, we decided to change the type of database then the test we wrote here would not need to change. We have reduced the coupling in this case.
This is really important, our unit tests should always be fast, focused and reliable. We want them to test small pieces of behaviour, and we also want the outputs to be deterministic.
If we have changed some component elsewhere (i.e. the database) then we should not be forced to make changes to this test.
Changes to our API
Most people nowadays associate API with an HTTP REST API.
But in this context, we refer to it’s purest form. We are referencing the interface for which the component exposes to its users.
In this context, we have been made to make changes to the API of the UserInterface
class.
Because we removed the implicit dependency of the UserRepository
,
we also changed how the caller creates and interacts with the UserInterface
.
There is a valid argument to be made here. What if we had to inject lots of dependencies to our class?
If we find ourselves having to inject numerous dependencies then this indicates we have a design problem.
People tend to argue that dependency injection leads to bloated APIs which expose too many parameters to the user. In this case, we should heed the warnings that our code is trying to give us.
We may not have the right abstractions in place, and that would be the root of the problem.
Summary
Dependency injection is a pattern we can and arguably should use when designing our components.
It allows us to draw lines between those components and have more fine-grained control over our software.
It is a technique we can quite happily use in Python too. Despite the misconceptions that it’s something relevant only to other languages.
For our tests to be reliable, with a given set of inputs, we expect the same output. Every time. Without fail.
To achieve this, we need the high level of control over our components that dependency injection leads us to.
Without this level of control throughout our code, our systems feel impure. Our confidence in our system erodes over time. It becomes much harder to make changes. This sounds dramatic, but the consequences of this are quite severe.
Related posts:
Related articles coming soon:
- Dependency Injection in Go