Constructing URLs
In this post, we are going to discuss a URL-constructor function.
More specifically, we are going to highlight a common pitfall encountered when constructing strings for URLs.
Table of Contents
Building URLs with f-strings
If I asked you to create a set of child path URLs from a base URL, then chances you are, your mind has probably wandered to f-strings.
Let’s take our base url as the following:
base_url = "https://www.ourawesomesite.com"
Now let’s say we want this string to be concatenated as a forward path to the base_url:
path_for_blog = "/blog"
If we take the f-string approach, we might be tempted to write something like this:
def join_urls(base_url: str, suffix_url: str) -> str:
return f"{base_url}{suffix_url}"
You might be thinking, well this approach is fine.
What is the point of this blog post.
Why am I still reading this?
However, this approach is brittle and is prone to a very specific bug.
In the case in which someone calls our function with the suffix_url argument as "blog" instead of "/blog",
the outcome will not be quite we wanted:
>>> join_urls(base_url=base_url, suffix_url="blog")
'https://www.ourawesomesite.comblog'
See the difference?
The string that we got back is not what we were expecting right.
A missing / character from a URL can be deceptively opaque and tricky to spot!
With our current implementation of join_urls()
it can be said that we are implicitly expecting the caller of our function to be aware of this nuance.
Using the urllib library
So we have a couple of options at this point:
- We can warn other developer of this in the docstring to this function.
- We can make the string concatenation part much more robust.
I’m going to skip past a design discussion and select option 2.
To achieve this, we will make use of the urllib library.
from urllib.parse import urljoin
def join_urls(base_url: str, suffix_url: str) -> str:
return urljoin(base=base_url, url=suffix_url)
Now our join_urls() is more robust, we can pass a string to the suffix_url arg
without the preceding / character
>>> join_urls(base_url, "blog")
'https://www.ourawesomesite.com/blog'
And we can also still use the same implementation for strings with the preceding / character:
>>> join_urls(base_url, "blog")
'https://www.ourawesomesite.com/blog'
Assuming we had tests for this function, we should also find that in making this change, our tests do not break.
This is what is known as reaching feature parity.
This is when we can be sure that making changes to some part of our system
has not broken existing contracts/ expected behaviours.