Multi-Stage Builds
In this post, we will go through multi-stage builds for Docker images: why we might want to use them, and what their implications are.
Image optimization
When working with Docker images, engineers often need to reduce the size of the image. Image size has a wide-reaching impact on build times, deployments, operational costs and security.
There are a few approaches that come to mind:
- At the application level, engineers can look at the project dependencies and remove or migrate away from the larger ones. This is, of course, not always an easy win.
- We can use a slimmer base image, subject to the kind of dependencies our application needs.
- We can leverage multi-stage builds to reduce the image size.
In this article, we will only be looking at the final suggestion, multi-stage builds.
What is a multi-stage build?
A multi-stage build allows us to split the building of an image into separate stages. In doing so, we can select and include only the components that we need in the final image.
If we’re clever about the way we split the image up, we can also signpost it for different use cases. For example, we can use the same Dockerfile for CI/CD processes as well as for production, all whilst ensuring that we only include the components that we need for each use case.
Example application
Let’s take a very simple Go application.
Our main.go file looks something like this:
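package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Respond to requests on /hello with a short greeting.
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello!")
	})

	fmt.Printf("Running server on port 8080\n")
	// Start the HTTP server; ListenAndServe blocks and only returns on error.
	log.Fatal(http.ListenAndServe(":8080", nil))
}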
There’s really not a whole lot going on here. Our main function provides the entrypoint to our application and registers a light handler for inbound HTTP requests to the path /hello.
Single-stage Dockerfile
In single-stage terms, we would write our Dockerfile like this:
FROM golang:1.17
WORKDIR /app
COPY . .
# Build a statically linked Linux binary named main
RUN CGO_ENABLED=0 GOOS=linux go build -o main
# Run the compiled binary
ENTRYPOINT ["/app/main"]
We can build the image by running the following command:
docker build --tag 'go-example-single-stage' .
Once this is complete, we can check the size of the image.
❯ docker image ls
REPOSITORY                TAG       IMAGE ID       CREATED          SIZE
go-example-single-stage   latest    9f2fbaaf094b   10 seconds ago   824MB
Our image is already over 800MB, for an application of only a few lines.
Now let’s take a look at how we could use a multi-stage build to reduce the size of our final image.
Multi-stage Dockerfile
# Build stage
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main
# Application stage
FROM scratch AS application
COPY --from=build /app/main /main
ENTRYPOINT ["/main"]
With a multi-stage build we can organise our Dockerfile into discrete sections.
In our case, the first section is the build stage. We use this to fetch any project dependencies and compile the application.
The second section is our application stage. This is the image that will be used for running the application. By default, docker build produces the image for the last stage in the file.
Take note of the COPY --from=build /app/main /main instruction. This allows us to bring over only the specific artifacts that we need to run our application, leaving the Go toolchain and source code behind in the build stage.
We can build our new image by running the following command:
docker build --tag 'go-example-multi-stage' .
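Returning to the earlier point about using one Dockerfile for both CI/CD and production, the following is a rough sketch of how that could look. The stage name test and the go test step are assumptions added for illustration; they are not part of the example above.

```dockerfile
# Build stage: compiles the binary once so later stages can reuse it.
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main

# Hypothetical CI stage: starts from the build stage, so the Go toolchain
# and source code are available for running the tests.
# CI usage:         docker build --target test .
FROM build AS test
RUN go test ./...

# Application stage: contains only the compiled binary.
# Production usage: docker build --target application --tag 'go-example-multi-stage' .
FROM scratch AS application
COPY --from=build /app/main /main
ENTRYPOINT ["/main"]
```

The --target flag tells docker build which stage to stop at. With BuildKit, stages that the chosen target does not depend on are skipped, so building the application stage should not run the tests.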
Image size optimization
Comparing the two images we have built:
❯ docker image ls
REPOSITORY                TAG       IMAGE ID       CREATED          SIZE
go-example-multi-stage    latest    e77afead2066   10 seconds ago   3.65MB
go-example-single-stage   latest    9f2fbaaf094b   30 seconds ago   824MB
As we can see, building the same application with each approach results in drastically different image sizes: 3.65MB versus 824MB.
Of course, the caveat here is that we were able to build the final stage on the scratch image which, as you might have guessed, is simply an empty base image.
This particular use case lends itself very nicely to our small Go application.
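Being empty also means scratch has no shell, no package manager and no CA certificates. As a hedged sketch, if our application later needed to make outbound HTTPS calls (an assumption; the example above does not), the certificate bundle could be copied across from the build stage in the same way as the binary. The path shown is the one used by the Debian-based golang image.

```dockerfile
# Build stage
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main

# Application stage
FROM scratch AS application
# Assumption: the application makes outbound TLS connections, so it needs
# a set of trusted root certificates that scratch does not provide.
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app/main /main
ENTRYPOINT ["/main"]
```

If the list of files to copy keeps growing, that is usually a sign that a slim base image may be a better fit than scratch.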
Summary
It should be noted that not all types of application lend themselves equally well to the multi-stage approach. In our example, we used a Go application, which will tend to see greater size reductions than applications written in many other languages.
This is because Go compiles to a native binary, in a similar manner to Rust or C++, so the toolchain used to build it is not needed at runtime. With CGO_ENABLED=0, as in our Dockerfile, the binary is also statically linked, which is what allows it to run on scratch.
You might also want to consider a multi-stage build if your application needs dependencies at build time that are not required to run it. For example, if you had a Python application which talked to a PostgreSQL database, then most likely the PostgreSQL library would need a C compiler and header files at build time, but not at runtime.
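The Python case could be sketched roughly as follows. This is an illustration under several assumptions rather than a definitive setup: the python:3.11-slim tag, a requirements.txt that lists a PostgreSQL driver built from source, and an app.py entrypoint are all hypothetical.

```dockerfile
# Build stage: has the compiler and PostgreSQL headers needed to build the driver.
FROM python:3.11-slim AS build
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# Install into a virtual environment so it can be copied as a single directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Hypothetical requirements.txt that includes the PostgreSQL driver.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application stage: no compiler or headers, only the runtime client library.
FROM python:3.11-slim AS application
RUN apt-get update \
    && apt-get install -y --no-install-recommends libpq5 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
# Hypothetical entrypoint script.
CMD ["python", "app.py"]
```

Note that the final image still needs the Python interpreter and the small libpq shared library, so the saving here comes from leaving out gcc and the development headers, and will be more modest than in the Go example.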