Multi-Stage Builds

July 13, 2023 4 min read

In this post, we will go through multi-stage builds for Docker images: why we might want to use them, and what their implications are.

Image optimization

When working with Docker images, one thing engineers will often need to do is reduce the size of the image. Image size affects build times, deployment speed, storage and transfer costs, and security, since every extra package in an image adds potential attack surface.

There are a few approaches that come to mind:

At the application level, engineers can review the project dependencies and remove, or migrate away from, the larger ones. This is, of course, not always an easy win.
We can use a slimmer base image, provided it still contains the dependencies our application needs.
We can use multi-stage builds to reduce the image size.

In this article, we will only be looking at the final suggestion: multi-stage builds.

What is a multi-stage build?

A multi-stage build splits the Dockerfile into several stages, each starting with its own FROM instruction. Later stages can copy files out of earlier ones, so we can select and include only the components that we need in the final image.

If we split the stages thoughtfully, we can also target them at different use cases. For example, we can use the same Dockerfile for CI/CD processes as well as for production, building only the stage each use case needs with docker build --target <stage-name>, all whilst ensuring that each image contains only the components it requires.

Example application

Let’s take a very simple Go application.

Our main.go file looks something like this:

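package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Respond to requests on /hello with a plain-text greeting.
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello!")
	})

	fmt.Printf("Running server on port 8080\n")

	// Serve HTTP on port 8080; log.Fatal exits if the listener fails.
	log.Fatal(http.ListenAndServe(":8080", nil))
}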

There’s really not a whole lot going on here. Our main function is the entrypoint to our application: it registers a lightweight handler for inbound HTTP requests to the path /hello and serves them on port 8080.

Single-stage Dockerfile

In single-stage terms, we would write our Dockerfile as such:

FROM golang:1.17

WORKDIR /app

COPY . .

RUN CGO_ENABLED=0 GOOS=linux go build -o main

# Run the compiled binary, not the Go source file
ENTRYPOINT ["/app/main"]

Running the following command builds the image:

❯ docker build --tag 'go-example-single-stage' .

Once this is complete, we can check the image itself.

❯ docker image ls
REPOSITORY               TAG     IMAGE ID      CREATED         SIZE
go-example-single-stage  latest  9f2fbaaf094b  10 seconds ago  824MB

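Our image is already nearly 1GB, and almost all of that is the Go toolchain and the base operating system that ship with golang:1.17, not our application.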

Now let’s take a look at how we could use a multi-stage build to reduce the size of our final image.

Multi-stage Dockerfile

# Build stage
FROM golang:1.17 AS build

WORKDIR /app

COPY . .

RUN CGO_ENABLED=0 GOOS=linux go build -o main

# Application stage
FROM scratch AS application

COPY --from=build /app/main /main

# scratch has no shell, so the exec (JSON array) form is required
ENTRYPOINT ["/main"]

With a multi-stage build, we organise our Dockerfile into discrete stages, each beginning with its own FROM instruction. In our case, the first stage is the Build stage. We use this to install project dependencies and compile the application.

The second stage is our Application stage. This is the image that will be used to run the application; because it is the last stage in the file, it is what docker build produces by default.

Take note of the COPY --from=build /app/main /main instruction. It copies only the compiled binary out of the build stage, so the Go toolchain, our source code and any intermediate build files are left behind.

Running the following command builds our new image:

❯ docker build --tag 'go-example-multi-stage' .

Image size optimization

Comparing the images we built earlier:

❯ docker image ls
REPOSITORY               TAG     IMAGE ID      CREATED         SIZE
go-example-multi-stage   latest  e77afead2066  10 seconds ago  3.65MB
go-example-single-stage  latest  9f2fbaaf094b  30 seconds ago  824MB

As we can see, the multi-stage image is 3.65MB compared with 824MB for the single-stage image: over 200 times smaller, even though both run the same application.
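To put a number on that difference, the arithmetic can be sketched in a few lines of Go. The two inputs are simply the SIZE values from the docker image ls output above, and sizeReduction is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// sizeReduction is a hypothetical helper that returns how many times
// smaller the optimized image is and the percentage of space saved.
func sizeReduction(beforeMB, afterMB float64) (ratio, percentSaved float64) {
	ratio = beforeMB / afterMB
	percentSaved = (1 - afterMB/beforeMB) * 100
	return ratio, percentSaved
}

func main() {
	// 824MB single-stage vs 3.65MB multi-stage, as listed by docker image ls.
	ratio, saved := sizeReduction(824, 3.65)
	fmt.Printf("%.0fx smaller, %.1f%% saved\n", ratio, saved) // prints: 226x smaller, 99.6% saved
}
```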

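Of course, the caveat here is that we were able to build the final stage from the scratch image, which, as you might have guessed, is simply an empty base image: no shell, no C library and no CA certificates. That only works because CGO_ENABLED=0 makes go build produce a statically linked binary with no external dependencies.

This approach suits our small Go application particularly well.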

Summary

It should be noted that applications do not all benefit from the multi-stage approach to the same degree. In our example, we used a Go application, which will tend to see far greater savings than applications written in languages that need an interpreter or runtime installed in the final image.

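This is because Go compiles to a single native binary, in a similar manner to Rust or C++, and with CGO_ENABLED=0 that binary is statically linked, so it needs nothing from the base image at runtime.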
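One way to see why a binary can run on scratch is to check whether it asks for a dynamic linker. A dynamically linked ELF binary carries a PT_INTERP program header naming the loader (such as ld-linux), whereas a statically linked one does not. The sketch below uses Go's standard debug/elf package; isStaticELF and the command-line wrapper are hypothetical, and it assumes a Linux ELF binary as input.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// isStaticELF is a hypothetical helper that reports whether the ELF
// binary at path has no PT_INTERP program header, i.e. it does not
// request a dynamic linker when the kernel loads it.
func isStaticELF(path string) (bool, error) {
	f, err := elf.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			return false, nil // needs a dynamic linker from the base image
		}
	}
	return true, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: elfcheck <path-to-binary>")
		os.Exit(2)
	}
	static, err := isStaticELF(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("statically linked:", static)
}
```

Pointed at the main binary built with CGO_ENABLED=0 GOOS=linux, we would expect it to report true; a default cgo-enabled build of a program using net/http on a glibc system may well report false, which is the case where scratch would not be enough.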

You might also want to consider a multi-stage build if your application needs dependencies at build time that are not required to run it. For example, consider a Python application that talks to a PostgreSQL database. Its PostgreSQL driver will most likely need a C compiler and the libpq development headers to build, but the runtime image only needs the compiled package and the shared libpq library, so the compiler and headers can stay behind in the build stage.