Reproducible builds are a set of software development practices that create an independently-verifiable path from source to binary code.
https://reproducible-builds.org/
Reproducible builds are important and provide benefits in many areas, including:
- Security. Because the same input source code always produces the same output binary artifact, anyone can rebuild the artifact independently and detect whether the toolchain was modified to inject vulnerabilities. For example, when using a CI service such as Travis CI, GitHub Actions, or GitLab CI, it’s possible that their build servers could be compromised and inject malicious code into your output artifacts. With reproducible builds, you could occasionally run the build on another system, then compare its artifacts with those from the CI system. If they don’t match exactly, byte for byte, then you’ve found a real problem.
- Maintainability. Reproducible builds require the build process to be clear and repeatable, meaning that other people and systems can also execute it, reducing “it works on my computer” problems.
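The byte-for-byte comparison described in the Security point can be sketched in a few lines of Java. This is only an illustration: the class name ArtifactCompare and its methods are made up for this example, and SHA-256 is one reasonable choice of digest rather than anything the build tools require.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares two build artifacts by their SHA-256 digests.
public class ArtifactCompare {

    /** Streams the file through a SHA-256 digest and returns the raw digest bytes. */
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading is enough; DigestInputStream updates the digest as a side effect.
            }
        }
        return digest.digest();
    }

    /** Returns true when both files have the same SHA-256 digest. */
    static boolean sameArtifact(Path first, Path second)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha256(first), sha256(second));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ArtifactCompare <ci-artifact> <local-artifact>");
            System.exit(2);
        }
        boolean same = sameArtifact(Path.of(args[0]), Path.of(args[1]));
        System.out.println(same ? "MATCH" : "MISMATCH");
        // A non-zero exit code lets a script or CI job fail on a mismatch.
        System.exit(same ? 0 : 1);
    }
}
```

Run it with the artifact downloaded from the CI system and the one built on the other machine as the two arguments.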
Given how important and valuable reproducible builds are, how can this goal be achieved?
Reasons Why Builds Aren’t Already Reproducible
There are a number of common reasons why builds aren’t already reproducible, including:
- Variation in the foundation, including the toolchain (the compilers and linkers) and build tools (such as tar/zip and shell)
- The build system itself (version of Gradle/Maven/Make/etc isn’t controlled and reproducible)
- The build itself isn’t reproducible (common reasons include random ordering of file names in archives, excessive file information being recorded such as file creation dates and owners, and embedding values that vary in the build output, such as the current time and username performing the build)
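To make the last point concrete, here is a minimal Java sketch of the two usual fixes for archives: write entries in sorted order and give every entry the same fixed timestamp. The class name DeterministicZip, the zip method, and the chosen timestamp are assumptions made for this example; this is not how the Maven or Gradle tooling is implemented internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical example: builds a zip whose bytes do not depend on
// insertion order or on the time the archive was created.
public class DeterministicZip {

    // Arbitrary fixed timestamp (an assumption for this sketch).
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(2020, 1, 1, 0, 0, 0);

    static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            // TreeMap sorts by entry name, so the order is always the same.
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                // Without an explicit time, ZipOutputStream records the current time.
                // setTimeLocal (Java 9+) avoids a dependency on the default time zone.
                entry.setTimeLocal(FIXED_TIME);
                out.putNextEntry(entry);
                out.write(file.getValue());
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] a = "A".getBytes(StandardCharsets.UTF_8);
        byte[] b = "B".getBytes(StandardCharsets.UTF_8);

        // Same files, added in a different order.
        Map<String, byte[]> first = new LinkedHashMap<>();
        first.put("b.txt", b);
        first.put("a.txt", a);
        Map<String, byte[]> second = new LinkedHashMap<>();
        second.put("a.txt", a);
        second.put("b.txt", b);

        System.out.println("identical archives: " + Arrays.equals(zip(first), zip(second)));
    }
}
```

Removing the sort or the setTimeLocal call is a quick way to see the archive bytes start to vary between runs.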
Reproducible Foundation: Java and the Operating System
Reproducible builds require a reproducible foundation: if the tools that execute the build aren’t reproducible, then the output of the build (probably) won’t be either. Docker containers are a great way to make the build environment reproducible. For example, if you execute builds using Docker with the openjdk:11.0.7-jdk tag, then you know exactly what version of Java and all of its dependencies are being used. Because a tag can be re-pushed to point at a different image, pin the image by digest if you need a guarantee that it will never change.
However, with Java, this concern isn’t as great as it is with some other platforms because Java (within a major release) is very stable. If your project is built with Java 11.0.0, then again with Java 11.0.7, the output artifacts will (almost certainly) be identical.
Reproducible Build System: Using Wrappers
The next step is making the build system reproducible. Having a reproducible build system means that a commit expresses the exact version of the build system to be used. For example, the commit would express that Maven 3.6.3 is to be used. Both Gradle and Maven offer wrappers which make them reproducible.
As an added bonus, developers, CI systems, and so on no longer need to install the build system (Gradle or Maven) at all. When they use the wrapper command (mvnw or gradlew), it downloads the correct version of the build system and takes care of everything else.
Using the Maven Wrapper
Use Maven Wrapper to ensure that the same version of Maven is always used and that this version is source controlled. To enable it, run:
mvn -N io.takari:maven:0.7.7:wrapper
and commit the added files to source control. From then on, run mvnw instead of mvn.
Using the Gradle Wrapper
Gradle Wrapper allows the version of Gradle used for a build to be source controlled. Unlike the Maven Wrapper, the Gradle Wrapper is extremely common, so your project likely uses it already. If it doesn’t, enable it by running:
gradle wrapper
then commit the resulting files to source control. From then on, use the gradlew command instead of gradle.
Reproducible Build Itself
Finally, the build itself needs to be made reproducible. When using the same foundation and the same version of the same build system, the build should produce exactly the same output to the byte.
Reproducible Maven Builds
Maven provides documentation on reproducible builds which includes steps on how to modify pom.xml and then how to determine what parts of the output artifact(s) are not reproducible. That approach will (eventually) work, but I’ve found that there is an easier way.
The Reproducible Build Maven Plugin is a quick and easy way to make builds reproducible. Add it to the project’s pom.xml:
<build>
  <plugins>
    <plugin>
      <groupId>io.github.zlika</groupId>
      <artifactId>reproducible-build-maven-plugin</artifactId>
      <version>0.12</version>
      <executions>
        <execution>
          <goals>
            <goal>strip-jar</goal>
          </goals>
          <configuration>
            <!-- ${git.commit.time} must be defined elsewhere in the build; assumed here to come from a plugin such as git-commit-id-plugin -->
            <zipDateTimeFormatPattern>yyyy-MM-dd'T'HH:mm:ssZ</zipDateTimeFormatPattern>
            <zipDateTime>${git.commit.time}</zipDateTime>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
And that’s it – no more configuration necessary. This plugin includes workarounds for common plugins that don’t (yet) produce reproducible artifacts, such as spring-boot-maven-plugin and maven-war-plugin.
Making the Gradle Build Reproducible
Gradle provides documentation on how to make the Gradle build reproducible. In summary, add the following to build.gradle:
tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}
And that’s it.
Confirm Reproducibility
A quick and dirty way to check for reproducibility is to run the build twice and see if the checksums of the artifacts match for both builds.
Here’s a one-liner that does that for Gradle:
( ./gradlew clean && ./gradlew build ) > /dev/null && md5sum ./build/libs/*.jar && ( ./gradlew clean && ./gradlew build ) > /dev/null && md5sum ./build/libs/*.jar
And here’s a one-liner that does that for Maven:
( ./mvnw clean && ./mvnw package ) > /dev/null && md5sum ./target/*.jar && ( ./mvnw clean && ./mvnw package ) > /dev/null && md5sum ./target/*.jar
If the before and after checksums match, then enjoy all of the security, maintainability, and other advantages of your reproducible build.
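If the checksums don’t match, the next question is which parts of the artifact differ. The following Java sketch lists the entries whose content checksum (CRC) or recorded timestamp differs between two JARs. The class name JarDiff and the report wording are invented for this example; it is a rough diagnostic aid under those assumptions, not a replacement for dedicated comparison tools.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: reports which entries differ between two JAR files.
public class JarDiff {

    /** Maps each entry name to a short description of its CRC and timestamp, in archive order. */
    static Map<String, String> describe(Path jar) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> all = zip.entries();
            while (all.hasMoreElements()) {
                ZipEntry entry = all.nextElement();
                entries.put(entry.getName(),
                        "crc=" + Long.toHexString(entry.getCrc()) + " time=" + entry.getTime());
            }
        }
        return entries;
    }

    /** Returns one line per difference; an empty list means no difference was detected. */
    static List<String> diff(Path first, Path second) throws IOException {
        Map<String, String> a = describe(first);
        Map<String, String> b = describe(second);
        List<String> report = new ArrayList<>();

        TreeSet<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        for (String name : names) {
            String left = a.get(name);
            String right = b.get(name);
            if (left == null) {
                report.add("only in second: " + name);
            } else if (right == null) {
                report.add("only in first: " + name);
            } else if (!left.equals(right)) {
                report.add("differs: " + name + " [" + left + "] vs [" + right + "]");
            }
        }
        // Identical entries stored in a different order still change the archive bytes.
        if (report.isEmpty()
                && !new ArrayList<>(a.keySet()).equals(new ArrayList<>(b.keySet()))) {
            report.add("same entries, different order");
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarDiff <first.jar> <second.jar>");
            System.exit(2);
        }
        List<String> report = diff(Path.of(args[0]), Path.of(args[1]));
        report.forEach(System.out::println);
        System.exit(report.isEmpty() ? 0 : 1);
    }
}
```

To use it, copy the JAR from the first build somewhere safe before running the second build, then pass both files as arguments. Entries that differ only in time point at timestamp handling, while a differing crc means the file content itself varies between builds.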