Nowadays, zero-downtime deployments are the norm in the software industry. Strategies like rolling updates, canary releases, and blue-green deployments have become ubiquitous, supported by virtually every major platform and orchestration tool. These methods allow updating applications with little to no perceived downtime, seamlessly transitioning users to new versions.
However, while these strategies effectively address the downtime part of the equation, they shift the trade-offs elsewhere. All of these deployment approaches require having multiple copies of the application, gradually rolling out changes to a subset of servers or pods while the old version continues to run on others. This process ensures service availability throughout the deployment, but it also means that two versions of the application must coexist temporarily within the same infrastructure, which has several implications.
Common issues
Case study: Deleting a table column with zero downtime using rolling strategy.
This operation is typically done in a single step: one deployment ships both the migration that drops the column from the database and the code change that stops using it.
During the deployment window, which can last from a few seconds to several minutes in a rolling deployment, a critical issue emerges. Once the new image is deployed and the migration has run, the new version of the code executes on some replicas while the old version still runs on others. The old version throws errors whenever it tries to access the already deleted column, while the new version works correctly.
This mismatch generates error spikes during deployments and, in the worst case, can lead to data corruption, inconsistencies, and outages. Most alarmingly, these are not the only problems. Performing such an invasive change in one step carries a bigger risk: it eliminates the possibility of a clean rollback, leaving roll-forward fixes as the only remedy.
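The failure mode above can be simulated in a few lines. This is a minimal sketch using SQLite in memory; the `users` table and `legacy_email` column are hypothetical, and `ALTER TABLE ... DROP COLUMN` assumes SQLite 3.35 or newer:

```python
import sqlite3

# Simulate a single-step rollout: the migration drops the column
# while an "old" replica is still running code that reads it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy_email TEXT)")
db.execute("INSERT INTO users (name, legacy_email) VALUES ('ada', 'ada@example.com')")

def old_replica_read(conn):
    # Old code still selects the column explicitly.
    return conn.execute("SELECT name, legacy_email FROM users").fetchall()

def new_replica_read(conn):
    # New code no longer references the column.
    return conn.execute("SELECT name FROM users").fetchall()

# The single-step deployment drops the column mid-rollout.
db.execute("ALTER TABLE users DROP COLUMN legacy_email")  # requires SQLite >= 3.35

print(new_replica_read(db))   # the new replica keeps working
try:
    old_replica_read(db)      # the old replica now errors on every request
except sqlite3.OperationalError as e:
    print("old replica failed:", e)
```

Every request that lands on an old replica during the rollout window fails this way, which is exactly the error spike described above.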
Fortunately, there’s a better way:
The Two-Phased Solution
To address these shortcomings and mitigate the associated risks, a two-phased deployment approach becomes crucial. This is not just a fancy name; it literally means two separate deployments, usually separated by a time window. It works like this:
Phase 1: Code adaptation
The first phase involves deploying new code that doesn’t use the soon-to-be-deleted database column. However, the column still exists in the database at this stage. This allows both old and new code to run without errors, ensuring a smooth transition or rollback if needed.
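Sketching Phase 1 with the same hypothetical `users` table and `legacy_email` column: only the code changes, the schema stays untouched, so old and new replicas can coexist during the rollout:

```python
import sqlite3

# Phase 1: ship code that stops referencing legacy_email,
# but leave the column in place so old replicas keep working.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy_email TEXT)")
db.execute("INSERT INTO users (name, legacy_email) VALUES ('ada', 'ada@example.com')")

def old_replica_read(conn):
    # Pre-Phase-1 code: still reads the column.
    return conn.execute("SELECT name, legacy_email FROM users").fetchall()

def new_replica_read(conn):
    # Phase 1 code: the column is ignored, not dropped.
    return conn.execute("SELECT name FROM users").fetchall()

# Both versions run side by side without errors, and rolling back
# to the old code is safe because the schema hasn't changed.
print(old_replica_read(db))
print(new_replica_read(db))
```

Because the schema is unchanged, rolling back simply means redeploying the previous image; no database restore is needed.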
Phase 2: Schema change
Once the first phase is fully rolled out and stable, and a rollback to the old code is no longer expected, it’s time for the second phase: deploying the migration that deletes the now-unused column.
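Continuing the same hypothetical example, Phase 2 is a schema-only change. Since every replica now runs Phase 1 code that never reads `legacy_email`, dropping it cannot break anything (again assuming SQLite 3.35+ for `DROP COLUMN`):

```python
import sqlite3

# Phase 2: with all replicas on Phase 1 code, the column is dead
# weight and can be dropped without affecting any running code.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy_email TEXT)")
db.execute("INSERT INTO users (name) VALUES ('ada')")

db.execute("ALTER TABLE users DROP COLUMN legacy_email")  # requires SQLite >= 3.35

# Verify the column is gone from the schema.
columns = [row[1] for row in db.execute("PRAGMA table_info(users)")]
print(columns)

# Phase 1 code is unaffected, since it never touched the column.
print(db.execute("SELECT name FROM users").fetchall())
```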
Benefits of Two-Phased deployments
Implementing two-phased deployments ensures more reliable and reversible updates to applications. This approach offers several benefits:
- It provides a smooth transition between versions
- It allows safe rollbacks
- It significantly reduces the risk of data inconsistencies or application errors
Beyond a table column deletion
While we’ve focused on column deletion, two-phased deployments can be generalized to implement any sort of changes, making the separation into phases a powerful concept for reliable engineering practices.
Conclusion
Two-phased deployments fill a critical gap between zero downtime and reliability, allowing teams to navigate complex updates in a simpler way. By separating the process into phases, they minimize risks and improve overall system stability. Ultimately, this approach ensures that applications can evolve gracefully, maintaining both performance and data integrity throughout the deployment process.