The blameless blog

Canary Deployment Benefits & Implementation Guide

Myra Nizami

All deployment strategies have pros and cons. Find out whether canary deployment is a good fit for your team by looking at how it works, and its best practices.

What is canary deployment?

Canary deployment is a deployment strategy to roll out new features incrementally to a subset of users for an initial test.

With canary deployment, releases are staged. A small subset of users sees the update first. Based on their usage, bugs and issues are identified and fixed before the update is rolled out to the rest of the users. Canary deployment is a smart strategy for teams looking to strengthen their continuous delivery process and release updates incrementally. Blameless itself uses canary deployment for its major releases.

Canary deployment has become a way for teams to roll out changes to a smaller segment to identify issues and gain feedback. The release is monitored until certain conditions are met: a set amount of time has passed, a usage threshold has been reached, or reliability targets have been hit. Teams then use the data collected to make any needed changes before releasing to a larger segment.

The term canary deployment comes from the practice of sending a canary into a coal mine. The canary served as an early warning: if it fell ill or died, it was a signal that toxic gases were present in the mine. The more fragile canary would feel the effect of the gases before the humans did, giving the miners time to evacuate before they were affected.

For DevOps teams, a canary release loosely follows that concept. Releasing an update to a small group of users gives teams enough data to understand what works and what doesn’t. That way, they can go back and fix issues before doing a wider release. As a result, canary deployment can help create a manageable CI/CD pipeline without disrupting the customer experience.

What are the advantages of canary deployment?

Although there will likely be some level of automation present throughout the CI/CD pipeline to spot errors and issues, it’s hard to gauge how real-time traffic will impact the new code being introduced. 

With a canary deployment, you get to test new code in a safe space with a small number of users before doing a more extensive rollout. The canary group acts as a test group for spotting early warnings and issues and mitigating risk where possible. Other advantages of canary deployment include:

 

  1. Opportunity to test: Canary deployment is essentially a way to test changes in a real-world environment. You’ll learn how users interact with the change and can run A/B tests with different versions, capacity tests to understand performance issues, and more.
  2. Feedback: You get valuable feedback that can make the larger rollout smoother and less painful. You gain real input from a small group without disrupting the experience for most users.
  3. Risk mitigation: With canary deployment, you can mitigate risk by spotting defects early without disrupting a large set of customers. And if things go wrong, it’s easy to roll back and fix the issues before full deployment.
  4. Control: With canary deployment and smaller releases, teams get granular control over changes and the overall workflow. Errors don’t need to be disruptive events. They are treated as part of the process and are smaller to resolve than issues in a larger rollout.

What are the challenges with canary deployment?

Canary deployment can be a great part of a strong CI/CD pipeline, but it’s not always easy for teams to implement.

Firstly, canary deployment requires a level of patience. User feedback from the small test group can reveal bigger issues and bugs, which can be frustrating and stressful to deal with. If users aren’t aware that they will be experiencing these changes, it can be an unsettling experience for them. Some teams get around that by inviting users to beta tests and releases. That way, users self-select into the group and are prepared for bugs because they know to expect them.

Secondly, canary deployment takes time, especially when it gets complex. Teams need to migrate users and monitor the release, and the infrastructure needs to support both. It’s just as complex as a larger deployment but with a smaller customer base. Preparing the pipeline for canary deployment will take time.

Canary deployment also requires a lot of strategic planning. You need to understand which releases are major enough to warrant a canary release. Feature flagging or other code delineation is necessary to be able to roll back a specific release quickly. You also need to decide which user groups are good candidates for canary releases. Choose users who will be invested in using the new features, so you get valuable feedback and stress testing. At the same time, they shouldn’t be so dependent on the features that the likely disruptions cause them a lot of pain.

How to implement canary deployment

The initial steps for a canary deployment strategy include:

  1. Create two versions of the production environment: one running the current release and one running the new release.
  2. Use a load balancer to send most traffic to the current version and route a small segment of traffic to the new version.
  3. If the percentage of errors and defects is low, deploy to a larger segment of users, steadily increasing the load and testing.
  4. If a high percentage of bugs and errors is reported, roll the update back and correct the issues.
  5. After the canary deployment, regroup for analysis and a retrospective to plan the larger-scale deployment.
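
As a rough illustration, steps 2 through 4 can be sketched as a loop over traffic stages. This is a minimal sketch, not a definitive implementation: the function name run_canary, the error_rate_for hook, and the stage percentages are all hypothetical, and a real pipeline would read error rates from its monitoring system and shift traffic through its load balancer.

```python
from typing import Callable, Sequence


def run_canary(
    stages: Sequence[int],
    error_rate_for: Callable[[int], float],
    max_error_rate: float,
) -> tuple[str, int]:
    """Walk through traffic stages and roll back on the first bad one.

    stages: percentages of traffic sent to the new version, ascending.
    error_rate_for: hypothetical hook that returns the observed error
        rate (0.0 to 1.0) of the new version at a given traffic share.
    Returns ("promoted", last_pct) or ("rolled_back", failing_pct).
    """
    if not stages:
        raise ValueError("at least one traffic stage is required")
    for pct in stages:
        observed = error_rate_for(pct)
        if observed > max_error_rate:
            # Too many errors: stop here and return traffic to the old version.
            return ("rolled_back", pct)
        # Errors are low, so move on to the next, larger stage.
    return ("promoted", stages[-1])


# Example with a made-up metric source that always reports 0.2% errors.
print(run_canary([5, 25, 50, 100], lambda pct: 0.002, max_error_rate=0.01))
```

With the made-up error rate above, the example prints ('promoted', 100). If the hook reported an error rate above the limit at any stage, the loop would stop there and report a rollback instead.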

Teams need to consider resilience and workload before taking on canary deployment. It’s a form of agile working that might not suit every team, depending on current workflows, infrastructure, and capacity. In addition, managing multiple versions of the software can be immensely challenging, and automated tools need to be in place to monitor and spot issues proactively. That’s why the initial groundwork in developing the pipeline is crucial: it gives teams room to develop the process further.

Before implementing canary deployment, teams must decide how many users to include and how many stages are needed. A general rule of thumb is to include 5-10% of users. Users can be selected randomly or can self-select to participate. Choosing users by region may be helpful, and doing an internal rollout before going to end users is another strategy.
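
One common way to pick a random but stable share of users is to hash each user ID into a bucket. The sketch below assumes string user IDs. The function name in_canary and the salt value are hypothetical, and this is only one possible selection scheme, not the one any particular tool uses.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "release-1") -> bool:
    """Deterministically place about `percent`% of users in the canary group.

    The same user always gets the same answer for a given salt, so users
    do not flip between versions from one request to the next.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


# Roughly 10% of these made-up user IDs should land in the canary group.
users = [f"user-{n}" for n in range(10_000)]
share = sum(in_canary(u, 10) for u in users) / len(users)
print(f"{share:.1%} of users are in the canary group")
```

Because the bucket is fixed per user, raising the percentage from 5 to 10 keeps the original canary users in the group and only adds new ones, which suits a staged rollout.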

Teams will also need to consider how long each canary stage will last (e.g., monitoring for a few minutes or a few hours) before analyzing the data and reporting back.

Defining success criteria beforehand enables a more focused analysis. Success metrics could include:

  • Internal error counts
  • CPU utilization
  • Memory utilization
  • Latency
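
To make criteria like these concrete, the check below compares a canary’s metrics against limits agreed on beforehand. It is a sketch under stated assumptions: the Metrics fields, the use of 95th-percentile latency, and every threshold value are illustrative choices, not recommendations from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_count: int        # internal errors during the canary window
    cpu_percent: float      # average CPU utilization
    memory_percent: float   # average memory utilization
    p95_latency_ms: float   # assumed latency measure: 95th percentile


def failed_criteria(canary: Metrics, limits: Metrics) -> list[str]:
    """Return the names of the metrics where the canary exceeds its limit."""
    names = ("error_count", "cpu_percent", "memory_percent", "p95_latency_ms")
    return [n for n in names if getattr(canary, n) > getattr(limits, n)]


# Illustrative limits and one made-up canary measurement.
limits = Metrics(error_count=10, cpu_percent=80.0,
                 memory_percent=75.0, p95_latency_ms=300.0)
canary = Metrics(error_count=2, cpu_percent=91.0,
                 memory_percent=60.0, p95_latency_ms=450.0)
print(failed_criteria(canary, limits))
```

An empty list means every criterion passed and the rollout can continue. Here the made-up canary exceeds its CPU and latency limits, so the result is ['cpu_percent', 'p95_latency_ms'].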

Infrastructure also needs to be a consideration. How will users be partitioned, and how will performance be monitored? Routers or load balancers can be used, but another option is feature flags. Feature flags split users by creating conditions for different code paths that users can follow, e.g., whether they are in the canary group or not. Feature flags can be a cost-effective option for canary deployment because they offer a way to introduce the new change to a small, randomly chosen segment of users.
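
In code, a feature flag is often just a condition that chooses between the new and the existing code path. The sketch below is hand-rolled for illustration. The FeatureFlag class and the checkout example are hypothetical, and real teams typically rely on a feature-flag service or library rather than writing their own.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    name: str
    enabled: bool = True  # kill switch: set to False to roll back at once
    canary_users: set[str] = field(default_factory=set)

    def is_on_for(self, user_id: str) -> bool:
        """True only while the flag is enabled and the user is in the canary."""
        return self.enabled and user_id in self.canary_users


def checkout_message(user_id: str, flag: FeatureFlag) -> str:
    if flag.is_on_for(user_id):
        return "new checkout"      # canary code path
    return "current checkout"      # existing code path


flag = FeatureFlag("new-checkout", canary_users={"alice"})
print(checkout_message("alice", flag))  # canary user sees the new path
print(checkout_message("bob", flag))    # everyone else keeps the old path

flag.enabled = False                    # roll back without redeploying
print(checkout_message("alice", flag))
```

Turning the flag off sends every user back to the existing code path without redeploying, which is one reason feature flags pair well with canary releases.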

How can Blameless help?

For SRE and DevOps teams, canary deployment can be a part of continuous testing to ensure a smooth customer experience. It also helps to have tools in place that automate incident management and incident response.

Blameless enables proactive incident management and response, whether it’s a canary deployment or a large-scale deployment. Teams can accelerate development velocity and schedule smaller-scale releases with automation to catch and track errors, notify teams, and trigger runbooks as needed. To learn more, schedule a demo today.