How Important is SaaS Reliability? 90% of Business Leaders Say “Very Important”

Deirdre Mahon

A couple of weeks back, Blameless attended SaaStr 2021, the go-to event for business Go-to-Market (GTM) teams, which has been running since 2012. Our decision to sponsor was made in early 2020. Back then, we had no idea how long the pandemic would last or that it would be a full 18 months before we’d be able to do a physical event.

The SaaStr team was able to host an in-person show at San Mateo’s county fairground while maintaining all social distancing requirements. They pulled it off and hosted 4,000+ attendees across the outdoor fairgrounds. It was an excellent feat of presentation engineering. Inside tented pavilions, keynotes and panel sessions received the full AV production treatment. Sponsor booths were also spread out and had enough partial tenting to shield your eyes from the bright sun, helping you recognize sponsors’ logos. Demos on bright screens proved harder, but that meant visitors had to engage in conversation, and that always goes better for everyone.

Just being together in person with teammates was wonderful, and running into former colleagues was also delightful. It’s been a long 18+ months. We even raised our glasses during happy hour as conversations flowed naturally, without the guilt of scurrying off to another Zoom call.

Blameless typically sponsors events for engineers, DevOps, and SRE professionals, so it’s a bit unfamiliar explaining Blameless to non-engineers. Of course, few would argue against the merits of adopting a blameless mindset. It was a pleasant surprise to hear marketing, sales, and customer success folks claim, “We should definitely adopt that culture too…” Now there’s a thought.

So Who Attends SaaStr?

We conducted a short survey over the three days and here’s how it split across functions for the 100 responses we received:

As you’d expect, Engineering, Finance, and Legal are the least represented groups, with Sales making up the largest functional group. There’s a healthy balance across Marketing, Customer Success, and Product teams. Also, a full third of attendees were C-level executives.

SaaStr is known for its rich content and this year didn’t disappoint. Many of the speakers come with success stories or are on their way to high-growth, high-scale status. The stars this year included Freshworks, Twilio, Profitwell, Atlassian, Sendbird, and Talend. Big-stage sessions, panels, and break-outs covered insightful practices, short-cuts, how to work across functions, and when to raise funds or diversify. With so many sessions, it was hard to pick a favorite, but mine would have to be the talk by Profitwell’s CEO and founder, Patrick Campbell. Titled “A Playbook for Revenue Automation,” it speaks to how changes in the SaaS industry over the past decade have reshaped the approach to revenue success.

Based on SaaS pricing data, companies with freemium offerings have better customer and user retention over the long run. This wasn't the case 3+ years ago, but SaaS changes. I know engineers love to get their hands on products early, and a free, low-feature option is a great way to discover and fall in love with something new that brings value.

SaaS and Reliability 

The term SaaS, coined circa 2000, isn’t just a way of describing the underlying tech stack or how users access the service. It's equally about the business model or how customers pay over the lifetime of usage. Salesforce probably takes the prize as the fastest growing SaaS in the early years. Regardless of which stack the software runs on, SaaS means you log in with a password to use and get value.

Reliability to an engineering, SRE, or DevOps team means the system (or service) is available and performing to expected levels, so users can accomplish tasks using the service's features. To a business person, reliability means the same, though it may be worded a little differently. The service is up; users can log in and conduct normal tasks without failure, errors, or latency.

When that doesn’t happen, operations halt and any number of downstream effects could spell disaster, depending of course on the industry and nature of the service. For most downtime or outages there’s some denting in brand value or more likely a revenue hit. Regardless, it's an interruption to business momentum and it needs to be resolved with clear communications along the way.

Engineering, SRE, and ops teams get to work resolving incidents and communicating along the way. In reality, many small or less impactful incidents occur without customers or business teams ever knowing. It’s the major outages -- Sev1s and Sev0s -- that grab attention, and not the kind you want. Recent outages at Facebook (as well as Instagram and WhatsApp), HubSpot, and Slack are certainly unfortunate, but they’re a reality in today’s log-in world. On-call engineering teams need all the support they can get, which starts with a blameless culture. Tools, process, and ample resources up, down, and across the chain are critical to achieving resolution and keeping operations running smoothly.

Do Business Leaders Know Their SaaS Reliability Levels?

In a word, no. Let’s say you’re a sales, marketing, or CS leader and you need to communicate to future customers the % of time that your service is available. That’s a standard question to ask, right? Of course, how reliable a service should be depends on your product’s pricing, criticality, and whether you charge annually or monthly. We asked this question of our audience and surprisingly only 41% knew the answer for their own SaaS product. 

This raises some questions. Is it because the product and engineering teams don’t proactively share reliability levels? Or is it because they don’t feel the levels are worth bragging about? Maybe it’s because those levels are inconsistent over time or they’re worried about getting into legal knots with any existing or future SLA (service level agreement). Maybe it’s a mixture of all the above.

Regardless, the teams who design, market, sell, and support should be knowledgeable on the topic of service reliability. This is where the engineering, ops, and business functions must come together and speak a common language in order to get on the same page. That starts with mutual understanding.

To see how in demand this information is, we asked the question “How useful would a report be that told you the reliability of your service?” As you’d expect, the majority stated ‘useful’ or ‘very useful’ with a whopping 78% scoring 8-10 on the scale, 10 being the highest.

So how do you get business and engineering teams on the same reliability page?

It actually starts with three letters: SLO (Service Level Objective). We know there’s no such thing as 100% uptime, so the question then becomes what degree or % of reliability is acceptable for end-users, based on service type. It turns out that Salesforce, one of the largest SaaS providers in the world, takes services down regularly for ongoing maintenance. You can check out their Trust Status pages, which are organized by service type.
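To make "what % of reliability is acceptable" a little more concrete, here is a minimal sketch that converts an availability target into the downtime it permits over a window. The function name `allowed_downtime_minutes` and the 30-day default window are illustrative assumptions, not something taken from Blameless or Salesforce.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability target permits.

    slo_percent is the availability target as a percentage (e.g. 99.9).
    window_days is an assumed measurement window; 30 days is only a default.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    window_minutes = window_days * 24 * 60
    # The share of the window that is NOT covered by the target is the
    # time the service may be unavailable without missing the objective.
    return window_minutes * (100 - slo_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which is the kind of number a sales or CS leader can repeat to a customer.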

By working together as one cohesive engineering, DevOps, and SRE team, you agree on the specific SLIs (indicators) that need to be optimized in order to drive a reliability SLO, which can be measured over a month-long time frame. Here’s a comprehensive guide on SLOs, SLIs, and error budgets to help you get started. It can take time to pinpoint the right metrics and goals that suit your organization, but trying them on for size and iterating over time will help you get there quickly.
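As a rough illustration of how an SLI, an SLO, and an error budget relate, the sketch below treats the SLI as the share of successful requests in a month and checks it against a target. The names `SloReport` and `evaluate_slo`, and the choice of a request-success SLI, are assumptions made for this example; your own indicators may be latency or something else entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float              # measured fraction of good events, 0.0 to 1.0
    allowed_bad: float      # bad events the target tolerates in the window
    budget_consumed: float  # fraction of the error budget used (1.0 = all of it)
    slo_met: bool


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compare a measured SLI against an SLO target given as a fraction (e.g. 0.999)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if not 0 < slo_target <= 1:
        raise ValueError("slo_target must be in the range (0, 1]")

    sli = good_events / total_events
    bad_events = total_events - good_events
    # The error budget is the number of bad events the target leaves room for.
    allowed_bad = total_events * (1 - slo_target)
    if allowed_bad > 0:
        consumed = bad_events / allowed_bad
    else:
        # A 100% target has no budget: any failure exhausts it.
        consumed = float("inf") if bad_events else 0.0
    return SloReport(sli, allowed_bad, consumed, sli >= slo_target)


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 500 of which failed.
    report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo_target=0.999)
    print(f"SLI {report.sli:.4%}, budget used {report.budget_consumed:.0%}, met: {report.slo_met}")
```

A report like this, shared monthly, is one simple way to give business teams a reliability number without exposing the underlying telemetry.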

Agreeing on the exact SLOs can be a heavy lift for some, and it depends on how closely engineering and operations teams are already working together. Coming to agreement on acceptable levels of reliability is not easy and is highly dependent on budget, resource availability, and a solid understanding of end-user expectations. Truthfully, it comes down to striking a balance: selecting the right objective and sticking to it consistently over time. The coveted “Error Budget” is where teams need to unite and then manage from there. Bringing product and GTM teams up to speed on what’s possible is the natural next step, and that requires lots of explanation, report sharing, and tracking.

The recommended approach to setting initial SLOs is to start with a small number of them, then track, iterate, and improve. It doesn’t need to be perfect out of the gate.

SaaS Reliability Benefits. Oh So Many!

It’s clear from our short survey that GTM teams recognize the obvious value in reliability. At Blameless, our head of strategy, Kurt Andersen, likes to say, “Reliability is a team sport, and if you don’t have reliable services, nothing else matters.” Tons of unique differentiating features don’t matter if you can’t log in and perform basic tasks without errors or latency. The market agrees.

Sales, marketing, CS, and product teams could really leverage reliability as a distinct advantage over their closest competitors. The next step is simply to get business and SRE teams on that same page. Piece of cake, right?