At Blameless, Reliability is Personal

During our 2019 Blameless Summit, CEO Ashar Rizqi spoke on his relationship with reliability and how it impacts his personal experiences.

The following transcript has been lightly edited for clarity.

Ashar: Thank you everyone for coming to this second Blameless Summit or B-Summit. I like that. I was asked to talk about why reliability is important to me personally. I was up at 3:00 AM this morning, thinking through this question. My sleep is obviously pretty unreliable, and those kinds of questions will always get me going. I thought, let me walk folks through my personal story. I hope that it connects with some folks here.

I've been fortunate enough to have a different perspective throughout my career and I hope to share that with you and talk about why something like reliability is so personal. It's a little bit of a vengeance story, although I'm not the person that comes out on top.

I started my career as a systems administrator. As a systems administrator I lived in a very different world. In that short career, I've been fortunate enough to have seen what the world used to be like back in the day. As an engineer, my job was to administer large storage systems and it is as boring as it sounds, I can promise you that.

One of the things that I remember from my time doing this, and this was just over two years, was this constant feeling of dread. If I look back to that time, it was the sinking feeling in my stomach. I look at my phone for other reasons, but at the time there would always be some sort of alert coming up on my phone, making me get up at 2:00 AM in the morning to recover something or run some tiny script to resolve an issue or restart a machine.

I was newly married at the time, no kids, and there's nothing like on-call that really puts the stress on your marriage. I can promise you that. My time doing systems administration was very much that of sleepless nights. When you're newly married you're like, "Hey, let's hang out with other couples." Well, we had myself, my wife, and my laptop that would sort of travel with me. I think I had more of a relationship with the terminal than I did with my wife at the time. And if you ever talk to her, she'll probably agree.

It's a source of a lot of frustration. I was like, "Man, how do I change this behavior?"

Maybe as an individual contributor, like a tiny cog in this very large machine, I'm not going to be able to do this. And this was the fallacy of youth at the time. I was like, "Oh, you know what, maybe if I go into management, I can change things and I can make the world a better place."

So life is different as a manager. And when I did go into management and started managing SRE teams and hiring SREs, I was working at a fast-growing startup that had a tremendous amount of technical debt piled up, and it was my job to figure out how to burn that down. The takeaway for me as a manager was everything is on fire.

Your team is complaining to you about constantly being on call, under pressure to perform and keep these systems running, and losing sleep. I'm like, "I've been there, I've seen that. But as a manager, I don't know if that's my problem anymore. I got to figure out how to solve this problem in a different way."

You get a lot of pressure from your peers that are like, "Hey, we're trying to move fast. We got to push fast. You keep slowing us down. Why do you keep doing this?"

There's the product and development managers who say, "Hey, we got to ship fast, but Ashar is this gatekeeper and he keeps slowing us down."

That usually ends up painting a target on your back, which isn't pleasant.

Then you've also got pressure from the management. At some point you come to accept your fate and you're like, "You know what, I've done this for about two and a half, three years, and maybe I can't change this. Let me try something else."

But what I did realize is that everything is on fire and it hurts. It personally hurts because I can't make the change needed to improve things for my team. I can't move the company faster. I don't necessarily have the circle of influence that I need, but reliability is really important. The fact that we're in this place where these fires exist is because we made bad decisions.

Even as an operator or as an employee, you end up becoming a shareholder through time. I'm a shareholder in this particular company and they announced their Q2 results recently. And the first line item in their Q2 results was that we just paid out $8.2 million in service credits for outages that amounted to two hours. I'm giving some hints out, but I won't reveal the name of the company. This was 5% of their quarterly revenues, 1% of their annual revenues. To me that's like the most expensive parking ticket. Imagine you've got a parking ticket for $8.2 million, which is very likely given this is San Francisco.

The thing that I wanted to show here is if you look at that dip in the graph, it's right after they announced those results. A bunch of reporters said, "Oh, this platform is going to have a hard time competing with this other more reliable and better platform." That's like the worst kind of press that you can have concerning reliability issues.

That dip in the graph represents two and a half billion dollars of market cap that was wiped out. I think this company will recover, but for me, I'm a very tiny shareholder in this company, so I was angry. It instilled this sense of fear even down at that individual contributor level. So it's very much personal, right? When you have reliability issues, it trickles down in ways that you don't necessarily think about on a day-to-day basis.

The folks that are on your teams today, they're human beings. They've got lives of their own, they've got dreams of their own, they've got plans, and they've made these commitments to you, to others in the team, so that they can actually fulfill those dreams. So it's actually quite painful when you can't do that.

So what was the next step for me from here? I was like, "I can't do anything as a shareholder. Can't do anything as a manager. Can't do anything as an engineer. What do I do? I'm going to take this problem into my own hands and I'm going to solve it myself."

I'm 31, and I like to think of myself as pretty experienced, as pretty cerebral in my thinking. And I'm like, "Oh, I've got everything planned out." And day after day, life just shoots me down in various ways.

So as an entrepreneur I was like, "There's going to be all this control over where things are going to be, and what direction we're going to take, and we're going to solve a big problem for everyone."

It takes a certain set of experiences to then be able to realize, "What have I gotten myself into?" It's actually a combination of all of those things: of sleepless nights, of everything being on fire, of fiduciary duty and shareholder responsibility. These are predicated on us building a reliable platform for our customers.

It's very much a personal story. Reliability has been a part of my life from the beginnings of my career all the way to today, and affects me personally in all sorts of different manners. It challenges us in all these different ways that we don't really imagine. And my takeaway from all of this is pretty positive: bring it on.

I would love to be the system that absorbs all of the blame, frustration, and unhappiness, so that others don't really have to. These are problems in life worth solving, not just for yourselves but for others. Think about reliability impacting not just the thing that you have this tunnel-vision focus on, but also the relationships that you have, both personal and professional.
