The Blameless Podcast

Resilience in Action E14:

Resilience Is the Audacity of Adaptive Capacity ft. John Allspaw

June 15, 2022

Kurt Andersen

Kurt Andersen is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know. Before joining Blameless, Kurt was a Sr. Staff SRE at LinkedIn, implementing SLOs (reliability metrics) at scale across the board for thousands of independently deployable services. Kurt is a member of the USENIX Board of Directors and part of the steering committee for the world-wide SREcon conferences.

John Allspaw

John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. He authored the books The Art of Capacity Planning and Web Operations, as well as the foreword to The DevOps Handbook. John’s 2009 Velocity talk with Paul Hammond, 10+ Deploys Per Day: Dev and Ops Cooperation, is credited with helping start the DevOps movement. John served as CTO at Etsy and holds an MSc in Human Factors and Systems Safety from Lund University.

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives and more. This podcast is hosted by Kurt Andersen.

Links referenced in the episode:

An Increment Summary Article on the Applied Ergonomics Paper

Transcript:

Kurt Andersen (00:27):

Hello, I'm Kurt Andersen. Welcome back to Resilience in Action. Today we are talking with John Allspaw from Adaptive Capacity Labs. So by way of introduction John, could you give us a brief description of what adaptive capacity is?

John Allspaw (00:43):

Well, so adaptive capacity is a concept that comes from a whole host of fields, but I guess the most succinct way to put it is an ability to adapt without (and this is sort of the key part) necessarily knowing beforehand what the adaptation is needed for, what it needs to do, or what the adaptation is. So it's this capacity to adapt, and that's kind of what makes this resilience engineering: it's a sustained adaptive capacity. A bit of a mind-blower, I don't know. How's that work?

Kurt Andersen (01:39):

There's just so much to explore out of that point, that I'm not even quite sure where to start. But the idea of sustaining it, let's explore that just a little bit.

John Allspaw (01:54):

Mm-hmm.

Kurt Andersen (01:56):

I was trying to think of analogies or maybe ways that we could help our listeners to relate this to something in their daily-

John Allspaw (02:05):

Mm-hmm.

Kurt Andersen (02:05):

-work and life. Do you have any at your fingertips that maybe we could talk about and how this concept of sustainable adaptive capacity would relate to that?

John Allspaw (02:18):

Yeah, so let's see... I might just talk at you for a little bit here. Let's see how this goes. Right, so the assertion that I'll make, and the reason why the name of our company is called Adaptive Capacity Labs is that this notion of resilience, this notion of sustained adaptability, an ability to adapt is always present. And it's not the special thing... Oh, we didn't do... we didn't pay attention to resilience and so therefore we didn't have it. No, that's not how it works. As my colleague, Dr. Cook has said "the resilience is coming from inside the house". And so-

Kurt Andersen (03:16):

Okay.

John Allspaw (03:18):

-there's lots of... there are a lot of activities in sort of software engineering centered organizations.

Kurt Andersen (03:27):

Mm-hmm.

John Allspaw (03:27):

Let's just say real broad strokes here. There's lots of activities, real concrete grounded, daily activities that people would just sort of characterize or describe as just normal work. That are, and continue to be sources of what we would call resilience and or investments in this ability, this capacity to adapt to situations that aren't necessarily anticipated.

Kurt Andersen (04:01):

Okay.

John Allspaw (04:01):

Or even imagined.

Kurt Andersen (04:03):

Mm-hmm.

John Allspaw (04:03):

And so, for example, code review is a great, perfect example. The thing about code review, it's pretty expensive from an actuarial sort of accounting standpoint. Engineers are engaged in code review, a significant portion of their day or the week, right.

Kurt Andersen (04:30):

Mm-hmm.

John Allspaw (04:32):

The...Some code reviews, and I'm sort of idealizing right. In this description, some end up with, oh, okay, well, oh, I see what you're doing here. Yeah, I mean, I guess this makes sense. Okay, cool. This looks good to me, right?

Kurt Andersen (04:51):

Yeah.

John Allspaw (04:54):

Some result in discussions amongst maybe multiple people. Whoa, what are you doing here? This is... Whoa. There's... You're going to go off a cliff here. Or, this is a ticking time bomb, or so here's this part. And let me explain how this might go awry in these ways, or I don't understand how this part, and there might be some, can you explain this bit and exchanges? And so somewhere in between those, right, is probably the majority of how these exchanges go.

John Allspaw (05:33):

It takes a lot of effort, right, to do this, and a lot of attention. And I've yet to come across an organization that tabulates the amount of time that goes into it. If I write some code and I ask you and a couple of colleagues to take a look at it, Hey, please help me make sure I don't shoot myself in the foot here. And you look at it and there's nothing major, maybe not even anything minor, and I'm still a little bit worried about it. Okay, and I deploy it, and lo and behold, everything seems to be okay. Was that investment in time, which was not insignificant, by all of you, by the collective of us, was it worth it?

Kurt Andersen (06:19):

Mm-hmm.

John Allspaw (06:21):

I don't know.

Kurt Andersen (06:22):

Right.

John Allspaw (06:23):

Right. And so that's the thing about... that's the thing. We don't really much pay attention to it because nothing happened that was at least adverse around, or damaging or negative.

Kurt Andersen (06:37):

Right.

John Allspaw (06:38):

That's the flip in resilience engineering. The flip is that things that go right, right.

Kurt Andersen (06:49):

Mm-hmm.

John Allspaw (06:49):

Are a much richer source of data for you to understand how people do what they do, so you can better support them. The rub is, it's really hard to get people to expend energy and attention, to look at nothing burgers.

Kurt Andersen (07:13):

Right.

John Allspaw (07:14):

Do you see what I'm saying?

Kurt Andersen (07:14):

Right, sure.

John Allspaw (07:15):

Incidents, negative events... Oh, you don't have a problem persuading people to pay attention to them, right.

Kurt Andersen (07:23):

Mm-hmm.

John Allspaw (07:25):

And so the idea behind resilience engineering is underpinned by this idea that adaptive capacity is being invested in, to some extent, in different ways, in varying directions, all the time. We just don't... We're just not identifying it. And so, the term adaptive capacity is conceptual, it's not really contextually specific. You can't just point and say, show me where in the Slack transcript the adaptive capacity showed up.

Kurt Andersen (08:06):

Right.

John Allspaw (08:07):

Right. That's not... I don't know if this helps.

Kurt Andersen (08:10):

Not quite a thing. So, in some ways it almost sounds like that old cliche about advertising that half of advertising is wasted, but nobody knows which half. And I'm probably butchering it. And in some ways it's like the activities that we do such as code review, we believe are beneficial and or necessary because they're mandated by some regulators or whatever. But we don't really know how effective or what parts of them bring the value to the table. Is that...?

John Allspaw (08:50):

Sort of, yeah.

Kurt Andersen (08:52):

Am I on track there?

John Allspaw (08:52):

Yeah, sort of. I mean, look, at a high level... I've been doing some supervising of students in the program I was in at Lund University recently. We had a conversation earlier this week with a few of them: there's resilience in the resilience engineering framing, and then there's resilience engineering.

Kurt Andersen (09:20):

Okay.

John Allspaw (09:21):

And if you were to ask the pioneers, the heavies, Woods, Cook, Hollnagel, Dekker, [inaudible 00:09:33], they would say that a good, significant portion of the last 20-plus years has really been spent just understanding what resilience looks like, so that you can identify it in the wild.

Kurt Andersen (09:49):

Okay.

John Allspaw (09:51):

You can't engineer a thing if you don't know what it looks like.

Kurt Andersen (09:57):

That's fair.

John Allspaw (09:58):

The field of resilience engineering, that is to say... Well, engineering resilience, that is to say deliberately, actively, intentionally supporting people and systems to have capacity to adapt to unforeseen situations is not entirely clear. The field has made more progress on characterizing and identifying what resilience looks like.

Kurt Andersen (10:40):

Mm-hmm.

John Allspaw (10:40):

It's only just recently, in the last couple of years, that the field has actually been able to make some progress in offering up concrete demonstrations of what engineering resilience looks like.

Kurt Andersen (10:56):

Okay.

John Allspaw (10:56):

And the thing that I would say is probably the most important, and I think it'll only become apparent how important it is over time, is an article that Beth Long and Richard Cook wrote for the journal Applied Ergonomics. I can give you a link to it. The title is "Building and revising adaptive capacity sharing for technical incident response," and the subtitle is "A case of resilience engineering."

Kurt Andersen (11:27):

Okay.

John Allspaw (11:28):

And, it's an in the wild, empirical, with real sort of concrete, rich description of what engineering resilience looks like now. Of course-

Kurt Andersen (11:41):

Nice.

John Allspaw (11:41):

-the case they're using, the organization that this case is based on-

Kurt Andersen (11:47):

Yeah.

John Allspaw (11:49):

-they didn't, and probably wouldn't recognize that what they were doing was engineering resilience.

Kurt Andersen (11:57):

Okay.

John Allspaw (11:58):

They were just doing [inaudible 00:12:00].

Kurt Andersen (11:59):

They were not doing this intentionally. Yeah, okay.

John Allspaw (12:02):

No, they were doing it intentionally. In fact, there's an evolution about what they did and that's sort of captured in the paper. But they wouldn't have characterized it as, we're amplifying and supporting and augmenting existing adaptive capacity. They didn't say, oh, we've identified that this thing that we do is resilience, a source of resilience. So therefore, we're going to support it more explicitly and deliberately.

Kurt Andersen (12:33):

Okay.

John Allspaw (12:33):

But that's pretty much part and parcel. That's the thing: as has been said before, Murphy's law is wrong. What could go wrong, frankly, almost never does. We just don't notice those cases.

Kurt Andersen (12:45):

Fair.

John Allspaw (12:46):

We just call it normal work. Think about it: every software engineer who deploys changes has an opportunity to absolutely screw things up.

Kurt Andersen (12:58):

Mm-hmm, sure.

John Allspaw (13:00):

And yet, the vast majority, they don't.

Kurt Andersen (13:02):

Right.

John Allspaw (13:03):

To the point that our industry measures, or at least, let's say, describes that 99% as being successfully operational, successfully-

Kurt Andersen (13:18):

Right.

John Allspaw (13:18):

-working, so...

Kurt Andersen (13:21):

Well, I mean, it depends. If you look at the DORA metrics, the lower performers report success rates that are significantly lower than the high performers', meaning the share of changes pushed through that don't have adverse consequences or have to be rolled back or something like that. And of course, those percentages come from relatively fuzzy, qualitative values, I think. So I'm curious: 10-ish years ago, you and Paul did the talk that kind of catalyzed the DevOps movement, the 10+ deploys a day at Flickr. With the subsequent development of DevOps as a practice (I'll avoid the problems and the ways it's gone off the rails), how would you relate that idea of joint work between the Dev and the Ops teams to adaptive capacity? Did it-

John Allspaw (14:29):

Oh, yeah.

Kurt Andersen (14:29):

-does it improve it? In what ways does it improve it, etc.

John Allspaw (14:34):

Right, yeah. So the thing that immediately comes to mind, and Paul Hammond and I have talked a good deal about this, and Thomas Depierre interviewed the two of us, I think last year, for SRE.

Kurt Andersen (14:53):

Yes, I have a link that we'll put on the show notes.

John Allspaw (14:58):

The thing that is, I mean, not just encouraging, definitely encouraging, is that you have to remember that this idea of Dev and Ops cooperation was a very grassroots, bottom-up realization.

Kurt Andersen (15:17):

Mm-hmm.

John Allspaw (15:17):

Right. Much like continuous deployment, which is often sort of conflated or certainly coupled with DevOps. It wasn't an intentional program. It wasn't like-

Kurt Andersen (15:31):

Right.

John Allspaw (15:31):

-we are going to... It wasn't like some regulatory body or some standards body who says, we are going to do a new thing, right. It's a recognition that, Hey, who... Where does it say on carved in stone tablets that the people writing the code, can't be the people deploying it.

Kurt Andersen (15:54):

Mm-hmm.

John Allspaw (15:56):

And where does it also say that the people who are writing the code must somehow be completely unaware of how the systems that are running the code that they wrote are working in various details and telemetries and states. It was made up.

Kurt Andersen (16:22):

Right.

John Allspaw (16:23):

And we were like, well, we don't have time for that. Is somebody going to yell at us for this? No, you must give... Whatever, that didn't make sense. And so in-

Kurt Andersen (16:37):

Well, there are some regulatory regimes that would yell at you, but...

John Allspaw (16:41):

You know what? I don't... Fine. And they're always wrong. And we know that they're always wrong.

Kurt Andersen (16:48):

Okay.

John Allspaw (16:48):

Right. We know that they're always wrong, and sorry, [inaudible 00:16:53]. If Etsy can satisfy Sarbanes-Oxley without dealing with separation of duties, then whatever. That's an excuse to keep status quo.

Kurt Andersen (17:09):

Yeah.

John Allspaw (17:10):

And so now to your question, how does it relate? I mean, again, adaptive capacity is... and the key way it's used within the resilience engineering framing is for incidents, or for say events, or situations that are unforeseen. That is to say fundamentally surprising.

Kurt Andersen (17:37):

Okay.

John Allspaw (17:38):

There are things that can happen that are surprising, but not things that you couldn't have anticipated. In fact, with many of them you'd have thought, oh yeah, I guess this could happen, I just don't think it's likely.

Kurt Andersen (17:52):

Mm-hmm.

John Allspaw (17:53):

Fundamental surprises are those that you didn't even have an ability to... You had no frame, you had no expectation that it was anything remotely like it could happen, right. A colleague of ours once described this difference between a situational surprise and a fundamental surprise as a situational surprise is buying a lottery ticket and winning the lottery.

Kurt Andersen (18:21):

Mm-hmm.

John Allspaw (18:22):

A fundamental surprise is winning the lottery when you didn't buy a lottery ticket.

Kurt Andersen (18:30):

Yeah, that would be pretty surprising.

John Allspaw (18:33):

Yeah. And so, it's not about... And this is like a big...what fuels the difference between the concept of robustness and resilience. Which is, robustness are all of the things, it's about preparation, preparing for situations that you can imagine or anticipate.

Kurt Andersen (18:54):

Mm-hmm.

John Allspaw (18:55):

Resilience, in the case of adaptive capacity, it's about adaptation to situations that are actually unforeseen, and that's really a litmus test. And so it means that you have to adapt in ways that you hadn't considered you needed to adapt, and can you do that?

Kurt Andersen (19:21):

Okay... I'm trying to think through that, because it seems like limiting it to this category of fundamental surprise.

John Allspaw (19:32):

Well, we're not limiting, just... It's a hallmark, it's the thing to-

Kurt Andersen (19:36):

Okay.

John Allspaw (19:36):

-rhetorically get people to the idea.

Kurt Andersen (19:39):

Okay. So say you had a team developing software that had never run into an incident report on this and had never really dealt with thundering herd as a concept. Your engineers were just unfamiliar with that concept, and they did something that invoked a thundering herd. Does that qualify as a fundamental surprise because they were unfamiliar enough with the field not to anticipate it? Or is that still in the category of, Hey, you should have planned for this?

John Allspaw (20:25):

It's... You'll note that lots of the words that you just used are sort of retrospective in nature.

Kurt Andersen (20:31):

Correct.

John Allspaw (20:32):

They weren't aware, they should have been aware, they could have been. That's ... it's a bit of a losing game because we're describing what we now know after an incident.

Kurt Andersen (20:49):

Correct.

John Allspaw (20:49):

As a way to explain, and it doesn't... It's not really... Doesn't tend to be helpful, right. It doesn't explain actually, it doesn't explain the world that they found themselves in, right.

Kurt Andersen (21:06):

Correct.

John Allspaw (21:08):

A better question... there are a number of better questions. A better question might be, and forget about the term thundering herd: what made it such that the organization involved, and no team stands on its own,-

Kurt Andersen (21:27):

Mm-hmm.

John Allspaw (21:27):

-at least in a company that's very successful for any period of time.

Kurt Andersen (21:32):

Mm-hmm.

John Allspaw (21:33):

What allowed the people responding or observing, or tasked with understanding what was going on, to work out what was going on? And in that case, what resources did they have available that helped them?

Kurt Andersen (22:00):

Okay.

John Allspaw (22:00):

You know, the thing about resilience that makes it different is actually that it kind of flies in the face of justifying the need to do it. In fact, actually, my colleague has said that resilience is... could probably better be described as the audacity of adaptive capacity, right.

Kurt Andersen (22:26):

Okay.

John Allspaw (22:29):

To what extent? So for example, Laura Maguire, excellent, amazing engineer. She did an entire dissertation on-

Kurt Andersen (22:39):

Yeah.

John Allspaw (22:39):

-coordination, part of coordination, and the cost of coordination, includes bringing people, recruiting expertise-

Kurt Andersen (22:48):

Correct.

John Allspaw (22:48):

-calling for help.

Kurt Andersen (22:50):

Mm-hmm.

John Allspaw (22:51):

If you set aside all the dilemmas and sort of... involved with when to call for help.

Kurt Andersen (22:57):

Mm-hmm.

John Allspaw (22:58):

Which is actually a topic in and of itself, what makes it so that it's easy for people to call for help, and easy for people to know who to call? There are organizations where they might have an amazing incident response history of successful... responding to incidents where cases didn't need to involve calling somebody, right.

Kurt Andersen (23:32):

Okay.

John Allspaw (23:32):

Because they could handle it all locally, who was there, who responded to it.

Kurt Andersen (23:36):

Sure.

John Allspaw (23:36):

And so you could say that a resource that goes into adaptive capacity, that supports adaptation, is somebody on your team having come from another team. They just relocated-

Kurt Andersen (23:59):

Okay.

John Allspaw (23:59):

-and you're looking at a case that involves technology X, and they know somebody from their old team who is an absolute expert in technology X.

Kurt Andersen (24:10):

Mm-hmm.

John Allspaw (24:11):

Right. That means that mobility across teams, the fact that's even possible, which [inaudible 00:24:22] organizations is. Or it can...some it's more fluid than others, that's a source, right.

Kurt Andersen (24:30):

Okay.

John Allspaw (24:30):

And so that's a source of resilience. That is... It made responding to the incident, ah, we need Sylvia. Sylvia, everybody knows. Or, well actually no, the rest of the team didn't know. How can you not know that Sylvia knows all of this stuff. Of course-

Kurt Andersen (24:45):

Fair enough.

John Allspaw (24:46):

-this is the first thing you know, you call Sylvia. And so every organization has that, but of course, usually we put that... We sort of describe that in the tech industry, as tribal knowledge. It's not tribal knowledge, it's actually a critically important set of skills,-

Kurt Andersen (25:08):

Mm-hmm.

John Allspaw (25:09):

-and it's a resource. So then the question is how, can you... What are ways that you could expand and broaden the number of people who know, or at least have a better, more improved understanding of whom to call in what situations.

Kurt Andersen (25:32):

Right, Okay.

John Allspaw (25:34):

You could consider this as a little bit of a sort of a teaser for the Cook and Long paper because-

Kurt Andersen (25:39):

Okay.

John Allspaw (25:41):

-it goes into this quite a bit.

Kurt Andersen (25:43):

Yeah, that would be great.

John Allspaw (25:44):

That's an example.

Kurt Andersen (25:44):

We'll be happy to put a link to that in the show notes as well, for people who want to read more. So let's pivot just a little bit. You worked at Flickr, then you worked at Etsy, and I believe it was during your time at Etsy that you undertook the master's in Human Factors and Systems Safety at Lund.

John Allspaw (26:02):

Mm-hmm.

Kurt Andersen (26:03):

And I think you were one of the early folks from the software industry to go through that program.

John Allspaw (26:10):

Yeah.

Kurt Andersen (26:12):

And then a bunch of others have followed in your path, thanks to your prolific writings, I think. What did you hope to get out of the program when you first started it? What caught your attention with it?

John Allspaw (26:24):

Yeah, I have to say that going into it, I had some sort of hopes and dreams about what I was going to get out of it.

Kurt Andersen (26:36):

Mm-hmm.

John Allspaw (26:40):

And at some point in it, I developed new and even better hopes and dreams.

Kurt Andersen (26:46):

Okay.

John Allspaw (26:46):

And so it changed over time. I would say that, look, what drew me to the program was... And yeah, I was the first person from software in this program, and the program had been around since 2006, I believe. I started in 2013 and graduated in 2015. Around the end of my time at Flickr and the early part of my time at Etsy, I was doing a lot of self-study, reading a lot. What really stumped me was, how the hell are software engineers as good at what... How can they even do what they do?

Kurt Andersen (27:41):

Okay.

John Allspaw (27:43):

I didn't have a good explanation other than, they do, right.

Kurt Andersen (27:50):

Mm-hmm.

John Allspaw (27:50):

Or like anything that's pretty oh well, it's because you're such a good manager, John. Well, we know that's not the case. Oh, well, it's because you're so good at hiring. And some people have it and some people don't, well, that's also horseshit, so what is it? And is it... And I kept thinking like, okay, in software, you've had this opportunity to screw things up. And I mean, spectacularly at pretty much every turn, right.

Kurt Andersen (28:18):

Mm-hmm.

John Allspaw (28:20):

We've seen multiple, we've experienced incidents, we've looked closely into incidents. Multiple organizations, where a one character-

Kurt Andersen (28:30):

Sure.

John Allspaw (28:31):

-change brought 15, 20 hours later, absolute nightmare of an incident.

Kurt Andersen (28:39):

Mm-hmm.

John Allspaw (28:40):

And I just, I couldn't. Here's this thing, that we can't... we don't see it. We don't see the code running, we don't, right.

Kurt Andersen (28:50):

Right.

John Allspaw (28:51):

We see actually, what other code tells us about other code running.

Kurt Andersen (28:58):

Yes.

John Allspaw (28:58):

And by the way, hopefully the things that are telling us and showing us, top, ps,

Kurt Andersen (29:05):

Yeah.

John Allspaw (29:05):

whatever, hopefully that stuff, which is also written in code, doesn't have bugs that get in the way of us trying to understand what bugs are running somewhere else.

Kurt Andersen (29:15):

Right.

John Allspaw (29:15):

But we've got these pictures, and I didn't have a conception of it. And so the only thing I could reach for was how people make decisions when there's lots of uncertainty and ambiguity. Those are all very classical human factors domains, right.

Kurt Andersen (29:36):

Okay.

John Allspaw (29:37):

Medicine, certainly power generation, places where you have to make inferences about what's happening without firsthand experience, firsthand perceptions. You can't stick your hand out the window while you're flying an Airbus, right.

Kurt Andersen (30:03):

Mm-hmm.

John Allspaw (30:04):

You got to rely on a bunch of instruments and making inferences across all of these gauges and all these sort of streams of data. And so that's what led me, I needed to understand how do people understand. What things, what's happening, and whether or not things are going in a good direction. And-

Kurt Andersen (30:21):

Okay.

John Allspaw (30:22):

-the first year I was at Lund, I got a better handle on it. There's no way I could say, oh, now I know fully, exhaustively.

Kurt Andersen (30:36):

Mm-hmm.

John Allspaw (30:36):

But I had a better handle on it and a better understanding of it. And what I walked away with was something that I'm quite optimistic about, which is that for critical systems, what historically used to be known as safety-critical, the framing of "oh, no one dies"-

Kurt Andersen (31:03):

Mm-hmm.

John Allspaw (31:04):

-either people die or people don't die, that distinction is a construction that's not helpful anymore. And now I have lots of... actually not just colleagues, but close friends in aviation, air traffic control, rail, mining, wildland firefighting, even child welfare and gas construction, and that sort of thing. And that's what I left with: lots of domains, while they are different in a whole host of ways, are also quite similar. That's the appreciation I didn't have and actually didn't expect. And now there's no way that I can unsee that.

Kurt Andersen (32:03):

Right, okay. So if someone is considering going to the program-

John Allspaw (32:07):

Mm-hmm.

Kurt Andersen (32:08):

-for themselves, how would you advise them to kind of prepare and evaluate whether it's a good idea or not?

John Allspaw (32:18):

Yeah, I've got real... You absolutely ought to talk to somebody who's been through the program.

Kurt Andersen (32:25):

Okay.

John Allspaw (32:26):

You just simply cannot figure it out and you need to have... And in fact, I don't know any student, current or former student from the tech industry who wouldn't drop what they're doing to...myself included. If you're considering the program, it's a lift, right? Look, it's a master's program.

Kurt Andersen (32:56):

Yeah.

John Allspaw (32:56):

It's not a certification, it's not a bootcamp, it's real. And so your enthusiasm, curiosity, and open-mindedness need to be calibrated and expectations set that way. But that's the short answer: talk to somebody, reach out to me, Paul Reed, Chad Todd, Jessica DeVita, Colette Alexander, Michael Donlan. These are people-

Kurt Andersen (33:33):

Laura Maguire went through it, didn't she?

John Allspaw (33:34):

Laura Maguire, right, yes. And yeah, talk to us about it.

Kurt Andersen (33:44):

So, a complementary part of that: because it is a significant commitment of time and effort, what would the selling points be for someone to convince their management to underwrite that commitment?

John Allspaw (34:00):

Yeah, that's a good question. I think what we'll... what you're alluding to is that, of course, in the program, by Swedish law, you can't pay for your own education.

Kurt Andersen (34:09):

Mm-hmm.

John Allspaw (34:09):

So you have to get your employer to do it. I don't have a good answer for that, and the cop out answer is that I had to convince my CEO and I was in charge. I was leading engineering, and so I was a big wig.

Kurt Andersen (34:38):

Right.

John Allspaw (34:38):

So it's not like I had to fight very much. I had more difficulty, and reasonably so, convincing my wife it was a good idea than convincing my boss to pay for it.

Kurt Andersen (34:55):

Okay.

John Allspaw (34:56):

And so, I don't have good...I really don't have any good, oh, you should say this. I wish I did, but again, here's the situation. The situation is that the tech industry, and why I'll call that the tech industry, whatever... software actually. That's not even a good way... let's say internet facing services, right?

Kurt Andersen (35:30):

Yeah, maybe.

John Allspaw (35:32):

Academically, they call them digital services, whatever.

Kurt Andersen (35:34):

Yeah.

John Allspaw (35:35):

That world we have at the moment, and I do not expect it to last, has a huge advantage over many other domains. For better or for worse, we've got an ability to log every fucking thing, sorry for my swearing. What don't we have the ability to log? Theoretically, at microsecond resolution.

Kurt Andersen (36:08):

Mm-hmm.

John Allspaw (36:09):

What people are looking at, what people are typing. We have digital, we have, as far as data collection is concerned, at least the potential to make progress in safety, science, human factors, resilience engineering, cognitive systems, engineering, all these sort of related fields. Compare and contrast that with what Richard Cook needed to do in order to work cases in the operating room.

Kurt Andersen (36:41):

Okay.

John Allspaw (36:42):

We're talking the nineties, multiple VHS cameras being set up, and microphones capturing continually. And now you've got a whole bunch of data streams on magnetic tape that you have to put together.

Kurt Andersen (37:01):

Mm-hmm.

John Allspaw (37:01):

And so [inaudible 00:37:04] in fact, had written an article for a book that's coming out soon, I think on Springer, about some of a huge number of these advantages. Another advantage is that we actually have in... Again, for better or for worse, a huge dearth of regulation, right? There's a lot of domains, especially if you were to understand, let's say, the difficulties people face in a particular domain. And you want to interview them, not because there's an investigation, but you want to interview them after an incident or an accident.

Kurt Andersen (37:44):

Mm-hmm.

John Allspaw (37:45):

A great deal of domains have the response, the part where the practitioner would say, oh, sure, you can interview me. Let's... I'll find out when my union rep and when my legal counsel can be available and we'll do it.

Kurt Andersen (38:00):

Okay.

John Allspaw (38:02):

That's the norm. In fact, anything different is absolutely novel and exotic.

Kurt Andersen (38:08):

Mm-hmm.

John Allspaw (38:09):

That's just simply not the case.

Kurt Andersen (38:10):

Right.

John Allspaw (38:11):

It might be in the future, but not at the moment. So I think that if you are intellectually curious, you could consider, as long as your expectations are set, it is real work.

Kurt Andersen (38:29):

Mm-hmm.

John Allspaw (38:29):

Consider the Lund program, yeah.

Kurt Andersen (38:33):

All right, well, I think that's a great place to wrap up our conversation. We will have links on the show page so you can follow up on some of the pointers that John has dropped along the way. And thank you very much, John, for participating and encouraging people. It'd be interesting to see how many listeners take you up on your offer to find out more about the Lund program.

John Allspaw (38:57):

Please do, I always say this and people don't email or tweet, or reach out. Come on, just do it.

Kurt Andersen (39:08):

All right, and with that, thank you very much.
