Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Blameless Staff SRE Amy Tobey. Amy has been an SRE and DevOps practitioner since before those names existed. She cares deeply about her community of SREs and wants to take what she’s learned over the 20+ years of her career to help others.
In our fourth episode, Amy chats with Craig Sebenik, SRE at Aurora and co-author of “What is SRE?” and “Salt Essentials.” He has a degree from Le Cordon Bleu (Sydney, Australia), a Master's in Italian Cuisine (Apicius in Florence, Italy), and a Master's in Gastronomy (University of Rheims, France). His greatest passion is teaching what he has learned from adventures in SRE and cooking.
See the full transcript of their conversation below, which has been lightly edited for length and clarity.
Amy Tobey: In lieu of introductions, I thought we should just jump straight into our first experiences on the internet.
Craig Sebenik: My first introduction to the web specifically (not the internet) was in graduate school at Brown in 1991. I had been using Gopher for a number of years and I loved Gopher. It was fantastic, but obviously pictures were kind of a pain in Gopher. Maps and things like that were not generally accessible but textual information was really easy to find because of its hierarchical nature.
One of the graduate students I was working with pointed me at this "brand new" thing called the world wide web. It was Mosaic back in the early days, and this was on an educational network.
Amy Tobey: What kind of machine was it?
Craig Sebenik: I was doing computational chemistry so they were all SGIs, even in those days with 19-20” monitors.
It was fantastic. I was used to Gopher and its hierarchical nature. Moving to the web was very, very different. And I said, "You know what? I'm looking for an email address for a friend of mine that I met at an internship a couple years before." The graduate student said, "Okay, what's the URL?" I'm like, "The what?". She said, "I need to know where to start."
In Gopher, you start at the root, you go to education or, in this case, New York and then you go down the chain looking for the right school. At this point, there were no search engines, and you had to know where to start. Looking back at the time, I was hopeful, but I had to say, "This thing sucks." I was not a fan of the web in the early days.
Amy Tobey: Those of us that didn't really experience Gopher never knew that past. I came in late for Gopher. I hit Usenet early and then I basically skipped over Gopher entirely. Almost everybody I've talked to who experienced their formative time on Gopher tends to go, "Yeah this web stuff just kind of sucks by comparison."
Craig Sebenik: Then WebCrawler and AltaVista came out. I'm like, "Oh, yeah. This makes it better." And then the big one hit—not the big one that most people today would think of—and that was Yahoo.
These college kids published essentially all their bookmarks (this was back when Yahoo lived at something like cs.stanford.edu/yahoo). It was like, "This is fantastic." Much like Gopher, it was organized hierarchically.
Amy Tobey: You were in school at this time for computational chemistry, so how the heck did you get from there to index technology?
Craig Sebenik: I was always programming. I was programming on SGIs and then I transferred to University of Michigan and I graduated/dropped out. Whatever. It's all nuance. But I got my Masters from Michigan and I was, "Now I need a job. What do I know how to do?" At that point I didn't know a lot except how to program. The first job I got was actually a backup person. I would go in on Sundays and change the tapes.
That was a part-time gig for three to four months. Then in ‘94, I found a mom-and-pop ISP located in Plymouth, Michigan. It's the next town down from Ann Arbor. There were maybe 20 employees and we were doing web design/ISP. We had a bank of modems that people would dial into.
Amy Tobey: I kind of miss that entry point. That's a very different environment to cut your teeth in, compared to what a lot of corporations are like today. The friends of mine that did it described a shop where you could just basically—if you were getting stuff done—do whatever the heck you wanted. Can you talk a little bit about that environment and kind of how it molded how you approach reliability?
Craig Sebenik: One of the big differences in those days was security. Security was not as big of a deal as it is today. You just did the little bit of security that you needed to, and because there were fewer tools, it was more on individuals to write their own little bash scripts. This would have been before things like adduser were as common as they are today.
So you had to write your own adduser script that would update the password file, create the home directory from a skeleton, and change permissions on the home directory. It's so funny seeing what exists today. I wrote something exactly like that. Granted, it was nowhere near as resilient as today's tools. The number of edge cases that a general purpose tool has to deal with is significantly more than your specific scenario where you get a list of names from the salesperson. They’d say, "We just signed this account. You need to add these 20 accounts to the main servers." You created a bash script with the current list of 20. You copied that script to all four servers, and you ran the script. You watched it and it was fine. The idea of centralized logging didn't exist, and the idea of metrics didn't exist in the same way.
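For readers who never had to write one, the three steps Craig lists can be sketched roughly as follows. This is a minimal illustration in Python rather than the bash he used, and everything in it is an assumption made for the example: the `add_user` and `demo` helpers are hypothetical, the passwd-style file and skeleton directory live in a throwaway temporary directory, and the ownership change a real script would need (which requires root) is left out.

```python
import os
import shutil
import tempfile
from pathlib import Path


def add_user(name, uid, passwd_file, home_root, skel_dir):
    """Append a passwd-style entry and build a home directory from a skeleton."""
    passwd_file = Path(passwd_file)
    home = Path(home_root) / name

    # Refuse to add the same login twice. One-off scripts of the kind
    # described above often skipped edge-case checks like this.
    existing = passwd_file.read_text().splitlines() if passwd_file.exists() else []
    if any(line.split(":")[0] == name for line in existing):
        raise ValueError(f"user {name!r} already exists")

    # 1. Update the password file (name:password:uid:gid:gecos:home:shell).
    with passwd_file.open("a") as fh:
        fh.write(f"{name}:x:{uid}:{uid}::{home}:/bin/sh\n")

    # 2. Create the home directory by copying the skeleton directory.
    shutil.copytree(skel_dir, home)

    # 3. Restrict the new home directory to its owner.
    #    (A real script would also chown it to the new user, as root.)
    os.chmod(home, 0o700)
    return home


def demo():
    """Run against a temporary directory so no real system file is touched."""
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        skel = root / "skel"
        skel.mkdir()
        (skel / ".profile").write_text("# default profile\n")
        passwd = root / "passwd"

        # The "list of names from the salesperson", shortened to two entries.
        for uid, name in enumerate(["alice", "bob"], start=1001):
            add_user(name, uid, passwd, root / "home", skel)

        entries = passwd.read_text().splitlines()
        homes = sorted(p.name for p in (root / "home").iterdir())
        return entries, homes


if __name__ == "__main__":
    print(demo())
```

Even this small sketch needs a duplicate-name check, which hints at why a general purpose tool ends up handling far more edge cases than a script written for one list of 20 accounts.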
Amy Tobey: I mean we didn't even SSH really. There was rsh and rlogin.
Craig Sebenik: We did do a little bit of SSH at the time but mostly it was Telnet.
Amy Tobey: So we're at the late end of the 90s. That's where I started my career. I was a music major. I wasn't making enough progress on my university parts because as a music major usually you get sucked into all of the ensembles, which are usually a credit or two but they usually want four to six hours a week of your time. I was way behind. And then toward the end I thought, "Oh crap. I've got to start making money at some point." Then I had some realizations about myself and whether I was really suited to teach in high school or middle school. I did half a semester of CS.
I had learned a bunch of Linux over the summer, so I thought, "Oh, I'll try this CS thing. This is exciting," and was eternally bored. I found a job in Detroit doing systems administration for a stock broker. That was my entry point. But at the time I was still presenting as a white male. I feel like that was a big part of how I got that opportunity. When I look back at it, I keep trying to find those details in the story that I can share with other people for how they can get their start in the industry because systems administration in our experience has been a great place to enter. Do you have any memories of that early time when you were first building your career of things that you did or that you found that really helped you accelerate into where you are today?
Craig Sebenik: At the time most of my educational experience was, as I mentioned, on SGI and Sun (specifically SunOS 4 and then eventually Solaris). The biggest thing was probably just digging around the file system. If your listeners don't know, SunOS 4 was essentially a BSD version, whereas Solaris was System V. The file layouts are pretty different, especially in those days. This is obviously well before Upstart, or systemd, or anything like that. So just looking at how the boot process works.
Amy Tobey: Nobody does that anymore.
Craig Sebenik: Not really. They assume that systemd will do the right things for them. You can argue that the problems have just moved up from a single host and how it boots up to how a distributed application eventually comes up with all its downstream dependencies being available. Things have moved up from a typical OS up to the service level.
Amy Tobey: At the service level, we're starting at an incredible level of complexity right out of the gate, whereas when you and I started, we were writing init scripts by hand and learning shell along the way. This afforded us the opportunity to have that learning environment where we could make mistakes at a relatively low level of complexity compared to, "Well let's deploy your first service into Kubernetes." There's a billion more lines of code between there and where we were.
Craig Sebenik: Interestingly, one of the things that has changed since then is that the opportunity to experiment is much richer now. You can go create some instances on the free tier in AWS or Google or Azure or whatever and play with whatever you want. In those days, trying to find a BSD variant that I could completely screw up and not really piss somebody off was almost unheard of. Fortunately I really cut most of my teeth in academics where it didn't matter quite so much.
Amy Tobey: You had access to gear and if you messed up, it might set your research back but it wasn't like you were taking somebody's profits offline.
Craig Sebenik: It could potentially screw up the lab, but a lab would be half a dozen people. It's not like I'm screwing up the whole chemistry department.
Amy Tobey: I saw a link earlier this morning about among all the job losses that have been happening as part of the COVID-19 and the economic impact, a large number of the jobs that were cut were the same jobs targeted at underrepresented groups. That's what got me thinking about how we can brainstorm ideas to help folks break through into this industry which is a great place to build a career to pay the bills. Now that we're in this world where everything is so complex but we have these opportunities to create small blast radius spaces, what do you think we can do as industry users, SREs, or resilience people who value diversity to create those openings?
Craig Sebenik: There are two sides to the same problem. Let me talk about the latter first because that touches back on this education idea. The thing that you can really do is you can take hot technology and put a lot of effort into learning it. One you just mentioned is Kubernetes. Kubernetes, AWS, there's a handful of different things. Even lower level Docker. You take some time and you really learn those technologies. That gives you a foothold in.
Amy Tobey: It gives you a foothold on the technological landscape.
Craig Sebenik: The second part is how people find opportunities is usually via the people they know, not what they know. If you're in an underrepresented group, it can seem like "How can I even break into that group?" That's tough. I can tell you that much like your experience, I was not involved in tech in the same way. Tech was a tool for me. The first company I worked at in California was Network Appliance. Now they are a relatively large tech company. How do you find somebody at those companies? That's a tough one.
A year ago, even six months ago, I would have said conferences. Go to conferences, talk to people. That's difficult. I've been to a couple of these virtual conferences and they are going out of their way to create the same kind of atmosphere where you can network with people, talk to people that you wouldn't normally have an opportunity to talk to. If you don't have contacts in the space you're trying to get into, how do you make those contacts? Conferences are the kind of classic way of doing it.
Amy Tobey: I think there are a couple of really modern things that are changing that. The ability to gain visibility and get a hold of people on Twitter is probably the standing hallway track for all of us right now. As you said, all those little conferences are trying to reinvent the hallway track, but I haven't seen one that was really compelling yet.
Craig Sebenik: I did a conference about a month ago and they had made extraordinary efforts to do the conference track via Slack and their own Slack-ish kind of app. Similar to the swag that you get at the conference, they had $10 gift cards or something along those lines to entice somebody to join. If anybody is really curious, it was a Techwall conference.
So the other thing that just occurred to me was to contribute to an open source project. Here's one of the keys: you really have to be passionate about it if you're trying to make a living. I mean you still got to put food on the table, support whoever else you need to support, et cetera. You might do that anyway and break into tech kind of on the side. If you're trying to carve out time, if it's seen as, "I'm doing this for my career," it will be painful.
Amy Tobey: Why would it be painful to just do it for your career?
Craig Sebenik: Let me rephrase. It would be more enjoyable if you're passionate about the project so it won't seem as much like work. You'll just be that much more engaged.
Amy Tobey: That has a dark side too, which is burnout. I did this to myself over the course of 20 years: be very passionate, be very fired up about technical topics and doing the right thing technically. It has dark sides besides burnout in the people space. The one that is personal is when that passion rides you too high too long, it gets you into a condition where your body starts taking a hit.
That's why I wanted to challenge you on that because I can see how being passionate about getting into tech would be a great motivator to put in the energy to get there quickly. My sister is doing that right now and she is very passionate about it. Think of the case where there's a single working parent and they want to get to a better place in their life. But maybe they just don't give a crap about bits and bytes.
Craig Sebenik: That's fair. So how I would look at it is the passion might change over time. Let's suppose that you have your 9:00 to 5:00-ish job or whatever you're doing to pay the bills and you're trying to learn more about tech because, as you said, this seems like a better path for the future. So you put all of your "passion energy" into tech while you're trying to learn it. You do all this for a couple of years as you're taking classes, as you're getting exposure, as you're networking, and then once you get that job, you then can change your passions to something else.
Amy Tobey: This seems like self-engineering. It's not so much your passion. It's just trying to do mental engineering to attach your mind to this idea.
You're one of the few people from a lab environment that I've talked to who went into more of a traditional route. Most of the time the story I've heard is, "Well, I was the computer person in the lab and I was the only person that really knew how to do it so I ended up doing all the systems administration and now I'm a systems administrator." I've seen a lot of people find that opportunity to break in through just kind of a side channel: there are a bunch of computers sitting there. Nobody knows what to do with them. If you're the one that does, then all of a sudden you have more value to the organization.
Craig Sebenik: It’s like the sysadmin experience back in the 90s, when you could know bits and pieces, but more important was your ability to troubleshoot and know where to look for the problem. That was more important than “knowing the answer.” That is something you get more from classic CS backgrounds.
To become a software engineer at that point, I don't think I had enough experience outside of computational software. Computational software has its own quirks about the problem of balancing time and space, the classic tradeoffs. It's funny because the code I was working on at Michigan was basically protein folding and we would start a run that would analyze a bunch of data and the run would take roughly a month.
We’d just run on this big Sun4 that we had in "the closet." It would sit there and run and run and crank and crank. I made some small optimizations and it moved from four weeks down to two weeks. The point is, it was about the normal problems in those days.
Amy Tobey: What you just described is the opposite of what you said right before it. Because you said that it's the CS that teaches troubleshooting. My experience has been that CS doesn't teach troubleshooting at all.
Craig Sebenik: I probably said it too fast. CS doesn't teach troubleshooting. CS teaches the normal kind of programming paradigms and design patterns. It made it easier for me to get a more sys adminy type job because the CS kids weren't interested or they would do their programming but they didn't necessarily know what /var or /etc was.
Amy Tobey: That's still pretty common, less so than it was, but everybody has Macs now and does Docker. Even a lot of systems administrators don't know the history of /var.
Craig Sebenik: Or /usr, which still strikes me as funny because it's pronounced "user" and that's not what it means. I think it stands for Unix System of something. God, what is it? Records?
Amy Tobey: Record, or something. But it's just user. That's all I've ever called it.
Craig Sebenik: There's no “E” in there. So anyway, yeah. So where were we?
Amy Tobey: Just a little aside, how do you say the /etc directory?
Craig Sebenik: I say it both ETC and etcee.
Amy Tobey: I was just curious because it's one of those fun nerd war things. Var, opt, user, etc. I guess that's what we get for being the early players on the internet. Yeah.
Craig Sebenik: It catches up with you.
Amy Tobey: I wanted to spend a little time because of your experience throughout the industry talking about how you see resilience and diversity in the organization as related. I've got really strong feelings on this, but I'll wait to put mine out there because I wanted to give you a chance to say how you see those things intersecting in your professional experience.
Craig Sebenik: First, when you talk about diversity, it's important to keep in mind that there are a lot of different aspects to diversity. The obvious one at the moment would be racial. But there are other aspects as well.
For example, a single mother trying to raise her kids brings a very different perspective to the team, and that perspective is important. Look at the difference between you and me and our academic backgrounds. My background is in more of the hard sciences and your background is in music. Those also bring a different perspective to solving problems and you don't always know exactly how that is going to manifest itself. The more kind of diverse background, again, not just racially or along gender lines...
Amy Tobey: But those axes usually indicate pretty quickly that you are getting a very different experience from the typical white male.
Craig Sebenik: Exactly. Having all of those different mental models. People will approach problems with these specific mental models of the problems and those mental models are almost always formed based on years of experience. Growing up as a white male, my experience is going to be significantly different than say a black female or Hispanic male.
Growing up in Texas, Seattle or Florida, all these different areas have very, very different things that they can then bring to the table. People might see the problem in a way that solves it in a completely different way than you ever could have. Having that diversity means that you have the potential to solve problems in very unique and possibly very resilient ways.
Amy Tobey: One of the big benefits of what you were describing which is the diversity of minds that are available to throw at a problem will create a wider field of perception of things that need to be addressed. Not just troubleshooting, which is kind of where I felt like you were hanging out.
We have these different minds and this guy over here is going to say, "Oh shoot. You know what? I totally know what that is because this one time when I was a welder I had this problem." You know what I mean? The other part is when we're designing products; that's really important. Who are we protecting? Who are we serving? The resilience of the product emerges from the specification in a way.
Craig Sebenik: My favorite example is a video that made the rounds three or four years ago of a hand dryer. The racist hand dryer: essentially you see this pair of white hands go up and the dryer kicks on. No big deal. A pair of black hands go up and nothing happens. White hands come back in, dryer kicks on. Black hands come in, nothing. It sounds a little funny and sad at the same time, but these are the kind of things that when you have a more diverse workforce in general, you solve problems for a wider set of people. When you come down to the team level, as the diversity of your team starts to cover more of the diversity in the world, the more likely it is you're going to be solving problems for a larger percentage of the population.
Amy Tobey: We touched a little bit on how we can bring people in as the next generation of engineers, but considering our shared career space along SRE lines, what is the burning thing in your mind that we need to do as a community of SREs to prepare that next generation? What do we need to do for them? What are the things we can do to make sure that they are successful and that we can help them grow into people who will make this discipline even more robust than it is today?
Craig Sebenik: One of the things I mentioned before was finding a particular piece in the technology stack and focusing on that. The obvious problem is that it assumes that you know what those different pieces are. So where do you start? Let's suppose you're a welder and this is not where you want to be 10 years from now. You want to move into software. Other than a bootcamp, if you want to do this completely on your own, where do you even start? Introducing more of the world to the various pieces of technology from a high level gives people a path to the lower levels.
Amy Tobey: The opposite of how you and I did it. We started at the bottom and we've just kind of worked our way up and stayed on top of the heap as we piled more stuff underneath us.
Craig Sebenik: But the thing that has changed since then is the online courses, not just the magnitude of them, but the diversity. There are a couple of large providers (without trying to make a plug for anybody): edX, Coursera, and of course the big universities.
Amy Tobey: And a bunch of stuff on YouTube that's free.
Craig Sebenik: The difference with YouTube is it's a little bit more of a free-for-all and with the education platforms, there are ways to say, "I want to learn more about tech," and it'll present you some introductory courses.
Amy Tobey: So they have a path you can follow as opposed to the wilderness of YouTube.
Craig Sebenik: That said, if people are interested and they're coming from nowhere, the thing that I would point them at would be this class called CS50 from Harvard.
CS50 is a fantastic course. He starts from very, very simple principles, and then goes into more and more detail. I forget his name, but he has created a couple of different variants over the years. There's an AI CS50. If you have no idea about tech in general and you just want to start the software, that is a great starting point.
If you're looking for where to start in SRE, the Google class is not bad but it does a lot more from the cultural aspect and less from the technology. They won't say, "Hey, here's where the Kubernetes pod is." Or, "This is how you docker exec." But that can make it easier for people to jump into it without having this fear of, "I don't understand what this is." Since it's a softer introduction technically, it can make it less of a barrier for people.
Amy Tobey: There's a Google series of courses with a certification at the end that I just retweeted yesterday. They have a whole program you reminded me of when you were talking about the Harvard thing.
Craig Sebenik: One thing I have been a proponent of for a while has been this idea of getting SRE as a concept into universities. Now when I say universities, that's probably a bit of a stretch, but you see a lot of people who learn SRE essentially on the job. I learned everything on the job. The job for me was an academic environment, so it was a little bit less pressure, but I learned by doing essentially.
Amy Tobey: It's a practice. You can learn all the facts you want about it, but the actual work is a practice as opposed to dishing out facts. They're just useless facts until you put them into work.
Craig Sebenik: I would like to see that introduced into an academic environment. Academic is more of a broad sense. I say university because I don't have a better perspective. Even things like boot camps would be awesome. I'd like to see not just people go into SRE per se, but even to give normal developers background in SRE that just makes everybody's job easier.
Amy Tobey: I noticed that the teams that I work with that have somebody with some ops experience in them are generally not that much work because there's somebody already sitting in all of the standups and the planning.
There's that centralized model where we try to say, "Well we can't put an SRE in every team." So how do we get these heads together and out into all these engineering teams? I like the idea of educating everyone on these concepts so that we have that spread of SRE knowledge more naturally than we do today.
Craig Sebenik: Along those lines, I gave a talk a couple of years ago at SREcon about educating SREs. I had three points: How do you educate new SREs? How do you educate developers with SRE topics? How do you educate SREs especially in a corporate environment over the long haul? The latter has to do with how tooling within a specific environment adapts and changes over time and keeping people up to date on that. That isn't just for SREs, but your engineers as a whole.
Amy Tobey: I mean a lot of people's personal experience is that individual computers are very reliable. We live in a golden age that's terrible for that reason.
Craig Sebenik: Amazon or Google or Azure will just handle that for me. Sure.
Amy Tobey: I often lament the battle days of Amazon, early in the EC2 days when you didn't need Chaos Monkey because the instances would just die randomly. You’d come in in the morning and 10 instances are dead. You’d go, "All right. I'll replace my shell scripts." Now we have to actually make failure happen. We're too reliable. Maybe that's what's drying up some of the opportunities for education; that the things we experienced maybe are less common now.
Craig Sebenik: Or maybe those opportunities are essentially moving up the stack. As those things become more reliable, then you have to worry about other things being less reliable.
If you liked this, consider checking out these resources:
- Resilience in Action, Episode 1: Narratives in Incidents with Lorin Hochstein
- Resilience in Action, Episode 2: Adaptability, ego, and scaling with Tim Banks
- Resilience in Action, Episode 3: Inclusion and Integrity with Sidney Miller