
The Untold History of Women and Incident Management

Emily Arnott

Originally posted on VMblog

The history of women in computer science is often understudied and underappreciated despite being fundamental to the history of computer science as a whole. As far back as the 1700s, women have been integral to major computational projects on the forefront of science. From astronomical calculations, to Ada Lovelace writing the first algorithms for the first true computers, to teams of women at NASA programming flight trajectories, women have been the foundation of the evolution of the field.

Although women’s contributions to computer science have been better recognized in recent years, they have gone unappreciated for the vast majority of the profession’s history. It’s an all-too-common story: skills and knowledge that were considered “women’s work” aren’t seen as legitimate or impactful until men start getting involved.

These days, the tech industry is evolving again in the direction of greater recognition. For too long, the culture around software was one of development teams “throwing code over the wall” to operations teams. The operations teams had to scramble to learn how to run and fix the code given to them, while the development teams often basked in the spotlight of their successful launch. Movements like DevOps and SRE try to tear down these silos and give operations and incident management teams the support and recognition that they deserve.

To celebrate Women’s History Month, we’d like to take a look at the gendered dynamics of this often underappreciated work. Why are women often the ones we look to when things go wrong? Why does that work so rarely get the applause it deserves? And what can we do better?

Women’s Glue Work of Incident Management

In Tanya Reilly’s landmark article Being Glue, she breaks down how many people’s software development jobs are dominated by tasks that aren’t exactly development: coordinating meetings, smoothing processes and setting standards, reviewing and correcting others’ work, and much more. This work is hugely important and necessary to a business’s success, but because it doesn’t slot neatly into metrics like “lines of code contributed” or “projects launched”, focusing on this “glue work” can end up being detrimental to your career. Unsurprisingly, it is often women who end up doing this thankless and non-promotable work.

We have talked before about how the role of an SRE is essentially defined by glue work, and how these contributions should be formalized, recognized, and rewarded. To be more specific than SRE as a whole, the area most crucially bound together by glue work is incident management.

As in development, there’s a division, and perhaps an assumed hierarchy, between technical and non-technical work. One group of people is tasked with actually diagnosing the problem and implementing a solution: combing through the codebase to find bugs, standing up more servers and allocating load, whatever it takes to get the site healthy again. Another group handles logistics: are all the necessary people talking to each other? Are stakeholders informed without distracting other responders? Is information about the incident being collected and logged?

These two groups are equally important to resolving the incident, but too often the technical team ends up getting the spotlight for doing the “actual work”. And too often, it’s women stepping in to handle the essential work of frantically gluing together the entire incident response process. As Tanya Reilly recommends, glue workers in incident management need to advocate for themselves and need space to be recognized and supported, with titles like “incident commander” and formalized processes that highlight their achievements.

With this hope for the future, let’s look to the past and shine a much-needed spotlight on some early stars of incident management.

Incident Management from Old-School Women

Formalized incident management processes, with proactively built runbooks, collaboration with development teams, and all sorts of other nice things, are a relatively recent innovation. Before that… there wasn’t much. Network operation centers and ITIL are some of the earliest attempts to structure incident management for software. They only date back to the 1960s and 1980s respectively. That may seem very old, but there were decades of challenging and important software development before that.

But as much as things have changed since then, one truth has stayed consistent: things will go wrong. Incidents are a reality of software development and operations in the past, present, and future. In fact, in the days before modern computing, when programmers had to commit code to punch cards, incidents were even more common and often an even bigger pain to debug and fix.

Unfortunately, the early history of incident management is obscured twice over: once by patriarchal biases that discredit women’s efforts to ensure computational accuracy, and again by a tendency to downplay maintenance work compared to designing and releasing projects. It is truly an untold history. Let’s look at some examples of women who made it into the history books, knowing that they represent just the tip of an iceberg that runs deep with meaningful but hidden work.

Edith Clarke

The first professional electrical engineer in the United States, Edith Clarke made major contributions to building the modern electrical grid. We cannot say when electricity would have been so widespread and reliable across the continent without her efforts.

It’s worth noting that many of her most significant contributions came from a perspective of anticipating and better handling incidents. She invented the Clarke calculator, a device that could more quickly solve hyperbolic functions than any other method. This was used to quickly and reliably calculate current, voltage, and impedance measurements across power lines, allowing for more consistent arrangement.

In 1926, she was the first woman to present a paper at the American Institute of Electrical Engineers’ annual meeting. She covered calculating the maximum load a given power line can handle without instability. This foundational work allowed for the reliable installation and usage of much longer lines than before. She prioritized this sort of rigor: not racing to put down the longest lines imaginable, but ensuring that the network they were building was stable.

Dorothy Vaughan

Facing intersectional discrimination as an African-American woman, Dorothy Vaughan nevertheless rose to prominence as a human computer. Human computation pods were used in World War II to calculate flight paths and rocket trajectories. In the aftermath of World War II, she was promoted to acting head of West Area Computing, an all-African-American computing unit, and worked with NACA, the agency that would become NASA.

Think about the responsibilities of running a computational agency. Large computational problems are assigned, then broken down into component problems and distributed to individuals. Each individual performs some manual computations, then passes their results up to be combined with others. There are many possible sources of error, which must be stopped from accumulating.

Just as difficult as the calculations themselves was this coordination glue work, which had to be both procedurally consistent and dynamic enough to account for errors from any source. For handling this responsibility in such an early era on such critical missions, Dorothy Vaughan is an early hero of incident management.

She also achieved a meta-resilience, anticipating and adapting to the evolving field. As electronic computers started to gain prominence over manual computation, Dorothy made sure to train her staff in programming early. This is another form of anticipating and managing the possible “incident” of obsolescence, handled in a very prophetic way.

Grace Hopper

One of the better-known women legends of computation, with many achievements, awards, and merits, Grace Hopper is still underrated in her contributions to incident management. Few people are as foundational to how we think about computers working, and also about computers breaking.

In the early days of programmable computers, designing and building the hardware was considered “man’s work”, whereas programming the computer itself was considered softer “women’s work”. But in those days, hardware and software weren’t as distinct, and making code that worked required a deep understanding of the physical wiring of the machine itself. When things went wrong, as they often did, women programmers were often required to debug and repair every aspect of the system.

Perhaps no story better illustrates this than the one from 1947, when a moth flew into a Harvard computer that Grace Hopper was working on. This “actual bug” popularized the use of the term to refer to a computer error, and in turn, the concept and term of “debugging”.

Beyond being among the first pioneers of this concept, which is foundational to the technical side of incident management, she also knew the value of empowering people when things go wrong. She once said, “the most important thing I’ve accomplished, other than building the compiler, is training young people.”

We hope these stories of women handling the incidents of early computing were inspirational. There are many more to find if you explore. Remember that we can gain value from these stories only because these women were recognized and their accomplishments recorded. We must keep striving to record and retell new stories to inspire the next generation of incident management.
