Are you looking to start an SRE team or add to your existing team? We explain the SRE hiring process and how to find and evaluate an SRE.
What should I consider when hiring an SRE?
In SRE hiring, candidates should be evaluated on their knowledge of SRE processes, coding skills, understanding of operating systems, and ability to foster SRE culture and principles.
Site reliability engineers are crucial to the development team since their role combines engineering and operations. So if you’re looking for ways to improve the overall customer experience for your solution through more efficient development and reliable operations, SRE hiring should be on your radar.
You can recruit SREs by reaching out to them directly on platforms such as LinkedIn or by working with specialist recruiters who can help you find SREs for your team. SREs can also be developed in-house. Promoting existing developers to SREs can be helpful, as they'll already be familiar with your systems. Evaluate internal candidates by how well they exemplify the qualities you'd want in an external hire.
What is the difference between SRE and DevOps engineers?
DevOps engineers focus on automating repetitive operations within the team, while SREs concentrate on analyzing and improving the current infrastructure to increase reliability and availability. Before embarking on SRE hiring, take some time to distinguish the role from others on the development team and decide what you want an SRE to add. It's easy to confuse DevOps engineers with SREs, so differentiating the two from the start will help.
SRE vs. DevOps is a topic that comes up often. A DevOps engineer is great when you're looking to deploy updates and fixes, and they are also needed when seeking more advanced technical support. While
What should the SRE job description include?
The SRE role encompasses several elements, so the job description must reflect that.
The main part of the SRE's responsibility is to work on the availability and reliability of the solution. Depending on specialization, this can take many forms, from making sure the software is coded with reliability in mind, to building infrastructure, to leading incident management.
SREs will also be expected to collaborate with stakeholders and team members to minimize and prevent downtime. Another component of the SRE job description is improving internal processes through automation and other methods. For example, automating software delivery can streamline development so that changes reach customers faster. The job description should also highlight the SRE's role in mitigating risks where possible, including any cybersecurity issues that could lead to downtime.
Ultimately, their role is to ensure that customers have the best possible experience with a product or service in terms of availability and reliability.
When writing the SRE job description, you can split the role into day-to-day and long-term responsibilities. For example, their day-to-day tasks may consist of collecting and analyzing metrics related to reliability and availability, collaborating with the team on improvements, and designing and testing automation based on what they've found in their analysis.
Long-term, they will aim to minimize downtime as much as possible and balance development velocity and reliability against service level objectives.
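To make that balance more concrete, here is a minimal Python sketch of how an SLO can be turned into an error budget that gates releases. It assumes an availability SLI measured as successful requests over total requests; the `SloWindow` class, the `release_allowed` function, and the 99.9% target are hypothetical, and in practice the request counts would come from your monitoring system rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (hypothetical structure)."""

    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means "99.9% of requests succeed"

    def __post_init__(self) -> None:
        # Reject targets of 0% or 100%: a 100% target leaves no error budget at all.
        if not 0 < self.slo_target < 1:
            raise ValueError("slo_target must be strictly between 0 and 1")
        if not 0 <= self.failed_requests <= self.total_requests:
            raise ValueError("failed_requests must be between 0 and total_requests")

    def availability(self) -> float:
        """Availability SLI: the fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """How many failed requests the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget left (negative once it is overspent)."""
        budget = self.error_budget()
        if budget == 0:  # only happens when the window saw no traffic
            return 1.0
        return 1 - self.failed_requests / budget


def release_allowed(window: SloWindow, min_remaining: float = 0.0) -> bool:
    """Hypothetical release gate: keep shipping features only while budget remains."""
    return window.budget_remaining() > min_remaining


if __name__ == "__main__":
    healthy = SloWindow(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
    burned = SloWindow(total_requests=1_000_000, failed_requests=1_500, slo_target=0.999)
    for name, window in (("healthy", healthy), ("burned", burned)):
        print(
            f"{name}: availability={window.availability():.5f} "
            f"budget_left={window.budget_remaining():.2f} "
            f"release={release_allowed(window)}"
        )
```

With a 99.9% target over a million requests, the window tolerates about 1,000 failures: 250 failures leave roughly 75% of the budget, so feature work can continue, while 1,500 failures overspend it and the gate shifts the team's focus to reliability. The threshold and window length are choices each team makes.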
Qualifications and skills will vary based on seniority level, but at a minimum, SRE hiring should focus on individuals with strong programming abilities and experience with distributed storage technologies and resource management frameworks. Even so, depending on what you want your SRE to do, they may not write much code day to day. As long as they understand your systems well enough to contribute to them, they can provide value.
What are some of the key interview questions to ask in SRE hiring?
When hiring SREs, it's crucial to weigh their skill set against the job description and what they can offer. You should also go in with clear requirements for what the SRE will do in their function, so you can discuss them in the interview.
Example questions and topics to cover include:
- How big was their SRE team and what was their role in it?
- How they look at system health as a whole and their role in running production environments. What do they prioritize and why? What do they do when there are conflicting priorities?
- Their experience in building software and solutions for reliability and availability, including any collaboration experience they have with other team members
- Their overall experience in improving reliability in software solutions and any metrics they can discuss
- How they approach process design and capturing historical information (e.g., runbooks, process documentation, retrospectives)
- Examples of previous analyses, which help you understand how they prioritize metrics, what they look for, and why
You could also use behavioral scenarios (e.g., how would you act in this situation?) to get a sense of how they approach problems and SRE culture as a whole. Drawing on previous situations can show how a candidate would have gone about solving the problem.
During SRE hiring, you should also consider what soft skills the person demonstrates and how those skills contribute to their work.
Key traits to look for during SRE hiring include:
- Problem-solving skills, including outcomes
- Analytical skills
- Behavior under pressure
- Empathy for other people's situations
Are they comfortable thinking creatively and coming up with novel solutions? Do they need a lot of guidance or prefer working on their own? How have they collaborated with other team members in the past, and what was the result of that collaboration?
As you develop your SRE hiring practice, you can use these questions as a starting point. Combining these questions with more specific questions based on organizational experience can help you better understand how potential candidates will work in your particular environment. These questions also give potential candidates the opportunity to speak about their past experience and the results they have achieved.
You can tailor the questions as needed; knowing which skills you're seeking will help you personalize them for each interview. The goal of the interview is to get a sense of how candidates understand and interpret SRE principles and what they can bring to your organization if hired.
Blameless can help new SREs get up to speed by providing a library of knowledge through retrospectives and reliability insights. Our incident management systems help new SREs ramp up faster. Want to see how? Check out a demo!