Google Cloud OnAir with CEO Ashar Rizqi: Benefits of Cloud Infrastructure

CEO Ashar Rizqi had the pleasure of being a guest on Google Cloud OnAir, a Google Cloud Customer Interview Series. Ashar and interviewer Jimmy Sopko discussed how Blameless has extended our runway using Google Cloud and Google Kubernetes Engine and how the team cultivates a culture of site reliability in a changing world.

TL;DR

  • Leaning in to a cloud native strategy: Software is eating the world and industries are being disrupted. Underneath it all, modern applications have completely changed. That change is creating complex interdependencies between different modules, especially when you're running in a hybrid environment. So it really comes down to managing complexity. At the same time, user expectations are increasing exponentially around security, reliability, performance, and availability. But what's not changing is your production resources, such as the people that you have available. You need to have infrastructure on which you run your applications. There needs to be a way in which you deploy those applications, test those applications, monitor them, operate them, roll things back when needed, and manage change at an increasingly rapid pace. Historically, what used to happen is you had specialized teams, which are very expensive, managing infrastructure. That's money and resources that can go towards the core of your business and product innovations. You don't need to go solve the same problems again and again. Cloud native is a must-have in today's day and age to be able to do anything at the speed and scale at which users are expecting today. You just can't exist without it.
  • Cloud versus building in house: The best thing that you can do for your company is to say no to certain things. It elevates your credibility, and it shows focus and commitment to a real problem. For Blameless specifically, when my co-founder and I started, we were fortunate enough to raise some venture money, but it's limited. It's there for a certain period of time and every dollar counts. It takes energy, but you have to think about, “Where's my investment going to yield the highest ROI?” In the earliest days of the company, it’s about getting product-market fit as quickly as possible, and that requires rapid experimentation. It's a question of how do you move very, very quickly, but still build for the enterprise. That was the guiding principle for us. Making our platform secure and meeting base-level compliance requirements was very important. And what that means is that you're not moving as quickly, because you have limited engineering and product resources. That's effort that's not going to the product.
  • Advice to other startups: The key is SaaS economics. You have to break SaaS economics down into its constituent parts and look at each one of those things very, very closely. For example, our architecture was single-tenant. That’s expensive. So we prioritized a re-architecture effort that we'd been planning for two or three quarters to move to multi-tenancy, optimize our spend, and consolidate instead of running across multiple cloud providers. The other thing is, you really have to go back and look at your head count and operating plan to make sure that you're questioning every single one of those. We did a lot of scenario planning on our side. We went back and we looked at essentials; do we want to cut down to the muscle, to the bone? Where do we want to draw the line? The number one thing is that you have to move quickly and you have to make decisions. No decision is likely to be the one that kills you. Not taking any kind of action is really the thing that's going to hurt the most.

See the full transcript of their conversation below, which has been lightly edited for length and clarity. To download the full recording, check it out here.


Jimmy Sopko: Can you tell me a little bit about yourself, your background, and what Blameless does as a business?

Ashar Rizqi: I'm an immigrant. I grew up in the Middle East, and moved to the United States in about 2006 for my undergrad degree in electrical engineering from Texas A&M. After graduating, my first job out of college was working as a sysadmin at Fidelity Investments in financial services. And quite frankly, that wasn't really the career path that I wanted to go down, but it was really what was available to someone who's an immigrant, trying to get a job in a post-recession economy that's just trying to recover.

I really wanted to do more things in the semiconductor space because I felt that hardware engineering was real engineering, and all this software stuff was just too fluffy for me.

The rest is really history. I worked in traditional enterprise IT as a sysadmin for a couple of years, then worked out of a data center. I was very fortunate to see a mix between legacy practices and some of the new DevOps practices coming in. Then the next two roles for me were working at fast-growing companies like Box and MuleSoft, and seeing them through some major financing milestones like IPOs. I saw how the sausage gets made from an infrastructure standpoint and was in an SRE role and a platform role in both those companies. Through those experiences I met my co-founder, Lyon, and we decided to start a company because of these macro-level trends that we saw around Blameless.

Jimmy Sopko: I always find it fascinating how founders use their different experiences to come up with the concept and eventually the business plan and actual business. But one thing that's always really sparked my interest with Blameless is the name. So where did that company name originate?

Ashar Rizqi: Credit for the term blameless goes to others; it's been taught in prior literature by folks like John Allspaw. But in one experience I saw how having a culture of blamelessness really allows you to move quickly and create this culture of taking risks and making mistakes. The most important aspect of making mistakes is always making sure that you're learning from them. That's really what our mission is at Blameless: to move teams to a culture of resilience. SRE practices and principles are really core to that. SRE has been around for over a decade, and what we've done is create a platform that operationalizes those SRE principles and practices. If a company starts to adopt these practices, they inherently start operating as a blameless culture.

Jimmy Sopko: With that in mind, how did that influence your cloud native strategy as you were starting Blameless and up to now?

Ashar Rizqi: It's all about looking at the macro perspective. There's this worldwide trend where software is eating the world and industries are being disrupted, and that's creating this need to move fast and provide fast, reliable experiences. But the challenge is that underneath it all, modern applications have completely changed. And the way that they're built and architected and operated is constantly changing. It's creating complex interdependencies between these different modules, especially when you're running in a hybrid environment. So it really comes down to managing complexity. The nature of a complex system is that it's always in a continuous state of brownout, where you can’t get complete visibility into what’s happening, so it's all about your ability to respond and fix issues quickly and prevent those issues from happening in the same place.

User expectations are increasing exponentially around security, reliability, performance, and availability. But what's not changing underneath it all is your production resources, such as the people that you have available. You're not going to be able to hire 3000 SREs tomorrow the way a company like Google does it. They're very fortunate to be able to do that. But if you're just a startup, you're just working with very limited resources, but the expectation your users have is the same expectation that they have from a company like Google, which provides phenomenal uptime and availability. So how do you actually get there?

Being cloud native is actually a very, very core part of that because you need to have infrastructure on which you run your applications. There needs to be a way in which you deploy those applications, test those applications, monitor them, operate them, roll things back when needed, and manage change at an increasingly rapid pace. As a small company like Blameless, we don't have to worry as much about security or compliance; we can just sign a BAA with one of these infrastructure providers and get 30%, 40% of the work required for HIPAA.

For example, if we're trying to go after healthcare, the underlying cloud companies already have a lot of that built in. Historically, what used to happen is you had specialized teams, which are very expensive, managing infrastructure. That's money and resources that can go towards the core of your business and product innovations. You don't need to go solve the same problems again and again. Cloud native is a must-have in today's day and age to be able to do anything at the speed and scale at which users are expecting today. You just can't exist without it.

Jimmy Sopko: I'm biased, but I agree 100%. You talked about focusing on your core business. I was at two startups before I was at Google Cloud. The hardest thing we had to do was say no, and that was probably the thing we spent the most time trying to do. I'm very curious about some of the trade-offs that you had to make. Obviously there's a trade-off between offloading some of your infrastructure to a cloud provider and building your own. But even within the different layers of the stack when using a cloud provider, I'm curious what types of trade-offs you were thinking through and making in the early days as you were scaling.

Ashar Rizqi: The best thing that you can do for your company is quite frankly to say no to certain things. It is the thing that elevates your credibility, and it shows focus and commitment to a real problem, not everything else. For Blameless specifically, when my co-founder and I started, we were fortunate enough to raise some venture money, but it's limited. It's there for a certain period of time and every dollar counts. It takes energy, but you have to think about, “Where's my investment going to yield the highest ROI?”

In the earliest days of the company, and any startup goes through this, it’s about getting product-market fit as quickly as possible, and that requires rapid experimentation. The caveat for us is that we're building a product that's very specifically focused on mid-market and enterprise type companies, which means that there's a certain reliability and security element to it as well. If you're building a consumer application, you can be a lot more relaxed about decisions that you make around poor reliability because those users are likely to come back. With the enterprises, you only get one shot, maybe two at most if you're really lucky, and you've really got to make those count.

Then it's a question of how do you move very, very quickly, but still build for the enterprise. That was the guiding principle for us. So making our platform secure and meeting base-level compliance requirements was very important. And what that means is that you're not moving as quickly, because you have limited engineering and product resources. That's effort that's not going to the product.

In terms of managing infrastructure, we made an early decision to say, "Hey, we're going to try and operate agnostic of the infrastructure, and really try to take advantage of the native platform capabilities that a lot of the cloud providers were providing." We made this decision to go with Google Kubernetes Engine (GKE). That gave us a balance between trying to move quickly and standardizing the way in which we build and run our applications. But at the same time there's this whole notion of running multi-tenant versus single-tenant.

In the earliest days, we decided to say every single customer would get a single-tenant hosted instance of Blameless, which meant that they would get their own Kubernetes cluster, their own database instance, etc., which is actually quite costly. But what it does is it reduces the friction involved in getting through their security and compliance approvals. It creates some very clean separations. Of course, Kubernetes today provides a lot of these separations inherently inside the platform, so it's become moot. Now we're taking advantage of becoming a multi-tenant platform. But in the earliest days, it was all about moving quickly, making those decisions, and getting across the finish line so we could unlock product-market fit as quickly as possible.

Jimmy Sopko: We know the world has changed in the past few months. Four months ago we would have been doing this in person in a studio. Can you talk to us a little bit about the challenges that you and Blameless have faced and how those challenges have impacted your strategy?

Ashar Rizqi: Quite frankly, what it did for us is that it gave us a tremendous amount of focus. At the macro level, it created uncertainty, but the good news is it created uncertainty for everybody in the world. It's like just getting a blanket pass. We were able to say, "This is the time that we're going to go and invest in this particular area of the business or product that we've been neglecting." That's one aspect of it. Growing very quickly usually comes with an impact to the runway and the burn rate, and we had certain plans in place.

Obviously we had to tweak those plans as an immediate reaction to COVID to say, "Okay, let's steady the ship. Let’s make sure that we have enough runway to last about two or three years. Then let’s go back and revisit a lot of the assumptions and strategies that we had put in place." This is all happening in real time. The world is reacting and shifting literally day by day, week over week. I'm sure you've seen the stock market going haywire. It's just really hard to see any kind of pattern emerging there.

But what [the constant change] does for us is inspire us to hunker down, make sure that our foundations are very strong, and then focus on the things that really matter both on the product side and go-to-market side. Really, the credit goes to my co-founder here; we decided to shift a lot of our investments into digital strategy. We were very much focused on events, for example, and had a very large events budget. We said, “Let's revisit that, shrink that down dramatically, and get creative around our digital strategy: digital events, digital outreach, etc.”

On the product side, it was a great opportunity to focus on increasing quality, product usability, polish, and the kinds of things that often get put into a tech debt bucket. This gave us a lot of breathing room. We're now in a good spot. I'm happy to say our demand has gone up because every other company out there is now forced to think about their digital strategy and moving their businesses to this digital world. If you're trying to do that, you’ve quickly realized how important reliability, quality, and security are, and how important the systems are that can help you achieve them.

Jimmy Sopko: I think everyone's in it globally, which makes us think a little bit differently. It took people about a month just to orient themselves in this new world, collect a bunch of data, try to do some analysis, and make some decisions. I'm really glad to hear that there's focus and things are turning around. Based on your experience, what advice would you have for other startups looking to solve some of the problems that you're solving, given that VCs are getting more conservative and want to make sure that there's a viable business and sound SaaS economics before they invest in companies?

Ashar Rizqi: The key is really what you just said, which is SaaS economics. You have to break SaaS economics down into its constituent parts and look at each one of those things very, very closely. I'll give you our example. Of course COGS is an extremely important part of that. Our architecture, as I mentioned, was single-tenant. That’s expensive. So we prioritized a re-architecture effort that we'd been planning for two or three quarters to move to multi-tenancy, optimize our spend, and consolidate instead of running across multiple cloud providers. We said, “Let's take advantage of some of the startup programs that they may be offering. Let's go into a deeper partnership with them and take advantage of those relationships.”

The other thing is, you really have to go back and look at your head count and operating plan to make sure that you're questioning every single one of those. We did a lot of scenario planning on our side. We went back and we looked at essentials; do we want to cut down to the muscle, to the bone? Where do we want to draw the line? The number one thing is that you have to move quickly and you have to make decisions. No decision is likely to be the one that kills you. Not taking any kind of action is really the thing that's going to hurt the most.

Again credit goes to my co-founder. Lyon went back and rejiggered our go-to-market strategy and our positioning to say, "Okay, in this post-COVID world where we're in a state of very limited resources, let's pretend that we're not going to raise funding for the next two years. How do we want to change our positioning? How do we still continue to drive the business? Let's set an even more aggressive goal, a 10X goal with the resources that we have today."

We also asked every single vendor that we were working with for help, because they're all running into the same challenges themselves. They have bills to pay as well. It was like, “Hey, how do we get more flexibility with the payment plans that you can put us on, or are there ways in which we can defer or change the payment structure in such a way that just really helps us get through?” And you know what? It's not something that's very specific to COVID. These are all strategies that you can apply at any point in time. It's all about being able to manage these things carefully.

Jimmy Sopko: Taking care of employees is, I know, something that's really important to you and Blameless. So I'd love to hear what steps you and Blameless have taken internally to empower employees working remotely during this time.

Ashar Rizqi: This is one of my favorite topics, Jimmy, mostly because it hits so close to home when it comes to culture and blamelessness and uncertainty and fear, and how to manage that kind of environment, which again is amplified because of COVID. As a company, we face that. We're going to face something like that at different points in our lives and the lifespan of the company. Employee safety is the number one thing to optimize for. As soon as we started getting some signals from the state governments, we instituted an immediate, 100% work from home policy. We're going to be as flexible as we can with equipment that we can give people to help them work remotely effectively.

But the good news for us is it was an easier transition than most because from the beginning, we actually had a pretty flexible work from home policy. It stemmed from this principle that talent is becoming more distributed. If we want to hire the best people, let's give them flexibility to operate around what's important to them: family life, moving to a lower cost area where they can raise their families, and stuff like that. We've got to optimize for getting the best talent. And obviously the macro-level culture is changing. Immediately after that, because it was still relatively new that 100% of the company was going to work remotely, we had this intense focus on optimizing performance for everyone and making sure that we still had a way to collaborate.

Jimmy Sopko: Talking to your team and talking to you as we've been working together, something else that came up a few times was how you have adapted to actually use Blameless to help you guys accelerate and adapt in this change. Can you talk a little bit about that?

Ashar Rizqi: We're a product for engineering teams built by engineering teams. One of the first principles is that we must dogfood our own product ourselves. Dogfooding isn't just about testing better. It's about ingraining the product and the practices into our day-to-day lives. That means that we have to treat ourselves as customer zero. We have to treat ourselves with the same level of importance and respect as we do other customers. I know it's hard to do because it's us. We're a team. We really have to level ourselves up to the same level as a customer.

Sales teams start to do incident retrospectives and postmortems; if there's a closed-loss opportunity, let's treat that as an incident. There's a lot of those practices and principles that make us operate more effectively and efficiently with the limited set of resources that we have. It's been pretty amazing to see those systems start to be applied in other parts of the business, extending it beyond a product and engineering practice.

My personal principle is that no customer should ever come to us with a problem that we're unaware of. We should never be caught by surprise. Dogfooding is really a key part of that, making sure that at a minimum, if a customer raises an issue with us, we're already aware of the problem. That sort of gives them a sense of confidence that this team really has its stuff together.

Jimmy Sopko: What success stories have emerged from this experience, whether it be with your teams or your customers, or even your own personal journey over the past few months?

Ashar Rizqi: The biggest success story for an earlier stage company like us, is that a lot of the Fortune 100 companies are looking at us saying, "This company knows or has a well-formulated opinion, has a position of leadership that they're establishing, so let's bring them in and partner with them." 

They don't see us as a simple tool that they're just going to come and plug in. They're saying, "Help us change." This is the biggest thing in my opinion. A lot of the CTOs that I talked to in these Fortune 100, Fortune 500 companies, are saying, "We have this desire. We know that just throwing tools at the problem is not going to solve it, but please come and help us change the culture. We don't know how we're going to do it. We think culture is a fluffy thing, but we understand the importance of it. So come and show us the path." 

The second thing is that the approach we take when launching these new ideas and products is an industry collaboration effort. We don't just sit in a corner somewhere and come up with ideas and then go build a product around it and launch it. We're very, very active in the SRE community working with thought leaders and understanding the problems that they're encountering. Then going back to them with solutions. So it's a multi-pronged, multi-phased approach where we meet with many different people. That's helped us establish credibility because it makes people and teams feel like they're really being listened to, really being heard. That's a really important strategy for startups, but I think that's a big success story that's emerged for us as well.

Jimmy Sopko: What would you recommend other startups do to get to that point of focus? You seem to have a very, very focused target and are quickly figuring out the value that you can bring to these customers. So how would they go about getting that?

Ashar Rizqi: The most important thing is knowing when to say no and knowing that it's okay to say no. The other thing that's been extremely helpful for us is that the landscape has changed quite a bit. Partnerships weren't given a lot of importance back in the day. They were always considered a distraction. It was like, “If we're partnering with midsize companies, similar-stage startups, or large companies, what's the incentive for them, what's the incentive for us?” It takes a lot of energy to align those things. Maybe five plus years ago, it was something you didn't even do until you reached a certain stage.

The world is shifting remote. Digital services are becoming increasingly important and it's all about getting your digital services into the hands of users and customers as quickly, safely, securely, and reliably as possible. There's a huge cost that goes into doing that from scratch. When you've got partners like Google or Microsoft or whoever (channel partners, API partners, third party teams, or products that you're already integrating with), start leveraging those relationships. It's not just about top line revenue anymore. Of course, I think that's a side benefit, but it is about thought leadership and being seen as something that's modern and always evolving and changing. The only way you do that is through an ecosystem strategy.

One piece of advice I would give is focus on the ecosystem strategy early on. It's relatively cheap because a lot of the teams that you're going to be partnering with already have defined processes, teams, and resources behind them and are incentivized to make those partnerships happen. Another piece of advice is go to your cloud provider and say, “We're spending a lot of money on our cloud provider. Let's actually explore a partnership.” There are five programs that Google provides that we're starting to take advantage of. Or say, “We have this deep integration with Slack, with AppDynamics, or New Relic. We're sending a lot of traffic your way. Let's actually turn that into something that's mutually beneficial.” I guarantee that there's going to be somebody on the other side who's either thinking about it or is definitely willing to engage and respond.

Jimmy Sopko: As a platform, you guys enable their solutions, and you're the foundation. So it makes a ton of sense. Something I've come across is that every startup has to hire a sales team and that's just more people calling on these enterprises. So the more that you can try to have other people help you get to that starting line, the better. It's something that we are seeing at Google Cloud. We are investing heavily in our partner ecosystem. And Blameless is part of that partner ecosystem. We are working on deals together with some of those big Fortune 100s you talked about. Very exciting. What can your customers get excited about and expect from Blameless in the near future?

Ashar Rizqi: I think we have a phenomenal engine around creating very exciting and very new and unique content. And the reason I bring that up is because a lot of folks are thinking about SRE and want to learn. That's been a big area of focus for us. The other thing is that we've been fortunate enough to be connected with some of the most forward-thinking technical leaders in this space. We recently did an interview with Melody Hildebrandt. She's the Executive Vice President of Security and Engineering at Fox. It's just amazing to be able to tap into those experiences from people and put that on a platform that's accessible to everybody else out there.

From that perspective, it's going to be extremely exciting. In terms of teams that are embarking on their journey or have already hit a certain maturity point with an SRE journey, we're excited to partner with them and showcase what they've done that's amazing, but more importantly, if there are gaps, we're there to help them. So I think that's something that teams can look forward to.

On the product side, we've got some really exciting stuff, a lot of cool integrations. Platform play is really critical to us in our success. You're going to see a large amount of deep integrations coming into our platform, supporting both the mid-market and enterprise. The thing that we've done early on as a part of our strategy is to never say, "We're just going to focus on this sliver of enterprise." It's always been a platform that plays well with others. There's always a better together story and people can just plug in and take advantage of that. Lots of really cool integrations coming across the entire DevOps stack for observability, alerting and monitoring, change management, orchestration and capacity. 

There are new workflow automation and collaboration capabilities, like deeper integrations with Slack and Microsoft Teams, that we're going to be launching in that space. And then of course, the other thing that I think customers can be really excited about and look forward to is that it's not just about reliability. Reliability is not something that happens in isolation. It happens in conjunction with things like security, CI/CD, and DevOps. We have a bunch of products that we're launching and a series of thought leadership pieces that we're doing that really build that full end-to-end picture, from when you write a single line of code all the way to testing it, deploying it, breaking it, and then learning from that breakage and creating that feedback loop. Blameless will become the central nervous system that connects everything together and helps orchestrate it.

We launched this new SLO product and we’re getting a tremendous amount of interest there. And the one piece of advice that I would give to anybody who's listening: if you're on a journey to start SRE, and don't know where to begin, reach out to us. We'll start with SLOs. That's really the most basic baby step that you can take with the existing tooling that you have in place, without any kind of disruption happening to your culture or teams. It's a way that we can level everyone up.
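For readers new to SLOs, the arithmetic behind one can be sketched in a few lines. This is a minimal illustration of the general error-budget idea and not a description of the Blameless product; the `Slo` class, the function names, and the request counts are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A service level objective (hypothetical shape, for illustration only)."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def error_budget(slo: Slo) -> float:
    """Fraction of requests that may fail while still meeting the target."""
    return 1.0 - slo.target


def budget_remaining(slo: Slo, total_requests: int, failed_requests: int) -> float:
    """Share of the error budget still unspent.

    1.0 means the budget is untouched; a negative value means it is overspent.
    """
    if total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = error_budget(slo) * total_requests
    if allowed_failures == 0:
        # A 100% target leaves no budget at all: any failure overspends it.
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Assumed numbers: one million requests in the window, 250 of them failed.
    slo = Slo(name="checkout-availability", target=0.999)
    remaining = budget_remaining(slo, total_requests=1_000_000, failed_requests=250)
    print(f"{slo.name}: {remaining:.0%} of the error budget remaining")
```

The request counts would normally come from whatever monitoring is already in place, which is why an SLO can be a first step that does not require new tooling.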

If you're interested in seeing how Blameless can help your teams adopt a blameless culture and SRE best practices, try us out now for free.
