Resilience Engineering — Don’t Be Afraid to Show Your Vulnerable Side!

Grigor Khachatryan
4 min readFeb 10, 2021

Every software developer’s primary goal is to come up with a practical, intuitive, and robust product, a platform or service lots of people can use without any major issue. The problem is that what happens out there with real users is a lot more, well, chaotic than in a control environment developers initially work in. That’s why more and more devs have been using specialized techniques to test out their handiwork and ensure optimal reliability.

Resilience engineering is a practice within Site Reliability Engineering (SRE), closely related to Chaos Engineering. If you’re having trouble wrapping your head around all these terms, don’t worry, we’ll cover each aspect separately and then show you the real magic behind properly applied resilience engineering.

First, here’s a short history lesson.

Where Did SRE Come from?

SRE dates back to almost two full decades ago when a ragtag group of Google devs tried to find a way to improve the reliability of the company’s sites and keep them working smoothly as they grew. Needless to say, these guys were so effective that their techniques and strategies were turned into an IT subset of its own.

--

--