Intercom's mission is to make internet business personal. But a product cannot be personal when it does not work as it should. Uptime is critical to the success of our business, not only because our customers pay us, but also because we use our own product. If our service is down, we feel our customers' pain directly.
Uptime depends on many factors, such as software architecture and the quality of day-to-day work. Quite often, though, it comes down to a person who is on call, or "always in touch", answering pages from PagerDuty. On-call can be a powerful customer-focused tool that connects the engineers who build a product with what customers experience when they use it. It is also an excellent opportunity for learning and growth, because failures and mistakes are a good way to build skills and understand how complex systems work.
But being on call can also have a detrimental effect on your life. You have to be ready to respond quickly and competently to an alert that something is broken. Even if the pager is not going off at that moment, being on call brings a constant feeling of anxiety, as I know from personal experience. The quality of your sleep suffers in particular. Being reachable at any time of day on a regular basis can lead to burnout, apathy, or even the desire never to see a computer again.
The history of on-call at Intercom
In Intercom's earliest days, our CTO, Ciaran, was a one-person, round-the-clock on-call team, both in and out of the office. As Intercom grew, a dedicated team was created to help Ciaran. Soon after, new development teams began building many new features and services, and they took on all the on-call responsibilities for them.
At the time, this approach seemed natural: it was an easy way to scale on-call, it matched our values, and it appealed to our sense of ownership. As a result, without ever planning it, we ended up with four or five teams that were regularly on call outside working hours. The remaining development teams owned little that could trigger a page, so they were rarely, if ever, called.
We realized that we had an on-call setup we could not be proud of, with a number of serious problems that we wanted to fix:
- Too many people were on call at any given time. Our infrastructure was not so large that it needed at least five engineers giving up their normal weekends.
- The quality of our alarms and on-call procedures was inconsistent across teams, and we relied on ad hoc processes to review new and existing alerts. Runbooks (the instructions to follow when an alert fires) were mostly notable for their absence.
- Engineers had conflicting expectations depending on which team they worked in. For example, only the original on-call team received any compensation for on-call shifts and disrupted weekends.
- There was a general tolerance for unnecessary pages at inconvenient hours.
- Finally, this type of work is not suitable for everyone. Depending on a person's life circumstances, on-call shifts can take a real toll.
Finding the right approach to on-call
We decided to create a new virtual team that would take on-call for every team outside working hours. The team would consist of volunteers, not conscripts, drawn from any team in the organization. Engineers would rotate through the virtual team for about six months, spending a week at a time on call. Fortunately, we had no trouble finding enough volunteers.
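A week-at-a-time rotation among volunteers can be sketched in a few lines. This is only an illustration: the function name, the volunteer names, and the simple round-robin order are assumptions, not a description of how Intercom actually schedules its shifts.

```python
from datetime import date, timedelta


def weekly_rotation(volunteers, start, weeks):
    """Assign one volunteer to each week-long shift, in round-robin order.

    Hypothetical helper: returns a list of (shift_start_date, volunteer) pairs.
    """
    if not volunteers:
        raise ValueError("at least one volunteer is required")
    schedule = []
    for i in range(weeks):
        shift_start = start + timedelta(weeks=i)
        # Wrap around to the first volunteer once everyone has had a shift.
        schedule.append((shift_start, volunteers[i % len(volunteers)]))
    return schedule


# Illustrative roster: three made-up volunteers covering four weeks.
roster = weekly_rotation(["ana", "ben", "chen"], date(2024, 1, 1), 4)
for shift_start, engineer in roster:
    print(shift_start.isoformat(), engineer)
```

With six or seven volunteers, a round-robin like this puts each engineer on call roughly one week in every six or seven.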
This team then agreed on what alerts and runbook entries should look like, and defined the process for handing alerts over to the new on-call team. They defined all alerts as code using a Terraform module, and put every change through peer review. We introduced compensation for each weekly shift, at a level that those on call were happy with. We also created a second-level escalation team made up only of managers, which serves as the single escalation point for on-call engineers.
It took several months of hard work to roll this out, and as a result only 6 or 7 engineers are now on call, rather than 30 as before. During working hours, when most failures tend to occur, teams deal with problems in their own features and services themselves. The rest of the time, the volunteers cover on-call.
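To make the idea of a shared alert standard concrete, here is a minimal sketch of the kind of check a review could enforce before an alert is handed to the on-call team. It is written in Python rather than Terraform, and the `Alert` fields, the `review_problems` function, and the example URL are all hypothetical; the actual rules the team agreed on are not spelled out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert definition; the field names are assumptions."""
    name: str
    description: str
    runbook_url: str
    owning_team: str


def review_problems(alert):
    """Return a list of reasons the alert is not ready to be handed over."""
    problems = []
    if not alert.description.strip():
        problems.append("missing description")
    if not alert.runbook_url.startswith("https://"):
        problems.append("missing or non-HTTPS runbook link")
    if not alert.owning_team.strip():
        problems.append("missing owning team")
    return problems


# Made-up examples: one complete alert and one with gaps.
good = Alert(
    name="queue-backlog",
    description="Job queue depth has stayed above its threshold",
    runbook_url="https://runbooks.example.com/queue-backlog",
    owning_team="messaging",
)
bad = Alert(name="disk-full", description="", runbook_url="", owning_team="storage")

print(review_problems(good))  # []
print(review_problems(bad))   # ['missing description', 'missing or non-HTTPS runbook link']
```

Keeping the definitions in code means a check like this can run on every proposed change, alongside the human review.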
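The split between working hours and the rest of the time amounts to a simple routing rule. The sketch below assumes a 09:00 to 18:00, Monday to Friday working day and a target named `virtual-on-call`; both are illustrative assumptions, and in practice this kind of rule would normally live in the paging tool's schedule configuration rather than in application code.

```python
from datetime import datetime, time

# Assumed working hours; the real boundaries are not stated in this post.
WORK_START = time(9, 0)
WORK_END = time(18, 0)
OUT_OF_HOURS_TEAM = "virtual-on-call"  # hypothetical routing target


def route_alert(owning_team, local_time):
    """Send an alert to its owning team in working hours, else to the volunteers."""
    is_weekday = local_time.weekday() < 5  # Monday is 0, Sunday is 6
    in_hours = WORK_START <= local_time.time() < WORK_END
    if is_weekday and in_hours:
        return owning_team
    return OUT_OF_HOURS_TEAM


print(route_alert("messaging", datetime(2024, 1, 3, 11, 0)))   # Wednesday morning
print(route_alert("messaging", datetime(2024, 1, 3, 23, 30)))  # Wednesday night
print(route_alert("messaging", datetime(2024, 1, 6, 11, 0)))   # Saturday morning
```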
What we learned
After launching our virtual on-call team, we expected a flood of new work, such as root-cause investigations or all-hands efforts to fix whatever had caused an outage. In practice, our development teams took full ownership of the causes of failures, and follow-up work usually began immediately. We also needed to avoid handing on-call for a service back to the team it came from, so that its engineers were not forced back on call after hours.
Our formal escalation process was rarely used. More commonly, engineers who happened to be online helped the on-call engineer informally, especially our colleagues in the San Francisco office. Many problems were fixed or reduced through teamwork, by solving them on the fly.
The engineers at our San Francisco office joined the team wholeheartedly and went beyond the usual on-call duties. Spanning several offices raised some concerns about overhead, but it worked in our favor: it turned out to be a good way to build and strengthen relationships and to learn more about the technology stack we all work with.
Across our teams, engineers' experience of working at Intercom has become more consistent, and on our Careers site we can confidently promote the systems engineer role by stating that you never have to be on call unless you want to be.
Along with foundational work to stabilize and scale our data stores, sustained attention to fixing problems has brought the number of out-of-hours pages down to fewer than 10 per month. We are very proud of this number.
We continue to maintain and improve our on-call team. As Intercom grows, we may have to revisit our decisions, because what works today will not necessarily work the next time our headcount doubles. Nevertheless, the experience has been extremely positive for our organization: it has significantly improved the quality of life of our engineers, the quality of our response to incidents, and, above all, the experience of our customers.