How Google managed a worst-case scenario

A while ago, Google's Gmail service suffered a widespread outage that lasted more than an hour. Google's response to its customers was a good example of how every business should act.

First, they acknowledged that there had been a problem and let their customers know that they understood how much it affected them:

“Gmail’s Web interface had a widespread outage earlier today, lasting about 100 minutes. We know how many people rely on Gmail for personal and professional communications, and we take it very seriously when there’s a problem with the service.”

Next, they apologized for the outage — and did NOT try to minimize its impact:

“Thus, right up front, I’d like to apologize to all of you — today’s outage was a Big Deal, and we’re treating it as such.”

Then, they assured their customers that they had looked into the cause of the outage and were taking action based on what they’d learned:

“We’ve already thoroughly investigated what happened, and we’re currently compiling a list of things we intend to fix or improve as a result of the investigation.”

And they provided some details that demonstrated their ability and commitment to resolve any technical glitch quickly:

“The Gmail engineering team was alerted to the failures within seconds (we take monitoring very seriously).

“We brought a LOT of additional request routers online (flexible capacity is one of the advantages of Google’s architecture), distributed the traffic across the request routers, and the Gmail Web interface came back online.”

Finally, they explained what they were doing to prevent a similar outage in the future:

“We’ve turned our full attention to helping ensure this kind of event doesn’t happen again. Some of the actions are straightforward and are already done — for example, increasing request router capacity well beyond peak demand to provide headroom. Some of the actions are more subtle — for example, we have concluded that request routers don’t have sufficient failure isolation.

“We’ll be hard at work over the next few weeks implementing these and other Gmail reliability improvements — Gmail remains more than 99.9% available to all users, and we’re committed to keeping events like today’s notable for their rarity.”

This is a pretty good template for any business to use. It begins with a quick acknowledgement of the problem and its seriousness. It includes enough information to make customers feel confident that the business understands the problem and can fix it. Then it offers concrete, reassuring evidence that the problem is unlikely to happen again.

Every business makes mistakes. Fair-minded customers know that.

If you respond to problems responsibly, the way Google did in this case, your customers will stick by you. But if you try to bury or minimize your mistakes, you will lose their trust, and that trust is the most valuable asset you have.
