A massive problem with the Fastly content delivery network (CDN) caused a number of large websites to fail. Those affected include giants like Twitch and Reddit, both of which were completely offline for over an hour. The cause is now clear.
As Fastly’s technical director, Nick Rockwell, confirmed to the BBC, a single customer triggered the massive outage of the CDN services last Tuesday, albeit through no fault of their own. According to Rockwell, the customer had already installed a software update provided by Fastly in mid-May. This only became a problem when the customer later changed some settings in the software configuration. The change triggered a bug that Fastly had overlooked in the software and set off a chain reaction in the company’s CDN network. That one customer triggered around 85 percent of all error messages, Rockwell reports.
Fastly now wants to improve its quality assurance process to prevent an error-prone software version from being rolled out to customers again. In Fastly’s defense, however, the company identified the cause of the problem 40 minutes after the malfunction became known and, a further 49 minutes later, had corrected 95 percent of the problems.
Fastly outage: what happened
“We know that users could run into error messages when they try to access Twitch.” With this message on Twitter, Twitch Support understated a problem of fundamental proportions.
🔎 We are aware that users may be experiencing errors accessing Twitch at this time. Our team is currently investigating to fix this issue.
– Twitch Support (@TwitchSupport) June 8, 2021
It was not just individual users who had problems: the twitch.tv domain was completely inaccessible. The same was true of Reddit, the veteran forum site and home of WallStreetBets. Users worldwide were confronted with an error message when they tried to open Reddit.
there’s a huge web outage going on right now. Twitch, Reddit, Amazon, and even The Verge is down. Looks like a key CDN might be down
– Tom Warren (@tomwarren) June 8, 2021
As reported by The Verge’s Tom Warren, Amazon and The Verge itself were also affected. In contrast to Twitch and Reddit, the problem at Amazon affected only individual parts of the service, and at The Verge initially only the embedded media, so the magazine looked quite text-heavy at times. This clearly pointed to a failure of central parts of the CDN service these sites use.
Because the error messages on Twitch and Reddit were quite clear, it took little speculation to trace the problem to Fastly. Indeed, the company’s status page confirmed that a number of server locations were down.
At times, all locations in Europe were marked as “degraded performance”, which in practice was equivalent to a total failure.
On Downdetector, similar disruption patterns appeared for many other large services. Spotify, Twitter, Vimeo and GitHub also had problems in the course of the Fastly outage.
The outage may well have reached much further, because the US provider primarily serves major customers. The company is essential for nearly 330 major websites, including Shopify, BuzzFeed, Slack, Business Insider, Kayak, and the New York Times.
In Great Britain, citizens could not use important administrative services such as applying for ID cards, tax certificates or driving licenses during the outage. Other online publications have since confirmed that they were affected, including CNN, the Guardian, and the Financial Times. The Verge got creative after its brief total failure and temporarily moved the magazine to Google Docs.
Fastly is growing massively during the corona crisis
Fastly grew by 40 percent during the corona crisis and now provides its services to more than 2,000 customers worldwide. It is unclear whether and to what extent the problems can be partly attributed to the strong growth.
Last year, the service provider had to learn that its strong focus on large corporate customers can cause problems: its largest customer at the time, the social network TikTok, left Fastly in the wake of the Trump administration’s move to ban the app.
Conversely, today’s glitch could make corporations realize that running all major websites through the same service provider might not be the best idea.