Human error cause of Optus outage chaos

Human error has been blamed for the Optus outage that left millions without phone and internet services on Wednesday 8th November 2023.

RMIT University associate professor Mark Gregory said information provided by Optus following the outage showed that it was human error, rather than a hardware failure or cyber attack, that caused the disruption.

In its announcement, Optus said:

“At around 4.05am Wednesday morning, the Optus network received changes to routing information from an international peering network following a routine software upgrade. These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these. This resulted in those routers disconnecting from the Optus IP Core network to protect themselves.”

The impact gradually made itself felt as customers woke to find themselves disconnected from their daily lives. One customer, ‘Annie’, told her local radio station that she found out about the disruption when her cat’s automatic feeder failed to deliver breakfast. Another customer told the ABC that the incident had left her unable to receive important updates about her father’s cancer treatment. “I’m just waiting for results, and I can’t even get those through,” she said. The impact was felt across the country, with Melbourne’s trains brought to a halt and an estimated 400,000 Optus small business customers missing out on vital revenue, unable to carry out EFTPOS transactions. Even more seriously, it was reported that many emergency calls could not be connected.

Optus again?

The outage came just a year after the company suffered what was believed to be the biggest data breach in Australian history. The breach was the result of a cyber attack in September 2022, when about 10 million customers had personal data stolen.

After the attack, the telecoms provider revealed that current and former customers’ data was stolen – including names, birthdates, home addresses, phone and email contacts, and passport and driving licence numbers. It went public about the breach 24 hours after it noticed suspicious activity on its network, and stressed that payment details and account passwords were not compromised. Those whose passport or licence numbers had been taken – roughly 2.8 million people – were deemed to be at a “quite significant” risk of identity theft and fraud, according to the Australian Government. The breach also ignited critical questions about how Australia handles data and privacy.

Although the company denied that the breach was due to human error, some cybersecurity experts thought otherwise, including IBM’s former Chief Information Security Officer, Kris Lovejoy, who told AFR reporters that few senior executives were taking the necessary steps to prevent data breaches, while putting pressure on software developers and data security teams to speed up the rollout of new applications.

Why it happened

As services were slowly restored following the 2023 outage, theories about the cause started to circulate. Optus was quick to deny that the problems were due to another cyber attack, with CEO Kelly Bayer Rosmarin stating that the cause was a “technical network issue.”

She also rejected claims from unions that 600 job cuts were partly to blame.

“I don’t think that that’s at all related,” she said.

Cloudflare, which tracks a range of activity on the internet, reported noticing a spike in Border Gateway Protocol (BGP) announcements from the telco coinciding with the time the Optus network went offline, according to The Guardian.

Matt Tett, managing director of network analysis company Enex TestLab, told Guardian Australia that while he was not certain of the cause, Optus appeared to have had some failure in routing at 4am that caused an exponential increase in BGP announcements.

“This morning when I woke up, I just instinctively thought: it’s either a cyber incident or a configuration issue. And nine times out of 10 it’s a configuration issue when you have such a big issue like that.”

In simple terms, according to Brent Hodgeson via social media platform X: “4am… a whole bunch of computers went ‘wait… we’re lost’ and asked for directions all at once. 4am is the time they do maintenance as it’s the quietest point for usage. Ergo ‘maintenance broke it’.”
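For readers who want to see the mechanism, the safety behaviour Optus described (routers disconnecting once incoming routing changes exceed a preset level) can be sketched in a few lines of Python. This is only an illustrative toy, not Optus’s configuration: the `Router` class, the `flood` helper, the limit of 1,000 routes and the made-up address prefixes are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router that protects itself with a maximum-route limit."""

    name: str
    max_routes: int                      # hypothetical preset safety level
    routes: set[str] = field(default_factory=set)
    connected: bool = True

    def receive(self, prefixes: list[str]) -> None:
        """Accept announced prefixes until the safety limit is exceeded."""
        for prefix in prefixes:
            if not self.connected:
                return                   # already isolated: ignore further updates
            self.routes.add(prefix)
            if len(self.routes) > self.max_routes:
                # Limit exceeded: drop all routes and disconnect from the core.
                self.routes.clear()
                self.connected = False


def flood(count: int) -> list[str]:
    """Generate `count` distinct made-up prefixes to simulate route updates."""
    return [f"10.{i // 256}.{i % 256}.0/24" for i in range(count)]


# Three hypothetical core routers sharing the same safety limit.
core = [Router(f"core-{n}", max_routes=1000) for n in range(3)]

for router in core:
    router.receive(flood(500))           # normal load: under the limit
normal = [r.connected for r in core]

for router in core:
    router.receive(flood(5000))          # surge of routing changes
after_surge = [r.connected for r in core]

print(normal)
print(after_surge)
```

In this sketch every router shares the same limit and receives the same surge, so they all isolate themselves together. That is one plausible way a mechanism meant to protect individual routers could take down a whole network at once.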

In the immediate aftermath, Bayer Rosmarin had dismissed suggestions the outage was caused by a software update, saying “It’s highly unlikely, our systems are actually very stable,” in an ABC Radio Sydney interview.

Apparently not. “Optus has not explained what went wrong with the test process that should have occurred before the routing software upgrade occurred,” RMIT University’s Mark Gregory commented.

“Also, there is no explanation as to why there appears to have been a lack of redundancy of the key routers, so that if there was a problem the key routers would swap to the redundant routers, which you would expect to be running the previous iteration of software.”

Communications Minister Michelle Rowland found out about the problem from media reports last Wednesday. She has asked her department for a post-incident review, when no doubt more will be revealed.

What next?

Former competition watchdog chairman Allan Fels has said that Optus’ woes could lead to increased market share for Telstra and TPG Telecom, which trails Optus and market leader Telstra in the mobile phone market. “There will be a big loss of customers, that will affect the market’s structure,” Mr Fels told the Australian Financial Review.

Analysts are speculating about how much money Telstra and other telco groups could make from picking up disgruntled Optus customers; JP Morgan forecast that Telstra could gain up to $125 million in earnings over the next few years.

Optus, which on Thursday reported a 14 percent slide in earnings before interest and taxation to $141 million for the six months to September, will not pay cash compensation to people disrupted by the outage. It will, however, offer 200 gigabytes of extra data at “normal” speeds over one billing cycle to most mobile customers. It will be fascinating to see how Optus customers respond, but we expect it will be with something other than the company’s tagline “Yes”.


