Published on 14/12/2016 | Written by Anthony Caruana
The Australian Tax Office outage is just one in a litany of IT failures that dog government agencies across the country…
This week’s ATO disaster is being blamed on a hardware failure, but an expert we spoke to is less certain. He also believes that any talk of there being no data loss is premature.
IT probity adviser Darryl Carlton from The Probity Partnership said “There’s no way they could know that.”
There was plenty more, too. “You can’t trust what they’re saying. It’s like the census. Their immediate reaction was ‘trust us to protect the data’. Well, we can’t trust them to run the systems.”
Carlton told us he believes the explanation of a hardware error is unlikely to be accurate. He said it’s more likely a software or software controller error. It’s his understanding that the HPE storage array in question has been operating for over a year.
One of the questions not being asked is why the ATO’s disaster recovery processes weren’t able to cope with this failure. In Carlton’s view, this kind of scenario should not only have been foreseen but also planned for, with appropriate business continuity procedures in place. Basic questions, such as what the system’s limits were, went unasked, he said.
“It’s pathetic. What are we paying them for?” he asked. In his view, the ATO hasn’t done its due diligence, instead deferring to its systems suppliers. “They’ve relied on ‘Trust me, I’m the vendor’.”
Carlton believes the recent failures at the Australian Bureau of Statistics with the census and this week at the ATO are the result of many years of poor processes, the culmination of which he compared to the explosion of the space shuttle Challenger. In that case, blame was sheeted home to failed O-rings, which the launch team hadn’t considered a risk because they had never failed before. Carlton said the same sort of thinking had permeated government IT departments over many years: they simply refused to accept the level of some risks because those problems had never happened before.
This is a result of many years of lowering the bar of expectation, said Carlton. Over the years, projects such as the Myki ticketing system in Victoria, health systems in Queensland and others have created a culture where project completion, rather than project value, has been set as the bar for success.
Decisions that pushed the boundaries of risk in the past worked out, so there’s a continual pushing of those boundaries.
“Over the last 20 years, what’s occurred is projects have been completed in circumstances that are not optimal. There’s been a normalisation of deviance. It’s worked OK when we’ve used these people. It’s worked OK when we use these providers. So, we’ll do that again next time.”
There is a way forward, Carlton said. Projects should not be divested entirely to business users, who may have the confidence and the responsibility to run major projects but lack the experience and understanding of the complexity involved. Agencies should also be more open to not running everything in ‘owned’ data centres, instead considering the likes of Amazon or Azure, which are better able to scale and meet changing demands.
Most importantly, said Carlton, there needs to be a cultural change in the way risk is considered and in how advice from experts is received and handled.