Published 01/03/2017 | Written by Donovan Jackson
There’s a broader lesson in the downing of Amazon Web Services’ S3 service in North America and that is simply ‘don’t panic’…
That’s according to Soltius cloud architect Peter Joseph, who told iStart that the failure – reportedly caused by ‘human error’ (aren’t they all?) – is well within the realms of the expected. “This is not surprising and it is definitely not like an asteroid has flown into the earth. Amazon’s own CTO has pointed out that ‘everything fails all the time’,” he said.
Indeed, thinking back to an age before the cloud, computers could be relied upon to break, generally at the most inopportune times (some say printers are designed to do that). In those not-so-distant times gone by, the accounting (or any other) system being down would be annoying, but no big deal.
What has changed is expectations of ‘always on’ computing which is generally – but not always – delivered by cloud services. Things have gotten a lot more reliable because professionally-hosted services are what they claim to be: more resilient and reliable than a server under a desk or in a cupboard.
“We have to look at this incident in perspective. It isn’t a ‘single provider which has gone down’. It is a single instance of a cloud services provider and I don’t think that is really being appreciated yet,” continued Joseph.
He pointed out that when dealing with AWS, it is a single company, but it is not a single cloud. It has 16 regions which run independently from one another, from an architectural point of view. Those companies which have a requirement for high availability should have architected their services to have failover from one region to another.
But those which don’t have that requirement can, to put it bluntly, cope with an outage of several hours. “AWS builds its services so they don’t all go down at once. Any experienced cloud architect knows that failures are inevitable and will build those services which require high availability across multiple regions of their service provider’s cloud.”
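The multi-region failover Joseph describes can be sketched in a few lines. This is an illustrative simplification, not an AWS API: the region names are real AWS regions, but the health check and preference ordering are hypothetical stand-ins for whatever probing and routing a real deployment would use.

```python
# Regions ordered by preference; in practice this might be ordered by
# latency or data-residency requirements. (Illustrative sketch only.)
REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]

def region_is_healthy(region, outages):
    """Stand-in health check; a real one would probe a service endpoint."""
    return region not in outages

def pick_region(regions, outages):
    """Fail over to the first healthy region in preference order -
    the essence of architecting a service across multiple regions."""
    for region in regions:
        if region_is_healthy(region, outages):
            return region
    raise RuntimeError("no healthy region available")

# Example: with us-east-1 down (as in the 2017 S3 outage),
# traffic fails over to the next region in the list.
print(pick_region(REGIONS, outages={"us-east-1"}))
```

The point is the shape, not the code: a workload that can answer “which other region do I serve from?” survives a single-region outage; one that can’t simply waits.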
There is no need to question the validity of the cloud as a service capable of supporting business requirements, though this is the inevitable byproduct of a service provider failure, continued Joseph. “That’s especially likely from consumers, who are lulled into a false sense of reliability and tend to get very upset if their Instagram or Facebook is down,” he noted.
For many businesses, however, there will be a level of disruption – but again, he posed the question of whether those business owners have any idea of the availability guaranteed by their service provider. “There are a great many applications, particularly the ‘freemium’-type services, which a lot of businesses slowly but surely build a dependence on. But is there any promise of ‘100 percent uptime’ with those services?”
Since nothing in the IT industry can ever be guaranteed to be up 100 percent of the time, Joseph doubts it. Simple arithmetic warns users that even with a 99.9% uptime commitment they should expect up to 9 hours of unanticipated downtime each year.
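The arithmetic behind that warning is straightforward: a 99.9 percent commitment still leaves 0.1 percent of the year, about 8.76 hours, as permissible downtime. A quick sketch (the SLA figures are generic examples, not any provider’s actual terms):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year

def max_annual_downtime_hours(uptime_pct):
    """Hours of downtime a given uptime commitment still permits per year."""
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)

for sla in (99.0, 99.9, 99.99):
    print(f"{sla}% uptime -> up to {max_annual_downtime_hours(sla):.2f} hours down per year")
```

So even the extra ‘nine’ in a 99.99 percent commitment only shrinks the allowance to under an hour a year; it never reaches zero.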
None of which is to say that plenty of businesses aren’t taking some pain as a consequence of the AWS problem. “The ease of getting into the cloud and the fact that it does work pretty well most of the time means an expectation is created. It is simple for just about anyone to access and build a technology stack in the cloud and expect it to work. The day it goes down will result in ‘it has never gone down before’ – but the truth is that previous resilience is no guarantee for the future.”
Joseph said this is yet another cautionary tale – there have been others before and there will be more in future – about knowing the expected reliability of any service depended upon. “Do those services make any guarantees? It’s a good idea to know. And it is a good idea to architect your services in line with business requirements, rather than in line with what’s easy, convenient and fast.”