Cloud wake-up call: Amazon Web Services outage illustrates the pitfalls of online infrastructure

By Todd Bishop

Bigstock Photo / klevo

Amazon Web Services is one of the greatest successes of modern business and technology, leveraging its first-mover advantage in the public cloud to empower companies with capabilities that they couldn’t build on their own, and creating a lucrative business for itself in the process.

It sure didn’t feel so dreamy on Tuesday.

Outages are nothing new, and this wasn’t the worst we’ve seen, but the problem experienced by the widely used AWS US-EAST-1 Region was remarkable for its widespread impact, illustrating the extraordinary reach of the cloud.

This wasn’t just about websites going down. Day traders couldn’t trade. Gamers couldn’t game. Adele couldn’t sell tickets to her upcoming tour, for goodness’ sake.

The fallout was apparent everywhere you looked, from McDonald’s kiosks to Tinder hookups to NPR podcasts. Seattle startup Intelus had the bad luck of launching its company on Tuesday morning, with a website hosted on AWS.

Amazon itself was far from immune to the challenges. The company’s employees were unable to use its Chime communication app on their computers for several hours. Amazon Music was unavailable to many users. COVID-19 test results weren’t accessible for hours through the company’s mail-in testing service.

Customers of the company’s Ring subsidiary were cut off from their cameras.

It got worse. In the company’s core e-commerce business, product pages didn’t load, and orders didn’t go through. Customers couldn’t order groceries. Delivery drivers sang karaoke and ultimately went home for the day after the outage severed their ties to the app that coordinates their deliveries.

This couldn’t have come at a worse time for Amazon, during a holiday season that was already challenged by supply chain bottlenecks.

All of this leads to an obvious question: Has the world become too dependent on Amazon’s cloud?

“I think so,” said Corey Quinn, the chief cloud economist at The Duckbill Group, when I asked that question in a Twitter direct message on Tuesday evening. He pointed out that it could have been a lot worse: “A full outage of that region (not the partial one we saw today) means a massive economic event.”

Quinn elaborated on his thoughts in a Twitter thread that’s worth reading in full.

“To be explicit, I don’t think AWS has done anything wrong here. This is the natural end result of their success at massive scale.” - Corey Quinn (@QuinnyPig), December 8, 2021

What can be done? What happened to redundancy?

In a post about the AWS outage, Forrester senior analyst Brent Ellis laid out a strategy for companies to minimize their vulnerability. Part of his advice: “Diversify your risk by building applications and services that can be shifted between multiple cloud providers or private infrastructure automatically as a service fails.”
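To make that advice concrete, here is a minimal sketch in Python of what automatic failover between providers might look like. It is an illustration under stated assumptions, not Ellis’s or Forrester’s method: the provider names and the `primary_cloud` and `secondary_cloud` functions are hypothetical stand-ins, not real cloud APIs, and a production system would also need health checks, data replication and traffic routing.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider fails to serve the request."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, handler) pair in order; return the first success."""
    errors: List[str] = []
    for name, handler in providers:
        try:
            return name, handler()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for requests to two independent providers.
def primary_cloud() -> str:
    # Simulates the kind of failure seen during a regional outage.
    raise ConnectionError("increased API error rates")


def secondary_cloud() -> str:
    return "order accepted"


served_by, result = call_with_failover(
    [("primary", primary_cloud), ("secondary", secondary_cloud)]
)
print(served_by, result)
```

In this sketch the request falls through to the second provider when the first one raises an error. The hard part in practice is less the retry loop than keeping data and configuration in sync across providers so the fallback has something useful to serve.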

In terms of a broader solution to the world’s reliance on AWS, there may be no clear answers at this point, but as Quinn notes, these are important questions to ask.

Ultimately, the solution could come from Amazon itself, and with the former AWS chief Andy Jassy at the head of the company now, Amazon should be better positioned than ever to address this challenge.

AWS problems impact Amazon, Disney, Smartsheet, Canva and other online services

By Todd Bishop

GeekWire File Photo

If you’re having problems buying items or logging in to Amazon.com, you’re not alone, and thanks to the widespread use of Amazon Web Services, the problem isn’t limited to Amazon.

Disney, League of Legends, Smartsheet, Canva and other online services are down or reporting problems for some customers.

The official AWS Service Health Dashboard reports increased error rates for services such as Elastic Compute Cloud (EC2), Amazon Connect, and the DynamoDB database service, all out of its Northern Virginia region.

“Amazon Web Services having problemos today. Story on terminal. Any signs of this cascading elsewhere? So far, folks complaining about shopping, trouble with Amazon music, merchants having trouble managing Amazon advertising. Any spread beyond Amazon?” - I don’t have 10,000 followers (@spencersoper), December 7, 2021

Smartsheet, the Bellevue-based work management company, reports as of 9:19 a.m. Pacific time, “Our AWS partner has communicated that they have identified the root cause and are actively working on a recovery.”

It’s not just tech companies that are impacted. The Baltimore Sun, for one, says it’s unable to make updates to its site.

On Amazon, the problem has manifested in a variety of ways, making it difficult for some users to purchase items, call up their order history, and even log into Amazon’s special website for COVID-19 test results.

Update, 11:45 a.m.: Bloomberg News reports that Amazon’s delivery operations are also being impacted, creating an outage in an app used to communicate with drivers, as well as the Amazon Flex app used by gig workers who deliver packages. The ripple effect could be significant given the timing during the peak holiday season.

Update, 12:55 p.m.: Here’s the latest, as of 11:26 a.m. and 12:34 p.m. Pacific:

We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. Services impacted include: EC2, Connect, DynamoDB, Glue, Athena, Timestream, and Chime and other AWS Services in US-EAST-1. 

We continue to experience increased API error rates for multiple AWS Services in the US-EAST-1 Region. The root cause of this issue is an impairment of several network devices. We continue to work toward mitigation, and are actively working on a number of different mitigation and resolution actions. While we have observed some early signs of recovery, we do not have an ETA for full recovery.

Amazon services including Alexa, Ring and Amazon Fresh grocery ordering are also impacted.

Update, 2:30 p.m.: Amazon says it’s making progress. Here’s the latest.

We have executed a mitigation which is showing significant recovery in the US-EAST-1 Region. We are continuing to closely monitor the health of the network devices and we expect to continue to make progress towards full recovery. We still do not have an ETA for full recovery at this time.

Meanwhile, back at the delivery station, Amazon drivers are channeling Bob Marley.

“I don’t wanna wait in vain” — the official lyric of Amazon’s 2021 holiday season.

Update, 3:03 p.m.: Amazon now says, “Many services have already recovered, however we are working towards full recovery across services. Services like SSO, Connect, API Gateway, ECS/Fargate, and EventBridge are still experiencing impact. Engineers are actively working on resolving impact to these services.”

Amazon preps for a ‘multi-robot’ world: RoboRunner cloud service builds on its own warehouse tech

By Todd Bishop

Amazon’s new RoboRunner cloud service can orchestrate the operation of robots from multiple vendors. (Amazon Photo)

The concept of “multi-cloud” has become widely accepted in the tech industry, as companies use different cloud vendors for various types of cloud workloads. Now Amazon is preparing for a similar outcome in robotics.

Amazon is incorporating technology originally developed for its own warehouses into a new cloud service, RoboRunner, that other companies can use to manage and coordinate fleets of robots from multiple vendors.

The company separately announced a new AWS Robotics Startups Accelerator, in partnership with MassRobotics, to help robotics startups incorporate Amazon’s cloud technologies into their products and services.

AWS IoT RoboRunner, unveiled by Amazon Web Services at its re:Invent conference in Las Vegas, lets companies connect robot fleets to the cloud, operate different types of robots as part of the same system, and develop apps that optimize the operations of an automated fleet using real-time data from the warehouse or factory floor.

The AWS Robotics team, which makes cloud services for use with robots, has always worked closely with Amazon’s internal robotics teams. However, the direct lineage of RoboRunner is unusual, said Eric Anderson, general manager of AWS Robotics and Autonomous Services, in an interview with GeekWire.

“RoboRunner is unique in that the technology actually started as a program inside of Amazon to help our robotic fleets scale to the levels that we’re operating at today, and to allow them to incorporate more diversity in the types of equipment that we could actually deploy as a part of our fulfillment and logistics operations,” Anderson said.

It’s a follow-up to the company’s AWS RoboMaker robot simulation service, launched in 2018 to help roboticists and software engineers build cloud applications and simulations. Amazon is targeting RoboRunner to a different customer base: larger companies that are buying and operating robotic systems at a larger scale.

With the new service, Amazon is looking to get ahead of a long-term trend.

For now, most companies are using a single robot platform. However, this will change as commercial and industrial robots proliferate in the years ahead, wrote Gartner analyst Dwight Klappich in a July report that addressed the trend.

“As companies expand their use of robotics, most will eventually have heterogeneous fleets of robots from different vendors performing a wide array of tasks,” he wrote. “Integrating with and coordinating the work of a varied fleet of robots will require standardized orchestration software that can easily integrate to a variety of specific robot platforms.”

Amazon is far from alone in pursuing this market. Examples listed by Gartner were Accelogix; Formant; GreyOrange; MacGregor Partners; Rapyuta Robotics; Ready Robotics; Rocos; and SVT Robotics.

Amazon’s new cloud chief Adam Selipsky hints at different directions for AWS as re:Invent begins

By Todd Bishop

Adam Selipsky, Amazon Web Services CEO. (Amazon Photo)

Amazon Web Services this week is holding its first re:Invent conference without Andy Jassy at the helm of the tech giant’s cloud unit. The traditional keynote address Tuesday morning will be delivered by Adam Selipsky, who returned to AWS this year as its CEO after Jassy was named to succeed Jeff Bezos as Amazon CEO.

Selipsky, who was previously Tableau Software CEO, gave a preview of two new directions for AWS in interviews prior to this week’s event.

First, he said AWS plans to offer more high-level “horizontal” services that can be used across a variety of industries, expanding its business further beyond the basic AWS building blocks of EC2 cloud computing and S3 cloud storage. He cited the AWS Cloud Contact Center service for call centers as an example.

“More and more, customers are asking us to provide them with higher-level abstractions on top of AWS services,” he told enterprise technology publication SiliconANGLE in an interview.

Second, he said AWS plans to build more services tailored to specific industries. In an interview with Bloomberg Television, Selipsky said AWS is starting to build specific offerings for industries such as financial services, telecommunications, automotive and healthcare.

Microsoft and Google, Amazon’s main cloud rivals, have been more aggressive than Amazon in rolling out cloud services and platforms tailored to specific industries.

“The world around us is changing so much that we’re going to have to be different,” Selipsky said in the Bloomberg interview. “It doesn’t matter what we did yesterday.”

He elaborated on that theme in an interview with CRN.

“No matter how successful we have been in the past—just given the rate of change in the industry, how quickly our market segment is evolving and just the rate of growth of AWS—there are always going to be big opportunities … to double down even further on successes, to find new areas in which our customers need us to innovate and to improve existing things we do,” he told CRN. “So I’ve really been trying to find and focus on the most important of those.”

Amazon Web Services remains the public cloud leader with 33% market share in cloud infrastructure, platform and hosted private cloud services, according to Synergy Research Group. Microsoft Azure is second with 20%, and Google Cloud is third with 10%, the firm estimates.

Overall enterprise spending on cloud services in the third quarter topped $45 billion, an increase of 37% from the same quarter a year earlier.

AWS re:Invent takes place this week in Las Vegas. Sessions are also streamed online.

F5 refreshes brand by dropping ‘Networks’ from name, marking ‘huge change’ in its business

By Todd Bishop

F5’s Customer Engagement Center. (F5 Photo)

F5 Networks will be just F5 from now on.

The Seattle-based company, which originally made its mark in networking technology, said this week that it’s dropping the “Networks” from its name, reflecting its expanded focus on delivering and securing applications.

Mika Yamamoto, F5 chief marketing and customer experience officer. (F5 Photo)

“Now, dropping a single word from our name might not seem like such a big deal. But to us, this change is huge,” writes Mika Yamamoto, F5 chief marketing and customer experience officer, in a post announcing the streamlined name. “By breaking from the confines of ‘networks,’ we’re freeing ourselves to move boldly into a future constrained only by the limits of our imagination.”

As part of its expansion beyond its traditional hardware business, F5 has made a series of major acquisitions under CEO François Locoh-Donou, who joined the company in 2017. F5 has spent more than $2 billion to absorb a variety of cloud and security software companies.

F5 announced a $670 million deal for Nginx, the company behind the widely used web and application server technology, in March 2019; and completed its $1 billion purchase of Shape Security in January 2020. It acquired cloud computing company Volterra for $500 million in January.

More recently, F5 bought Boston-based cloud monitoring company Threat Stack for $68 million.

F5’s software revenue grew 37% to $500 million in its recently completed fiscal year, out of $2.5 billion in overall revenue.

F5 was founded in 1996. Headquartered in a 2-year-old downtown Seattle skyscraper, the company has a total of 6,461 employees globally, including more than 1,400 in the Seattle area.

Previously: Applications everywhere: F5 Networks’ CEO on the surprising evolution of apps and the cloud

Return of the JEDI: Pentagon’s multicloud sequel puts Microsoft and Amazon in leading roles

By Todd Bishop

Andy Jassy, the new Amazon CEO, was critical of the Pentagon’s previous decision to award the $10 billion JEDI contract to Microsoft in his prior role as CEO of Amazon Web Services. (GeekWire Photo / Dan DeLong)

The U.S. Department of Defense is soliciting bids from multiple companies to upgrade its technology capabilities with the benefit of artificial intelligence, machine learning, data analysis and other hallmarks of modern cloud platforms.

The Pentagon’s new Joint Warfighting Cloud Capability (JWCC) program is a replacement for its previous $10 billion Joint Enterprise Defense Infrastructure (JEDI) program, which was awarded to Microsoft, to the outrage of Amazon.

That led to disputes, litigation, allegations of presidential corruption, and ultimately the cancelation of the whole thing.

Andy Jassy, the new Amazon CEO, was outspoken on the issue as the leader of Amazon’s cloud business at the time.

In a departure from the prior single-vendor contract, the Pentagon this time is spreading its cloud around. Amazon, Google, Microsoft and Oracle have been solicited to submit bids for the work, which will go to multiple companies. In addition, the government is not putting an overall value on the contracts.

“Preserving national defense requires immediate action,” the DoD says in a statement of requirements. “Therefore, the Secretary of Defense has initiated a series of enterprise initiatives that are designed to bring greater urgency, focus, and unity of effort within the Department to address China as our number one pacing challenge. These initiatives will give our Warfighters the operational advantage to prevail in peace and to win in conflict.”

Here’s an excerpt from the notice.

The Department of Defense (DoD) has a requirement to purchase commercial cloud service offerings and support services. See attached “Required Capabilities” description for additional detail. The anticipated result will be to award multiple Indefinite-Delivery, Indefinite-Quantity (IDIQ) contracts under FAR Part 16. However, the Department is also seeking information from potential additional sources to better inform its acquisition strategy. The Government anticipates awarding two IDIQ contracts, one to Amazon Web Services, Inc. (AWS) and one to Microsoft Corporation (Microsoft), but intends to award to all Cloud Service Providers (CSPs) that demonstrate the capability to meet DoD’s requirements. Each IDIQ contract, under which task orders will be placed, is intended to be for a period of performance of one 36-month base period with two 12-month option periods. The Department is still evaluating the contract ceiling for this procurement, but anticipates that a multi-billion dollar ceiling will be required. The contract ordering ceiling will be included in any directed solicitations issued to vendors.

We’ve contacted both Amazon and Microsoft for comment on the new bidding process.

As reported by CNBC, Google Cloud plans to bid for a piece of the work despite employee pushback over its work on government and military contracts.

Read the full statement of required capabilities below.