AWS Outage Dec 7 2021 – Why This Was Really Bad, and Good for Some

As I am writing this post, AWS on the east coast is effectively down. What makes this outage so bad is that the supposedly non-region-specific services are also not functioning. IAM does not have full functionality, so I can’t manage security, and it’s not an easy service to work with via the AWS CLI. I cannot log in to my production account with my IAM user, and I will not log in as root, especially during an outage.

Why this was really bad. 

Everything that is done via the AWS console is an API call, so when you see an API error on every page of the console in us-east-1, you know we have a significant issue. It is scary to see an “unknown region” notice when you are trying to access us-east-1. I have AWS friends who only use the console, so they were totally dead in the water. I tried making AWS CLI calls to get around the console errors, but for the first 3-4 hours those were failing too, so I was dead in the water as well.

Once I could make API calls I could do some monitoring and checking, but if you have worked with the CLI you know that you can get too much data back, or it’s structured in nested lists, so the syntax to query it can be tricky and cumbersome. Something that takes a minute in the console can take 30 minutes of work via the CLI to get the data and then execute what you want. In my case I have used the CLI before, but never for Route 53, which I needed in order to find out why one web server health check was not reporting an error. Turns out it was a configuration error on my part.
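To illustrate the nesting problem, here is a minimal Python sketch that flattens the kind of JSON the CLI hands back. The sample data is made up, and the function names are hypothetical. The shapes are what I would expect from `aws ec2 describe-instances` (Reservations containing Instances) and `aws route53 get-health-check-status` (a list of HealthCheckObservations); treating a status that starts with "Success" as healthy is an assumption on my part.

```python
import json

# Made-up sample, shaped like `aws ec2 describe-instances` output.
SAMPLE_INSTANCES = json.loads("""
{
  "Reservations": [
    {"Instances": [
      {"InstanceId": "i-0aaa", "State": {"Name": "running"}},
      {"InstanceId": "i-0bbb", "State": {"Name": "stopped"}}
    ]},
    {"Instances": [
      {"InstanceId": "i-0ccc", "State": {"Name": "running"}}
    ]}
  ]
}
""")

# Made-up sample, shaped like `aws route53 get-health-check-status` output.
SAMPLE_HEALTH = json.loads("""
{
  "HealthCheckObservations": [
    {"Region": "us-east-1", "IPAddress": "192.0.2.10",
     "StatusReport": {"Status": "Success: HTTP Status Code 200, OK"}},
    {"Region": "us-west-2", "IPAddress": "192.0.2.20",
     "StatusReport": {"Status": "Failure: Connection timed out"}}
  ]
}
""")


def instance_states(response):
    """Flatten Reservations -> Instances into {instance_id: state_name}."""
    return {
        inst["InstanceId"]: inst["State"]["Name"]
        for reservation in response.get("Reservations", [])
        for inst in reservation.get("Instances", [])
    }


def failing_checkers(response):
    """Return (region, status) for each observation not reporting success.

    Assumption: a healthy status string starts with "Success".
    """
    return [
        (obs["Region"], obs["StatusReport"]["Status"])
        for obs in response.get("HealthCheckObservations", [])
        if not obs["StatusReport"]["Status"].startswith("Success")
    ]


if __name__ == "__main__":
    print(instance_states(SAMPLE_INSTANCES))
    print(failing_checkers(SAMPLE_HEALTH))
```

The CLI can do the same flattening itself with a JMESPath expression, for example `aws ec2 describe-instances --query "Reservations[].Instances[].[InstanceId,State.Name]"`, but that is exactly the syntax that gets tricky under pressure, so having it written down ahead of time helps.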

 

Why this was good.

 

  • AWS needs to apply the Well-Architected Framework to us-east-1
  • AWS Global Services cannot be dependent on us-east-1
  • People and organizations found holes in their configuration and DR plans
  • People now know how many internet companies rely on AWS and its ecosystem
  • I found out that one of my health checks was misconfigured
  • We discovered that AWS us-east-1 is a single point of failure
  • I spent yesterday architecting and pricing cold and hot DR to another region
  • I am working on a library of AWS CLI scripts that I would use in the event the console is down but the AWS CLI is functioning. These include:
    • EC2 monitoring and management (starting and stopping instances)
    • Copying AMIs across regions
    • Creating load balancers on demand: you incur charges for each load balancer, so I am not paying for a hot standby (especially when the client is OK with that)
    • Query health check status 
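
As a rough sketch of what such a script library could look like, the Python below only builds AWS CLI argument lists and prints them by default, so nothing touches an account until `dry_run` is switched off. The function names, IDs and the `dry_run` flag are hypothetical; the subcommands and options are standard AWS CLI ones, but the sketch assumes the CLI is installed and credentials are already configured.

```python
import shlex
import subprocess


def start_instances(instance_ids, region):
    """Command to start EC2 instances in the given region."""
    return ["aws", "ec2", "start-instances", "--region", region,
            "--instance-ids", *instance_ids]


def stop_instances(instance_ids, region):
    """Command to stop EC2 instances in the given region."""
    return ["aws", "ec2", "stop-instances", "--region", region,
            "--instance-ids", *instance_ids]


def copy_ami(source_image_id, source_region, dest_region, name):
    """Command to copy an AMI; --region is the destination region."""
    return ["aws", "ec2", "copy-image", "--region", dest_region,
            "--source-region", source_region,
            "--source-image-id", source_image_id,
            "--name", name]


def create_load_balancer(name, subnet_ids, region):
    """Command to create a load balancer only when it is needed."""
    return ["aws", "elbv2", "create-load-balancer", "--region", region,
            "--name", name, "--subnets", *subnet_ids]


def health_check_status(health_check_id):
    """Command to query a Route 53 health check (a global service)."""
    return ["aws", "route53", "get-health-check-status",
            "--health-check-id", health_check_id]


def run(cmd, dry_run=True):
    """Print the command, or execute it and return stdout if dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
        return None
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder IDs, for illustration only.
    run(start_instances(["i-0aaa", "i-0bbb"], "us-west-2"))
    run(copy_ami("ami-0123", "us-east-1", "us-west-2", "dr-copy"))
    run(health_check_status("example-health-check-id"))
```

Keeping the commands as plain lists makes them easy to review and test before an outage, which is the point of having the library ready in advance.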

 

 

 

 
