Cloud computing - Benefits

Fault Tolerance

  1. We man the pagers for you
  2. Automated failure recovery

Low maintenance

  1. We manage updates on every level for you (bare metal -> software patches)
  2. Focus on what you do best

Durability

  1. Built-in replication
  2. Distributed geographically

Accessibility

  1. Always on, always available (as long as you have an internet connection)
  2. Local development environments

High availability

The ability of a system to be accessible and usable by users when they need it.

A conscious effort to avoid the obvious sources of downtime

  1. Also called “uptime”
  2. The ability of a system to remain operational to users during planned or unplanned outages.
  3. No major service provider has 100% availability
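
Availability is usually quoted as a percentage, often in an SLA. Each extra "nine" cuts the downtime budget tenfold. The sketch below converts a percentage into the maximum downtime it allows per year. It assumes a 365-day year and is illustrative only. It is not tied to any provider's SLA terms.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, that an availability
    percentage permits over the given period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} minutes per year")
```

At 99.9% that is about 525.6 minutes, roughly 8.8 hours a year. At 99.99% it drops to under an hour.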

Planned outages

  1. Operating system security patches
  2. Application updates
  3. Hardware replacement
  4. Migrating to a new hosting provider

Mitigating them:

  1. Gradual deployment strategy
  2. Testing and monitoring of deployment
  3. Easy rollback plan
  4. Small deployments
  5. Frequent deployments
  6. Automation
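
Several of these mitigations fit together as a staged, or "canary", rollout. That covers gradual deployment, monitoring, easy rollback, and small steps. Below is a minimal sketch under stated assumptions. The stage fractions, the 1% error-rate threshold, and the `error_rate_at` monitoring hook are all hypothetical stand-ins for whatever your deployment tooling actually provides.

```python
from typing import Callable, Sequence, Tuple


def staged_rollout(stages: Sequence[float],
                   error_rate_at: Callable[[float], float],
                   max_error_rate: float = 0.01) -> Tuple[str, float]:
    """Shift traffic to a new version in small steps, checking health after each.

    stages: fractions of traffic (0 to 1] routed to the new version, in order.
    error_rate_at: hypothetical monitoring hook returning the observed error
        rate once the given fraction of traffic is on the new version.
    Returns ("completed", final_fraction) or ("rolled_back", failing_fraction).
    """
    for fraction in stages:
        observed = error_rate_at(fraction)
        if observed > max_error_rate:
            # Easy rollback: route all traffic back to the previous version.
            return ("rolled_back", fraction)
    return ("completed", stages[-1])


def healthy(fraction: float) -> float:
    return 0.001


def buggy(fraction: float) -> float:
    # A bug that only shows up once enough real traffic reaches it.
    return 0.05 if fraction >= 0.10 else 0.002


if __name__ == "__main__":
    stages = [0.01, 0.10, 0.50, 1.0]
    print(staged_rollout(stages, healthy))  # ('completed', 1.0)
    print(staged_rollout(stages, buggy))    # ('rolled_back', 0.1)
```

In the buggy case only 10% of users ever see the faulty version before it is rolled back.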

Unplanned outages

  1. Hardware failure
  2. Network disruptions
  3. Power outages
  4. Natural disasters
  5. Cyber attacks
  6. Software bugs
  7. Poor scaling/architectural design

Mitigating them:

  1. Every single core component has redundancy
  2. Availability
    1. Availability sets
    2. Availability zones
    3. Cross region load balancing
  3. Constant health monitoring
  4. Automation
  5. Strong security practices
  6. Be geographically distributed
  7. Have a disaster recovery plan
  8. Test that disaster recovery plan / fire drills
  9. Load testing
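
Redundancy improves availability because a redundant set fails only when every member fails. A common back-of-the-envelope model, sketched below, multiplies availabilities for components that must all work (in series). It combines failure probabilities for redundant copies (in parallel). The model assumes failures are independent, which real systems only approximate. A shared power feed or a region outage breaks that assumption, which is one reason to spread across zones and regions. The availability figures used are hypothetical.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Availability of components that must ALL be up (e.g. web tier -> database)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Availability of redundant copies where ANY one being up is enough.

    Assumes the copies fail independently of each other.
    """
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    vm = 0.99   # hypothetical availability of one virtual machine
    db = 0.999  # hypothetical availability of the database
    print(f"one VM + db:  {serial(vm, db):.5f}")
    print(f"two VMs + db: {serial(parallel(vm, vm), db):.5f}")
```

In this model, going from one VM to two redundant VMs raises the web tier from 99% to 99.99%. The database then becomes the weakest link.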

Scalability

The ability of a system to accommodate increasing (varying) demand by adding or removing resources as needed.

Allows a system to adapt to changing usage patterns and handle increased traffic without requiring changes to the application code or system design.

A scalable system can be sized to match current demand, which optimizes cost by reducing wasted (idle) compute resources.

Vertical scaling

  1. Also called “scaling up” or “scaling down”
  2. Adding more resources to a single server
  3. For example, increasing the amount of memory or the number of CPUs
  4. There is an upper limit to this - it is the largest server available
  5. Does not improve availability in a single-server system, because that server is still a single point of failure
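
A size catalog makes the upper limit on vertical scaling concrete. Scaling up means moving to the next size that fits the demand. Once demand exceeds the largest size, there is nowhere left to go. The size names and specs below are hypothetical, not real VM SKUs.

```python
from typing import List, Tuple

# Hypothetical size catalog, smallest first: (name, vCPUs, memory in GiB).
SIZES: List[Tuple[str, int, int]] = [
    ("small", 2, 8),
    ("medium", 4, 16),
    ("large", 8, 32),
    ("xlarge", 16, 64),
]


def smallest_size_for(vcpus_needed: int, memory_gib_needed: int) -> str:
    """Return the smallest catalog size that meets both requirements."""
    for name, vcpus, memory_gib in SIZES:
        if vcpus >= vcpus_needed and memory_gib >= memory_gib_needed:
            return name
    # The hard ceiling of vertical scaling: no bigger server exists.
    raise ValueError("demand exceeds the largest available size; consider scaling out")


if __name__ == "__main__":
    print(smallest_size_for(3, 8))    # medium
    print(smallest_size_for(16, 48))  # xlarge
```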

Horizontal scaling

  1. Also called “scaling out” or “scaling in”
  2. Add more servers to the system
  3. No hard upper limit on scaling (in practice, bounded by quotas and cost)
  4. Adds complexity, such as the need for load balancing
  5. Can improve availability
  6. Allows the system to grow and shrink on demand
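
Horizontal scaling needs something to spread requests across the servers. The pool must also tolerate servers joining and leaving. A round-robin load balancer is the simplest illustration. This is a toy sketch, and the class and server names are hypothetical. Real load balancers also use health probes, connection counts, and other signals.

```python
from typing import List


class RoundRobinBalancer:
    """Hands out servers in turn; servers can be added or removed at any time."""

    def __init__(self, servers: List[str]) -> None:
        self._servers = list(servers)
        self._next = 0

    def add(self, server: str) -> None:
        """Scale out: a new server joins the rotation."""
        self._servers.append(server)

    def remove(self, server: str) -> None:
        """Scale in: a server leaves the rotation."""
        self._servers.remove(server)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        if not self._servers:
            raise RuntimeError("no servers available")
        index = self._next % len(self._servers)  # stays valid after add/remove
        self._next = index + 1
        return self._servers[index]


if __name__ == "__main__":
    lb = RoundRobinBalancer(["vm-1", "vm-2"])
    print([lb.pick() for _ in range(4)])  # ['vm-1', 'vm-2', 'vm-1', 'vm-2']
    lb.add("vm-3")
    print([lb.pick() for _ in range(3)])  # ['vm-3', 'vm-1', 'vm-2']
```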

Elasticity

The ability of a system to quickly and easily scale the amount of resources it uses up or down in response to changing demand.

  1. Has to involve some sort of automation
  2. Often called “autoscaling” in cloud computing
  3. The system monitors some metric (e.g. CPU utilization) to determine how busy a system is
  4. Adds resources when the metric rises above an upper threshold
  5. Removes resources when the metric falls below a lower threshold
  6. More efficient and cost-effective use of resources
  7. Minimizes computing “waste” - resources paid for but not used
  8. Self-hosted systems tend to have a large percentage of “over-provisioned” resources for anticipated future growth
  9. Can reach a peak capacity higher than you could afford with static provisioning of resources
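
One evaluation of the metric-and-threshold loop described above can be sketched as follows. This is a simplified model, not any provider's autoscale engine. The 70%/30% thresholds, the instance bounds, and the one-instance step size are assumptions. Real autoscalers also average metrics over a time window and wait through a cool-down period between actions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AutoscaleRule:
    min_instances: int = 1
    max_instances: int = 10
    scale_out_above: float = 70.0  # average CPU % that triggers adding an instance
    scale_in_below: float = 30.0   # average CPU % that triggers removing an instance


def desired_instances(current: int, avg_cpu_pct: float, rule: AutoscaleRule) -> int:
    """Apply one evaluation of a threshold-based autoscale rule."""
    if avg_cpu_pct > rule.scale_out_above:
        return min(current + 1, rule.max_instances)  # scale out, capped at max
    if avg_cpu_pct < rule.scale_in_below:
        return max(current - 1, rule.min_instances)  # scale in, floored at min
    return current  # inside the band: leave the pool alone


def simulate(cpu_readings: Iterable[float], rule: AutoscaleRule, start: int = 1) -> List[int]:
    """Return the instance count after each metric reading."""
    counts: List[int] = []
    current = start
    for reading in cpu_readings:
        current = desired_instances(current, reading, rule)
        counts.append(current)
    return counts


if __name__ == "__main__":
    rule = AutoscaleRule(min_instances=1, max_instances=4)
    print(simulate([50, 85, 90, 95, 99, 60, 20, 10, 10], rule))
    # [1, 2, 3, 4, 4, 4, 3, 2, 1]
```

The gap between the scale-out and scale-in thresholds keeps the system from flapping. Without it, every small fluctuation could add an instance and then immediately remove it.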

Reliability

  1. How dependable a system is
  2. The ability of a system to perform its intended function without interruption and with a high degree of accuracy
  3. You have to trust that your cloud provider is doing everything it can to make its platform reliable
  4. This includes transparency during service issues
  5. How is it implemented?
    1. Auto-scaling
    2. Multiple regions
    3. Data backup and replication
    4. Health checks and self-healing
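
Health checks usually tolerate a single failed probe and act only after several consecutive failures. That way, one slow response does not trigger a replacement. Below is a minimal sketch of the idea. The threshold of 3 and the `HealthTracker` name are assumptions for illustration. What "replace" means (restart, reimage, redeploy) depends on the platform.

```python
from typing import Dict


class HealthTracker:
    """Flags an instance for self-healing after N consecutive failed probes."""

    def __init__(self, unhealthy_threshold: int = 3) -> None:
        self.unhealthy_threshold = unhealthy_threshold
        self._consecutive_failures: Dict[str, int] = {}

    def record(self, instance: str, probe_ok: bool) -> bool:
        """Record one probe result; return True if the instance should be replaced."""
        if probe_ok:
            self._consecutive_failures[instance] = 0  # any success resets the count
            return False
        failures = self._consecutive_failures.get(instance, 0) + 1
        self._consecutive_failures[instance] = failures
        return failures >= self.unhealthy_threshold


if __name__ == "__main__":
    tracker = HealthTracker(unhealthy_threshold=3)
    probes = [False, False, True, False, False, False]
    print([tracker.record("vm-1", ok) for ok in probes])
    # [False, False, False, False, False, True]
```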

Availability vs Reliability

  1. A system can be highly available to users, in that it responds instantly to every request, yet be highly unreliable behind the curtain. For example, a calculator that always responds but sometimes gives wrong answers, or an app that randomly loses your data.
  2. Availability is what end users observe
  3. Reliability is the underlying truth
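
The distinction can be made measurable. Availability counts requests that got any response, while reliability counts requests that got a correct response. Here is a toy sketch using the calculator example. The request log is made up for illustration.

```python
from typing import List, Tuple


def availability_and_reliability(log: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    """log holds one (responded, correct) pair per request."""
    if not log:
        raise ValueError("empty log")
    total = len(log)
    responded = sum(1 for ok, _ in log if ok)
    correct = sum(1 for ok, right in log if ok and right)
    return responded / total, correct / total


if __name__ == "__main__":
    # A calculator that answers every request but gets one of four wrong.
    calculator_log = [(True, True), (True, True), (True, False), (True, True)]
    print(availability_and_reliability(calculator_log))  # (1.0, 0.75)
```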

Predictability

  1. The ability to forecast and control the performance and behavior of a system
  2. Includes the ability to predict future costs
  3. Why?
    1. Gives us the confidence that the system will continue to perform at the expected level in the future
    2. We will not get a crazy bill unexpectedly
  4. How?
    1. Auto scaling
    2. Load balancing
    3. Different instance types, sizes, pricing tiers
    4. Cost management tools
    5. APIs for billing
    6. Pricing calculators
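
Predictable costs mostly come down to simple arithmetic, which is what a pricing calculator automates. The sketch below estimates a month's compute bill for an always-on baseline plus autoscaled burst capacity. It then checks the estimate against a budget. The hourly rate and budget are placeholders, not real prices. The 730-hour month (8,760 hours a year divided by 12) is a common approximation.

```python
HOURS_PER_MONTH = 730  # 8,760 hours in a year / 12


def monthly_compute_cost(hourly_rate: float,
                         baseline_instances: int,
                         burst_instance_hours: float = 0.0) -> float:
    """Estimate one month of compute cost.

    baseline_instances run all month; burst_instance_hours is the total
    extra instance-hours added by autoscaling (e.g. 3 instances x 100 hours = 300).
    """
    return hourly_rate * (baseline_instances * HOURS_PER_MONTH + burst_instance_hours)


def over_budget(estimate: float, budget: float) -> bool:
    """True if the estimate would exceed the budget (a cue to raise an alert)."""
    return estimate > budget


if __name__ == "__main__":
    estimate = monthly_compute_cost(hourly_rate=0.10, baseline_instances=2,
                                    burst_instance_hours=300)
    print(f"estimated: ${estimate:.2f}, over budget: {over_budget(estimate, 150.0)}")
```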

Security

Security is a full-time job

  1. Cloud providers are obviously massive targets for hackers, and so they rightly spend a lot of time, money and effort on platform security
  2. Cloud providers go through security audits and compliance certifications
  3. They provide customers the tools they need to enable and monitor security with their own applications/data
  4. Why?
    1. Fundamental challenge in IT
    2. We want confidence that our cloud provider cannot easily be defeated by hackers and those with malicious intent
  5. How?
    1. Industry standard compliance certifications
    2. Always-on DDoS protection
    3. Microsoft Security Response Center (MSRC)
    4. Azure Policy and Azure Blueprints
    5. Role-based access control (RBAC)
    6. Azure Active Directory
    7. Always up-to-date platform services
    8. Update management
    9. Encryption by default
    10. Dozens of security services like firewall
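
Role-based access control works in three parts. A principal is assigned a role at a scope. The role grants a set of actions. The assignment also covers everything beneath that scope. The sketch below models just that idea. The roles, action names, and principals are simplified and hypothetical. The scope paths only loosely mimic Azure resource IDs, and none of this reflects Azure's actual role definitions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical roles, each granting a set of actions.
ROLES: Dict[str, Set[str]] = {
    "reader": {"read"},
    "contributor": {"read", "write", "delete"},
}

# Hypothetical (principal, role, scope) assignments.
ASSIGNMENTS: List[Tuple[str, str, str]] = [
    ("alice", "contributor", "/subscriptions/sub1/resourceGroups/web"),
    ("bob", "reader", "/subscriptions/sub1"),
]


def in_scope(resource: str, scope: str) -> bool:
    """A scope covers itself and everything nested beneath it."""
    return resource == scope or resource.startswith(scope.rstrip("/") + "/")


def is_allowed(principal: str, action: str, resource: str) -> bool:
    """Grant access only if some assignment matches principal, scope, and action."""
    for who, role, scope in ASSIGNMENTS:
        if who == principal and in_scope(resource, scope) and action in ROLES[role]:
            return True
    return False  # no matching assignment: deny by default


if __name__ == "__main__":
    print(is_allowed("alice", "write", "/subscriptions/sub1/resourceGroups/web/vm1"))  # True
    print(is_allowed("alice", "write", "/subscriptions/sub1/resourceGroups/db"))       # False
    print(is_allowed("bob", "read", "/subscriptions/sub1/resourceGroups/db"))          # True
```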

Governance

  1. How your organization chooses to do business
  2. Can be executive governance, IT governance, or business governance
  3. The process of defining, implementing, and monitoring a framework of policies that guides an organization’s cloud operations
  4. Why?
    1. The company wants to ensure its policies are followed in the cloud
    2. Includes basic auditing and reporting, as well as enforcement
    3. The company wants to be compliant with industry standards such as HIPAA, PCI DSS, or GDPR
  5. How?
    1. Azure Policy and Azure Blueprints
    2. Management groups
    3. Custom roles
    4. Soft delete
    5. Guides and best practices such as Cloud Adoption Framework
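
Policy enforcement means evaluating every resource against a set of rules, then denying or flagging the ones that break them. The sketch below is a simplified illustration of that. The allowed regions and the required tag are hypothetical examples of organizational rules. This is not Azure Policy's own definition language.

```python
from typing import Dict, List

# Hypothetical organizational rules.
ALLOWED_LOCATIONS = {"eastus", "westeurope"}
REQUIRED_TAGS = {"cost-center"}


def policy_violations(resource: Dict) -> List[str]:
    """Return a list of human-readable violations (empty means compliant)."""
    violations = []
    location = resource.get("location")
    if location not in ALLOWED_LOCATIONS:
        violations.append(f"location '{location}' is not allowed")
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    for tag in sorted(missing):
        violations.append(f"missing required tag '{tag}'")
    return violations


def deploy(resource: Dict) -> str:
    """Enforcement: deny the deployment if any rule is violated."""
    problems = policy_violations(resource)
    return "denied: " + "; ".join(problems) if problems else "allowed"


if __name__ == "__main__":
    print(deploy({"name": "vm1", "location": "eastus", "tags": {"cost-center": "42"}}))
    print(deploy({"name": "vm2", "location": "brazilsouth", "tags": {}}))
```

Running `policy_violations` over existing resources without blocking anything corresponds to the auditing and reporting side. `deploy` corresponds to enforcement.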

Manageability

  1. Management of the cloud
    1. Templates
    2. Automation
    3. Scaling
    4. Monitoring and alerts
    5. Self-healing
  2. Management in the cloud
    1. Web portal
    2. Command line interface and scripts
    3. APIs
    4. PowerShell
  3. Why?
    1. How easy it is to work with your applications in the cloud impacts cost, performance, security and other priorities
    2. Different cloud vendors are going to be easier or harder to work with
  4. How?
    1. Azure Portal, CLI, PowerShell, Cloud Shell, REST APIs, and other programmatic methods
    2. Consolidated monitoring and alerting system
    3. Ability to use ARM templates, Bicep, Terraform, etc.
    4. Autoscaling of most types of compute resources
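
Templates (ARM, Bicep, Terraform) are declarative: you describe the resources you want, and the tool works out what to change. The core of that idea is a diff between desired and actual state, sketched below. This is a simplified illustration with made-up resource names and settings. Real tools differ in the details, for example in whether resources missing from the template get deleted.

```python
from typing import Dict, List

State = Dict[str, Dict[str, str]]  # resource name -> its settings


def plan(desired: State, actual: State) -> Dict[str, List[str]]:
    """Work out which resources to create, update, or delete."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(name for name in desired.keys() & actual.keys()
                         if desired[name] != actual[name]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


if __name__ == "__main__":
    desired = {"web-vm": {"size": "large"}, "db": {"tier": "standard"}}
    actual = {"web-vm": {"size": "small"}, "old-cache": {"tier": "basic"}}
    print(plan(desired, actual))
```

In this sketch, once the plan has been applied, the same template produces an empty plan the second time. That repeatability is much of why template-based deployments improve manageability.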
