Cloud computing - Benefits
Fault Tolerance
- The provider handles on-call (pager) duty for you
- Automated failure recovery
Low maintenance
- The provider manages updates at every level for you (bare metal -> software patches)
- Focus on what you do best
Durability
- Built-in replication
- Distributed geographically
Accessibility
- Always on, always available (as long as you have an internet connection)
- Local development environments
High availability
The ability of a system to be accessible and usable by users when they need it.
A conscious effort to avoid the obvious sources of downtime
- Also called “uptime”.
- The ability of a system to remain operational to users during planned or unplanned outages.
- No major service provider has 100% availability
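Since no provider reaches 100%, availability targets are usually quoted as a percentage. The arithmetic below converts such a percentage into allowed downtime per year. It is a minimal sketch: the function name `downtime_per_year` and the sample targets are illustrative assumptions, not figures from any provider's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Sample targets chosen for illustration only.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {downtime_per_year(target):.2f} hours/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why small differences in the percentage matter.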
Planned outages
- Operating system security patches
- Application updates
- Hardware replacement
- Migrating to a new hosting provider
Mitigating them:
- Gradual deployment strategy
- Testing and monitoring of deployment
- Easy rollback plan
- Small deployments
- Frequent deployments
- Automation
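The mitigations above (gradual deployment, monitoring, easy rollback, automation) can be sketched in a few lines. This is an illustration under stated assumptions, not a real deployment tool: `gradual_deploy`, the `running` dictionary, and the `is_healthy` callback are all hypothetical names standing in for whatever your platform provides.

```python
from typing import Callable, Dict, List


def gradual_deploy(
    servers: List[str],
    running: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str, str], bool],
    batch_size: int = 1,
) -> bool:
    """Roll new_version out in small batches; roll everything back on failure.

    `running` maps server name -> deployed version and is updated in place.
    Returns True if the rollout completed, False if it was rolled back.
    """
    previous = dict(running)  # snapshot that serves as the rollback plan
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            running[server] = new_version
        # Monitor the batch before touching any more servers.
        if not all(is_healthy(server, new_version) for server in batch):
            running.clear()
            running.update(previous)
            return False
    return True


if __name__ == "__main__":
    fleet = {"web-1": "v1", "web-2": "v1", "web-3": "v1"}
    # Hypothetical health check: pretend the new version fails on web-3.
    ok = gradual_deploy(list(fleet), fleet, "v2", lambda s, v: s != "web-3")
    print(ok, fleet)
```

Keeping batches small limits how many users see a bad release before the health check triggers the rollback.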
Unplanned outages
- Hardware failure
- Network disruptions
- Power outages
- Natural disasters
- Cyber attacks
- Software bugs
- Poor scaling/architectural design
Mitigating them:
- Every single core component has redundancy
- Availability
- Availability sets
- Availability zones
- Cross region load balancing
- Constant health monitoring
- Automation
- Strong security practices
- Be geographically distributed
- Have a disaster recovery plan
- Test that disaster recovery plan / fire drills
- Load testing
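To see why redundancy helps, the sketch below computes the combined availability of redundant copies of a component, and of a chain of components that must all be up. It assumes failures are independent, which real systems only approximate; the function names are illustrative.

```python
def parallel_availability(availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one suffices.

    Assumes failures are independent: the system is down only when
    every copy is down at the same time.
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1 - (1 - availability) ** copies


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain where every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


if __name__ == "__main__":
    print(parallel_availability(0.99, 2))   # two redundant copies
    print(serial_availability(0.99, 0.99))  # two components in a chain
```

Under these assumptions, adding a redundant copy raises availability, while adding a dependency with no redundancy lowers it.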
Scalability
The ability of a system to accommodate increasing (varying) demand by adding or removing resources as needed.
Allows a system to adapt to changing usage patterns and handle increased traffic without requiring changes to the application code or system design.
A scalable system can be sized to match actual demand. This optimizes cost by reducing wasted compute resources.
Vertical scaling
- Also called “scaling up” or “scaling down”
- Adding more resources to a single server
- Increase the amount of memory, the number of CPUs
- There is an upper limit to this - it is the largest server available
- Does not improve availability, since a single server remains a single point of failure
Horizontal scaling
- Also called “scaling out” or “scaling in”
- Add more servers to the system
- No practical upper limit, since you can keep adding servers
- Additional complexities for load balancing
- Can improve availability
- Allows the system to grow and shrink on demand
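Horizontal scaling needs something to spread requests across the servers. The sketch below shows round-robin, one of the simplest load-balancing strategies, with servers added and removed at runtime. The class and server names are hypothetical; production load balancers also account for health and current load.

```python
from typing import List


class RoundRobinBalancer:
    """Hand out servers in rotation; servers can join or leave at any time."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cursor = 0

    def add_server(self, name: str) -> None:
        """Scale out: add one more server to the rotation."""
        self._servers.append(name)

    def remove_server(self, name: str) -> None:
        """Scale in: take a server out of the rotation."""
        if len(self._servers) == 1:
            raise ValueError("cannot remove the last server")
        self._servers.remove(name)

    def next_server(self) -> str:
        """Return the server that should handle the next request."""
        server = self._servers[self._cursor % len(self._servers)]
        self._cursor += 1
        return server


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["a", "b"])
    print([balancer.next_server() for _ in range(4)])
    balancer.add_server("c")
    print([balancer.next_server() for _ in range(3)])
```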
Elasticity
The ability of a system to quickly and easily scale the amount of resources it uses up or down in response to changing demand.
- Has to involve some sort of automation
- Often called “autoscaling” in cloud computing
- The system monitors some metric (e.g. CPU utilization) to determine how busy a system is
- Adds resources when the metric rises above an upper threshold
- Removes resources when the metric falls below a lower threshold
- More efficient and cost-effective use of resources
- Minimizing computing “waste” - resources paid for and not used
- Self-hosted systems tend to have a large percentage of “over-provisioned” resources for anticipated future growth
- Peak capacity can be higher than you could afford if you provisioned resources statically
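The autoscaling loop described above boils down to a threshold rule. The following is a minimal sketch, not any provider's actual algorithm: the function name, the 70%/30% CPU thresholds, and the instance limits are assumptions chosen for illustration.

```python
def desired_instance_count(
    current: int,
    cpu_percent: float,
    scale_out_above: float = 70.0,  # assumed upper threshold
    scale_in_below: float = 30.0,   # assumed lower threshold
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Return how many instances should run given the observed CPU load."""
    if cpu_percent > scale_out_above:
        target = current + 1   # busy: add a resource
    elif cpu_percent < scale_in_below:
        target = current - 1   # idle: remove a resource
    else:
        target = current       # within the comfortable band: no change
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    print(desired_instance_count(current=2, cpu_percent=85.0))
    print(desired_instance_count(current=2, cpu_percent=10.0))
```

The gap between the two thresholds keeps the system from adding and removing instances repeatedly when load hovers near a single limit.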
Reliability
- How dependable a system is
- The ability of a system to perform its intended function without interruption and with a high degree of accuracy
- You have to trust that your cloud provider is doing everything it can to make its platform reliable
- This includes transparency during service issues
- How is it implemented?
- Auto-scaling
- Multiple regions
- Data backup and replication
- Health checks and self healing
Availability vs Reliability
- A system can be highly available to users, in that it responds instantly to every request, and still be highly unreliable behind the curtain. For example, a calculator that always responds but gives wrong answers, or an app that randomly loses your data.
- Availability is an appearance to the end users
- Reliability is the underlying truth
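The calculator example can be made concrete. In this sketch, availability is simplified to the fraction of requests that get any response, and reliability to the fraction that get the correct one. Both definitions, and the names `measure` and `sloppy_add`, are assumptions for illustration only.

```python
from typing import Callable, List, Optional, Tuple


def measure(
    service: Callable[[int, int], Optional[int]],
    cases: List[Tuple[int, int]],
) -> Tuple[float, float]:
    """Return (availability, reliability) for an addition service.

    availability: fraction of requests that got any response (not None).
    reliability: fraction of requests that got the correct sum.
    """
    answered = 0
    correct = 0
    for a, b in cases:
        result = service(a, b)
        if result is not None:
            answered += 1
            if result == a + b:
                correct += 1
    return answered / len(cases), correct / len(cases)


def sloppy_add(a: int, b: int) -> int:
    """Always responds, but is off by one whenever the first operand is odd."""
    return a + b + (a % 2)


if __name__ == "__main__":
    availability, reliability = measure(sloppy_add, [(1, 1), (2, 2), (3, 3), (4, 4)])
    print(f"availability={availability:.0%} reliability={reliability:.0%}")
```

The service answers every request, so it looks fully available, yet half of its answers are wrong.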
Predictability
- The ability to forecast and control the performance and behavior of a system
- Includes the ability to predict future costs
- Why?
- Gives us the confidence that the system will continue to perform at the expected level in the future
- We will not get a crazy bill unexpectedly
- How?
- Auto scaling
- Load balancing
- Different instance types, sizes, pricing tiers
- Cost management tools
- APIs for billing
- Pricing calculators
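Predicting cost is mostly arithmetic over instance-hours. The sketch below compares a statically provisioned day with an autoscaled one. The price of 0.10 per instance-hour and the usage pattern are made-up numbers, and `forecast_cost` is a hypothetical helper, not a real billing API.

```python
from typing import Iterable


def forecast_cost(hourly_instance_counts: Iterable[int], hourly_rate: float) -> float:
    """Cost of running the given number of instances in each hour."""
    return sum(hourly_instance_counts) * hourly_rate


if __name__ == "__main__":
    RATE = 0.10  # hypothetical price per instance-hour

    static_day = [6] * 24               # sized for peak all day
    elastic_day = [2] * 16 + [6] * 8    # scales out only for 8 busy hours

    print(f"static:  {forecast_cost(static_day, RATE):.2f}")
    print(f"elastic: {forecast_cost(elastic_day, RATE):.2f}")
```

A provider's pricing calculator or billing API performs the same kind of calculation with real rates.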
Security
Security is a full-time job
- Cloud providers are obviously massive targets for hackers, and so they rightly spend a lot of time, money and effort on platform security
- Cloud providers go through security audits and compliance certifications
- They provide customers the tools they need to enable and monitor security with their own applications/data
- Why?
- Fundamental challenge in IT
- We want confidence that our cloud provider cannot easily be defeated by hackers and those with malicious intent
- How?
- Industry standard compliance certifications
- Always-on DDoS protection
- Microsoft Security Response Center (MSRC)
- Azure Policy and Azure Blueprints
- Role based access control (RBAC)
- Azure Active Directory (now Microsoft Entra ID)
- Always up-to-date platform services
- Update management
- Encryption by default
- Dozens of security services like firewall
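Of the items above, role-based access control is easy to illustrate: users are assigned roles, and roles grant actions. The sketch below is a deliberately simplified model, not Azure's RBAC implementation; the role names, action names, and `is_allowed` helper are assumptions.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each one grants.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "reader": {"read"},
    "contributor": {"read", "write", "delete"},
}


def is_allowed(assignments: Dict[str, Set[str]], user: str, action: str) -> bool:
    """Return True if any role assigned to `user` grants `action`."""
    roles = assignments.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    assignments = {"alice": {"contributor"}, "bob": {"reader"}}
    print(is_allowed(assignments, "alice", "write"))
    print(is_allowed(assignments, "bob", "write"))
```

Permissions attach to roles rather than to individuals, so access is changed by changing a role assignment.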
Governance
- How your organization chooses to do business
- Could be executive governance, IT governance, business governance
- The process of defining, implementing, and monitoring a framework of policies that guides an organization’s cloud operations
- Why?
- The company wants to ensure its policies are followed in the cloud
- Includes basic auditing and reporting, as well as enforcement
- The company wants to be compliant with industry standards such as HIPAA, PCI DSS, or GDPR
- How?
- Azure Policy and Azure Blueprints
- Management groups
- Custom roles
- Soft delete
- Guides and best practices such as Cloud Adoption Framework
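Policy-based governance amounts to checking every resource against a set of rules and reporting or blocking violations. The sketch below audits resources for an allowed region and required tags. It is an illustration only: the resource dictionaries, rule choices, and `audit` function are hypothetical and do not reflect the Azure Policy definition format.

```python
from typing import Dict, List, Set, Tuple


def audit(
    resources: List[Dict],
    allowed_regions: Set[str],
    required_tags: Set[str],
) -> List[Tuple[str, str]]:
    """Return (resource name, reason) for every policy violation found."""
    violations = []
    for resource in resources:
        if resource["region"] not in allowed_regions:
            violations.append((resource["name"], f"region {resource['region']} not allowed"))
        missing = set(required_tags) - set(resource.get("tags", {}))
        for tag in sorted(missing):
            violations.append((resource["name"], f"missing tag {tag}"))
    return violations


if __name__ == "__main__":
    resources = [
        {"name": "vm-1", "region": "eastus", "tags": {"owner": "ops"}},
        {"name": "vm-2", "region": "westeurope", "tags": {}},
    ]
    for name, reason in audit(resources, {"eastus"}, {"owner"}):
        print(name, reason)
```

The same list of violations can feed an audit report or be used to reject a deployment outright.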
Manageability
- Management of the cloud
- Templates
- Automation
- Scaling
- Monitoring and alerts
- Self-healing
- Management in the cloud
- Web portal
- Command line interface and scripts
- APIs
- PowerShell
- Why?
- How easy it is to work with your applications in the cloud impacts cost, performance, security and other priorities
- Different cloud vendors are going to be easier or harder to work with
- How?
- Azure Portal, CLI, PowerShell, Cloud Shell, REST APIs, and other programmatic methods
- Consolidated monitoring and alerting system
- Ability to use ARM templates, Bicep, Terraform, etc.
- Autoscaling of most types of compute resources