Description
Book Synopsis
Gain a foundational understanding of SRE and learn its basic concepts and architectural best practices for deploying resilient Azure IaaS, PaaS, and microservices-based architectures.
The book starts with the basic concepts of SRE and the needs of operations teams and developers, followed by definitions and acronyms of service level agreements in real-world scenarios. Moving forward, you will learn how to build resilient IaaS solutions, PaaS solutions, and microservices architectures in Azure. Here you will go through Azure reference architectures for highly available storage, networking, and virtual machine computing, with Availability Sets, Availability Zones, and Scale Sets as the main scenarios. You will explore similar reference architectures for platform services such as App Service with Web Apps, and work with data solutions like Azure SQL and Azure Cosmos DB.
Next, you will learn how to use automation to enable SRE with Azure DevOps Pipelines and GitHub Actions. You'll also gain an unders…
Table of Contents
Chapter 1: The foundation of SRE
This chapter lays out the foundation of Site Reliability Engineering, which originated at Google, from the basic concepts of how IT operations and developers need to collaborate to how SRE helps organizations run business-critical workloads without major downtime.
Chapter 2: Service Level Management definitions and acronyms and their meaning in a real-life context
This chapter describes common service level definitions and acronyms, looked at through real-world scenarios to provide a clear understanding.
Examples include SLA, SLO, MTTF, MTBF, MTTR, …
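To make a few of these acronyms concrete, the sketch below shows two common calculations: steady-state availability from MTTF and MTTR, and the downtime an SLO allows over a 30-day window (its error budget). It is a minimal illustration, not the book's method: the function names and the sample figures (a failure every 500 hours, 30 minutes to restore, a 99.9% SLO) are hypothetical. Under this convention MTBF = MTTF + MTTR, and many sources approximate availability as MTBF / (MTBF + MTTR).

```python
"""Minimal sketch of common service-level arithmetic (hypothetical values only)."""

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes in a 30-day window


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR), the fraction of time up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures under the convention MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def error_budget_minutes(slo: float, window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Downtime permitted by an SLO target (e.g. 0.999) over the given window."""
    return (1.0 - slo) * window_minutes


if __name__ == "__main__":
    # Hypothetical service: fails every 500 hours on average, 30 minutes to restore.
    a = availability(mttf_hours=500.0, mttr_hours=0.5)
    print(f"availability: {a:.5f}")
    print(f"MTBF: {mtbf(500.0, 0.5):.1f} h")
    print(f"99.9% SLO error budget: {error_budget_minutes(0.999):.1f} min per 30 days")
```

Comparing the measured availability against the SLO target shows how much of the error budget a service has already consumed in the window.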
Chapter 3: Architecting Resilient Infrastructure as a Service (IaaS) Solutions in Azure
SRE is about maximizing the uptime of your organization's workloads, and this chapter covers that for Azure IaaS compute solutions. It explains the Azure reference architecture for highly available storage, networking, and virtual machine computing, with Availability Sets, Availability Zones, and Scale Sets as the main scenarios. It also touches on preparing for disaster recovery with Azure Backup and Azure Site Recovery, helping you to quickly mitigate outages in case of a failure.
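As a rough illustration of why redundancy across Availability Sets, Availability Zones, or Scale Sets raises uptime, the sketch below combines component availabilities: redundant instances in parallel, dependent tiers in series. The function names and all figures (0.995 per VM, 0.9999 for a load balancer) are hypothetical placeholders, not published Azure SLA values. The parallel formula also assumes instances fail independently, which real deployments only approximate; spreading instances across fault domains or zones is what moves them closer to that assumption.

```python
"""Sketch of composite availability for redundant and chained components.

Assumes independent failures; all numbers are hypothetical, not Azure SLAs.
"""
from math import prod


def serial(*avail: float) -> float:
    """Every component must be up, e.g. load balancer -> VM tier -> storage."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """At least one redundant instance must be up (independent failures assumed)."""
    return 1.0 - prod(1.0 - a for a in avail)


if __name__ == "__main__":
    single_vm = 0.995  # hypothetical availability of one VM instance
    vm_tier = parallel(single_vm, single_vm)  # two VMs spread across fault domains/zones
    end_to_end = serial(0.9999, vm_tier)  # hypothetical load balancer in front of the tier
    print(f"one VM:       {single_vm:.6f}")
    print(f"two VMs:      {vm_tier:.6f}")
    print(f"with LB:      {end_to_end:.6f}")
```

The serial result is always lower than its weakest component, which is why every tier in the path, not only the compute tier, needs its own redundancy.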
Chapter 4: Architecting Resilient Platform as a Service (PaaS) Solutions in Azure
Following on from the virtual machine scenario, this chapter details similar reference architectures for platform services such as App Service with Web Apps, and also touches on data solutions like Azure SQL and Azure Cosmos DB.
Chapter 5: Architecting Resilient Serverless and Microservices Architectures in Azure
This third chapter on reference architectures describes how to build highly available, business-critical scenarios using serverless Azure Functions and Azure Logic Apps, as well as microservices scenarios using Azure Container Instances and Azure Kubernetes Service (AKS).
Chapter 6: Automation to enable SRE with Azure DevOps Pipelines / GitHub Actions
Automation is the cornerstone of SRE. It allows businesses not only to deploy new workloads easily, but also to avoid critical outages and, when an outage does occur, to mitigate the problem as fast as possible. With examples from both Azure DevOps Pipelines and GitHub Actions, this chapter provides many real-life scenarios you can reuse in your own environment.
Chapter 7: Efficiently handling blameless post-mortems
Post-mortems are the way to look back at what caused an outage and capture lessons learned, helping to avoid a similar outage in the future or to quickly fix an identical incident. In a blameless post-mortem, the focus is on finding the root cause of the problem without pinpointing any individual or team as being at fault. This chapter describes how an open culture around post-mortems dramatically helps in optimizing SRE and the overall company culture around managing and running IT systems and application workloads.
Chapter 8: Monitoring as the key to knowledge
Besides automated deployments, monitoring is the second big technical topic in any SRE scenario: you can't manage what you don't know. This chapter provides an overview of Azure Monitor and Log Analytics, which form the foundation for monitoring workloads running in Azure and in hybrid environments. Starting from metrics for the Azure services covered in earlier chapters, it also explains how to export logs to third-party solutions such as Splunk and how to integrate dashboarding tools like Grafana.