Description

Book Synopsis

Dr. Vladyslav Ukis is head of R&D for the Siemens Healthineers teamplay digital health platform and reliability lead for all Siemens Healthineers Digital Health products. Previously, as software development lead, he drove Continuous Delivery, SRE, and DevRel transformation, helping this large distributed development organization evolve architecture, deployment, testing, operations, and culture to implement these new processes at scale.



Trade Review

"Many enterprises today face the challenge of establishing modern operations for their SaaS offerings. This book provides a proven step-by-step guide for how this can be done from scratch using Google's SRE methodology. From achieving organizational buy-in to laying down the basic SRE foundations, establishing incident response and implementing a suitable organizational structure--the book contains a wealth of advice for development, operations, and leadership teams!"
--Dr. Peter Schardt, Chief Technology Officer at Siemens Healthcare GmbH

"Establishing SRE Foundations is a great introductory guide for anyone new to understanding and implementing Site Reliability Engineering (SRE) in their organization. Vlad creates a solid platform for anyone wishing to understand the SRE approach to building reliability into software services. As well as practical advice on implementing techniques such as SLIs and SLOs, Vlad goes into detail on how to achieve buy-in for SRE adoption and how to modify your organizational setup, rooted in his own experiences of working in a large organization. Those experiences are sorely lacking elsewhere in SRE literature, and when I'm asked in the future about SRE, I'll be referring people to this excellent book."
--Steve Smith, author of Measuring Continuous Delivery (2020)

"I very much enjoyed reading this book, even in its early forms. Vlad treats the topic of SRE methodically and in great detail; if you have ever been wondering whether or not someone else has come across your particular issue in an SRE implementation, this book can answer that question and probably has an actionable solution as well. Destined to become a constantly referenced handbook by all those involved in SRE change projects."
--Niall Murphy, co-author of Site Reliability Engineering (2016) and The Site Reliability Workbook (2018)

"There are an overwhelming number of blogs, books, podcasts, and ad hoc opinions covering the nitty-gritty of SRE toolchains and technology choices. That being said, SRE initiatives rarely fail for technological reasons--they fail for structural or organizational reasons. In Establishing SRE Foundations, Dr. Ukis has given us all a detailed, accessible, and actionable blueprint for the structures and practices of a successful SRE organization. It is an excellent book and one I would recommend to anyone looking to establish a scaled-out SRE practice in a complex environment."
--Ben Sigelman, co-founder of Lightstep

"Establishing SRE Foundations provides far and away the clearest, most comprehensive, and most actionable roadmap I have seen for driving, scaling, and sustaining SRE in an engineering organization. I cannot recommend it highly enough!"
--Randy Shoup, eBay Chief Architect and former Google Engineering leader

"Establishing SRE Foundations is a comprehensive guide for anyone looking to take their software operations to the next level. If you are a beginner, you will learn why SRE is a great methodology for improving operations, what the challenges of introducing SRE are, how to achieve organizational buy-in for SRE, how to lay the foundation for SRE in your teams, and how to drive continuous improvement. If you are an experienced practitioner, you will learn how to set up an error budget policy, enable error budget–based decision-making, and implement a suitable organizational structure. I think the content of the book is spot on and highly recommend it!"
--Vitor dos Reis, Director of Software Engineering at Delivery Hero

"Vlad offers a detailed and comprehensive overview of the transformation to SRE. He covers assessment, organizational structures, technical implementation, communication, and continuation. This book is a clear roadmap for any organization starting or progressing their SRE journey, replete with what to consider, options available, and real-world examples. If you are thinking about starting the SRE journey, have found yourself stalled along the way, or are looking for more ideas to help you continue the journey successfully, then buy this book."
--Doc Norton, Change Catalyst, OnBelay Consulting



Table of Contents

Foreword xxi
Preface xxv
Acknowledgments xxix
About the Author xxxiii

Part I: Foundations 1

Chapter 1: Introduction to SRE 3
1.1 Why SRE? 3
1.2 Alignment Using SRE 13
1.3 Why Does SRE Work? 17
1.4 Summary 19

Chapter 2: The Challenge 21
2.1 Misalignment 22
2.2 Collective Ownership 23
2.3 Ownership Using SRE 25
2.4 The Challenge Statement 38
2.5 Coaching 39
2.6 Summary 41

Chapter 3: SRE Basic Concepts 43
3.1 Service Level Indicators 43
3.2 Service Level Objectives 45
3.3 Error Budgets 47
3.4 Error Budget Policies 53
3.5 SRE Concept Pyramid 55
3.6 Alignment Using the SRE Concept Pyramid 59
3.7 Summary 63

Chapter 4: Assessing the Status Quo 65
4.1 Where Is the Organization? 65
4.2 Where Are the People? 69
4.3 Where Is the Tech? 71
4.4 Where Is the Culture? 74
4.5 Where Is the Process? 79
4.6 SRE Maturity Model 81
4.7 Posing Hypotheses 81
4.8 Summary 86

Part II: Running the Transformation 87

Chapter 5: Achieving Organizational Buy-In 89
5.1 Getting People Behind SRE 89
5.2 SRE Marketing Funnel 92
5.3 SRE Coaches 96
5.4 Top-Down Buy-In 99
5.5 Bottom-Up Buy-In 117
5.6 Lateral Buy-In 122
5.7 Buy-In Staggering 123
5.8 Team Coaching 124
5.9 Traversing the Organization 126
5.10 Organizational Coaching 131
5.11 Summary 133

Chapter 6: Laying Down the Foundations 135
6.1 Introductory Talks by Team 135
6.2 Conveying the Basics 136
6.3 SLI Standardization 147
6.4 Enabling Logging 154
6.5 Teaching the Log Query Language 156
6.6 Defining Initial SLOs 157
6.7 Default SLOs 163
6.8 Providing Basic Infrastructure 164
6.9 Engaging Champions 167
6.10 Dealing with Detractors 168
6.11 Creating Documentation 171
6.12 Broadcast Success 172
6.13 Summary 174

Chapter 7: Reacting to Alerts on SLO Breaches 175
7.1 Environment Selection 175
7.2 Responsibilities 177
7.3 Ways of Working 180
7.4 Setting Up On-Call Rotations 185
7.5 On-Call Management Tools 188
7.6 Out-of-Hours On-Call 193
7.7 Systematic Knowledge Sharing 196
7.8 Broadcast Success 208
7.9 Summary 209

Chapter 8: Implementing Alert Dispatching 211
8.1 Alert Escalation 212
8.2 Defining an Alert Escalation Policy 214
8.3 Defining Stakeholder Groups 216
8.4 Triggering Stakeholder Notifications 218
8.5 Defining Stakeholder Rings 219
8.6 Defining Effective Stakeholder Notifications 222
8.7 Getting the Stakeholders Subscribed 225
8.8 Broadcast Success 226
8.9 Summary 227

Chapter 9: Implementing Incident Response 229
9.1 Incident Response Foundations 229
9.2 Incident Priorities 230
9.3 Complex Incident Coordination 248
9.4 Incident Postmortems 268
9.5 Effective Postmortem Criteria 269
9.6 Mashing Up the Tools 294
9.7 Service Status Broadcast 298
9.8 Documenting the Incident Response Process 301
9.9 Broadcast Success 302
9.10 Summary 303

Chapter 10: Setting Up an Error Budget Policy 305
10.1 Motivation 305
10.2 Terminology 307
10.3 Error Budget Policy Structure 308
10.4 Error Budget Policy Conditions 309
10.5 Error Budget Policy Consequences 311
10.6 Error Budget Policy Governance 312
10.7 Extending the Error Budget Policy 314
10.8 Agreeing to the Error Budget Policy 318
10.9 Storing the Error Budget Policy 319
10.10 Enacting the Error Budget Policy 320
10.11 Reviewing the Error Budget Policy 321
10.12 Related Concepts 322
10.13 Summary 324

Chapter 11: Enabling Error Budget–Based Decision-Making 325
11.1 Reliability Decision-Making Taxonomy 325
11.2 Implementing SRE Indicators 330
11.3 Process Indicators, Not People KPIs 359
11.4 Decisions Versus Indicators 359
11.5 Decision-Making Workflows 362
11.6 Summary 388

Chapter 12: Implementing Organizational Structure 391
12.1 SRE Principles Versus Organizational Structure 393
12.2 Who Builds It, Who Runs It? 394
12.3 You Build It, You Run It 403
12.4 You Build It, You and SRE Run It 406
12.5 You Build It, SRE Run It 421
12.6 Cost Optimization 424
12.7 Team Topologies 426
12.8 Choosing a Model 432
12.9 A New Role: SRE 440
12.10 SRE Career Path 450
12.11 Communicating the Chosen Model 456
12.12 Introducing the Chosen Model 457
12.13 Summary 462

Part III: Measuring and Sustaining the Transformation 465

Chapter 13: Measuring the SRE Transformation 467
13.1 Testing Transformation Hypotheses 467
13.2 Outages Not Detected Internally 469
13.3 Services Exhausting Error Budgets Prematurely 470
13.4 Executives' Perceptions 471
13.5 Reliability Perception by Users and Partners 472
13.6 Summary 473

Chapter 14: Sustaining the SRE Movement 475
14.1 Maturing the SRE CoP 475
14.2 SRE Minutes 475
14.3 Availability Newsletter 476
14.4 SRE Column in the Engineering Blog 477
14.5 Promote Long-Form SRE Wiki Articles 477
14.6 SRE Broadcasting 478
14.7 Combining SRE and CD Indicators 479
14.8 SRE Feedback Loops 483
14.9 New Hypotheses 484
14.10 Providing Learning Opportunities 486
14.11 Supporting SRE Coaches 487
14.12 Summary 489

Chapter 15: The Road Ahead 491
15.1 Service Catalog 492
15.2 SLAs 494
15.3 Regulatory Compliance 494
15.4 SRE Infrastructure 495
15.5 Game Days 496

Appendix: Topics for Quick Reference 499

Index 507

Establishing SRE Foundations

Product form

£35.14

Includes FREE delivery

RRP £36.99 – you save £1.85 (5%)

A Paperback / softback by Vladyslav Ukis

15 in stock


    Publisher: Pearson Education (US)
    Publication Date: 10/11/2022
    ISBN13: 9780137424603
    ISBN10: 0137424604
