Description

Book Synopsis

A practical, step-by-step guide to designing world-class, high availability systems using both classical and DFSS reliability techniques

Whether designing telecom, aerospace, automotive, medical, financial, or public safety systems, every engineer aims for the utmost reliability and availability in the systems he or she designs. But between the dream of world-class performance and reality falls the shadow of complexities that can bedevil even the most rigorous design process. While there is an array of robust predictive engineering tools, there has been no single-source guide to understanding and using them . . . until now.

Offering a case-based approach to designing, predicting, and deploying world-class high-availability systems from the ground up, this book brings together the best classical and DFSS reliability techniques. Although it focuses on technical aspects, this guide considers the business and market constraints that require that systems be design

Table of Contents
Preface

List of Abbreviations

1. Introduction

2. Initial Considerations for Reliability Design

2.1 The Challenge

2.2 Initial Data Collection

2.3 Where Do We Get MTBF Information?

2.4 MTTR and Identifying Failures

2.5 Summary

3. A Game of Dice: An Introduction to Probability

3.1 Introduction

3.2 A Game of Dice

3.3 Mutually Exclusive and Independent Events

3.4 Dice Paradox Problem and Conditional Probability

3.5 Flip a Coin

3.6 Dice Paradox Revisited

3.7 Probabilities for Multiple Dice Throws

3.8 Conditional Probability Revisited

3.9 Summary

4. Discrete Random Variables

4.1 Introduction

4.2 Random Variables

4.3 Discrete Probability Distributions

4.4 Bernoulli Distribution

4.5 Geometric Distribution

4.6 Binomial Coefficients

4.7 Binomial Distribution

4.8 Poisson Distribution

4.9 Negative Binomial Random Variable

4.10 Summary

5. Continuous Random Variables

5.1 Introduction

5.2 Uniform Random Variables

5.3 Exponential Random Variables

5.4 Weibull Random Variables

5.5 Gamma Random Variables

5.6 Chi-Square Random Variables

5.7 Normal Random Variables

5.8 Relationship between Random Variables

5.9 Summary

6. Random Processes

6.1 Introduction

6.2 Markov Process

6.3 Poisson Process

6.4 Deriving the Poisson Distribution

6.5 Poisson Interarrival Times

6.6 Summary

7. Modeling and Reliability Basics

7.1 Introduction

7.2 Modeling

7.3 Failure Probability and Failure Density

7.4 Unreliability, F(t)

7.5 Reliability, R(t)

7.6 MTTF

7.7 MTBF

7.8 Repairable System

7.9 Nonrepairable System

7.10 MTTR

7.11 Failure Rate

7.12 Maintainability

7.13 Operability

7.14 Availability

7.15 Unavailability

7.16 Five 9s Availability

7.17 Downtime

7.18 Constant Failure Rate Model

7.19 Conditional Failure Rate

7.20 Bayes’s Theorem

7.21 Reliability Block Diagrams

7.22 Summary

8. Discrete-Time Markov Analysis

8.1 Introduction

8.2 Markov Process Defined

8.3 Dynamic Modeling

8.4 Discrete Time Markov Chains

8.5 Absorbing Markov Chains

8.6 Nonrepairable Reliability Models

8.7 Summary

9. Continuous-Time Markov Systems

9.1 Introduction

9.2 Continuous-Time Markov Processes

9.3 Two-State Derivation

9.4 Steps to Create a Markov Reliability Model

9.5 Asymptotic Behavior (Steady-State Behavior)

9.6 Limitations of Markov Modeling

9.7 Markov Reward Models

9.8 Summary

10. Markov Analysis: Nonrepairable Systems

10.1 Introduction

10.2 One Component, No Repair

10.3 Nonrepairable Systems: Parallel System with No Repair

10.4 Series System with No Repair: Two Identical Components

10.5 Parallel System with Partial Repair: Identical Components

10.6 Parallel System with No Repair: Nonidentical Components

10.7 Summary

11. Markov Analysis: Repairable Systems

11.1 Repairable Systems

11.2 One Component with Repair

11.3 Parallel System with Repair: Identical Component Failure and Repair Rates

11.4 Parallel System with Repair: Different Failure and Repair Rates

11.5 Summary

12. Analyzing Confidence Levels

12.1 Introduction

12.2 pdf of a Squared Normal Random Variable

12.3 pdf of the Sum of Two Random Variables

12.4 pdf of the Sum of Two Gamma Random Variables

12.5 pdf of the Sum of n Gamma Random Variables

12.6 Goodness-of-Fit Test Using Chi-Square

12.7 Confidence Levels

12.8 Summary

13. Estimating Reliability Parameters

13.1 Introduction

13.2 Bayes’ Estimation

13.3 Example of Estimating Hardware MTBF

13.4 Estimating Software MTBF

13.5 Revising Initial MTBF Estimates and Tradeoffs

13.6 Summary

14. Six Sigma Tools for Predictive Engineering

14.1 Introduction

14.2 Gathering Voice of Customer (VOC)

14.3 Processing Voice of Customer

14.4 Kano Analysis

14.5 Analysis of Technical Risks

14.6 Quality Function Deployment (QFD) or House of Quality

14.7 Program Level Transparency of Critical Parameters

14.8 Mapping DFSS Techniques to Critical Parameters

14.9 Critical Parameter Management (CPM)

14.10 First Principles Modeling

14.11 Design of Experiments (DOE)

14.12 Design Failure Modes and Effects Analysis (DFMEA)

14.13 Fault Tree Analysis

14.14 Pugh Matrix

14.15 Monte Carlo Simulation

14.16 Commercial DFSS Tools

14.17 Mathematical Prediction of System Capability instead of “Gut Feel”

14.18 Visualizing System Behavior Early in the Life Cycle

14.19 Critical Parameter Scorecard

14.20 Applying DFSS in Third-Party Intensive Programs

14.21 Summary

15. Design Failure Modes and Effects Analysis

15.1 Introduction

15.2 What Is Design Failure Modes and Effects Analysis (DFMEA)?

15.3 Definitions

15.4 Business Case for DFMEA

15.5 Why Conduct DFMEA?

15.6 When to Perform DFMEA

15.7 Applicability of DFMEA

15.8 DFMEA Template

15.9 DFMEA Life Cycle

15.10 The DFMEA Team

15.11 DFMEA Advantages and Disadvantages

15.12 Limitations of DFMEA

15.13 DFMEAs, FTAs, and Reliability Analysis

15.14 Summary

16. Fault Tree Analysis

16.1 What Is Fault Tree Analysis?

16.2 Events

16.3 Logic Gates

16.4 Creating a Fault Tree

16.5 Fault Tree Limitations

16.6 Summary

17. Monte Carlo Simulation Models

17.1 Introduction

17.2 System Behavior over Mission Time

17.3 Reliability Parameter Analysis

17.4 A Worked Example

17.5 Component and System Failure Times Using Monte Carlo Simulations

17.6 Limitations of Using Nontime-Based Monte Carlo Simulations

17.7 Summary

18. Updating Reliability Estimates: Case Study

18.1 Introduction

18.2 Overview of the Base Station Controller—Data Only (BSC-DO) System

18.3 Downtime Calculation

18.4 Calculating Availability from Field Data Only

18.5 Assumptions Behind Using the Chi-Square Methodology

18.6 Fault Tree Updates from Field Data

18.7 Summary

19. Fault Management Architectures

19.1 Introduction

19.2 Faults, Errors, and Failures

19.3 Fault Management Design

19.4 Repair versus Recovery

19.5 Design Considerations for Reliability Modeling

19.6 Architecture Techniques to Improve Availability

19.7 Redundancy Schemes

19.8 Summary

20. Application of DFMEA to Real-Life Example

20.1 Introduction

20.2 Cage Failover Architecture Description

20.3 Cage Failover DFMEA Example

20.4 DFMEA Scorecard

20.5 Lessons Learned

20.6 Summary

21. Application of FTA to Real-Life Example

21.1 Introduction

21.2 Calculating Availability Using Fault Tree Analysis

21.3 Building the Basic Events

21.4 Building the Fault Tree

21.5 Steps for Creating and Estimating the Availability Using FTA

21.6 Summary

22. Complex High Availability System Analysis

22.1 Introduction

22.2 Markov Analysis of the Hardware Components

22.3 Building a Fault Tree from the Hardware Markov Model

22.4 Markov Analysis of the Software Components

22.5 Markov Analysis of the Combined Hardware and Software Components

22.6 Techniques for Simplifying Markov Analysis

22.7 Summary

References

Index

Designing High Availability Systems

    Product form

    £104.36

    Includes FREE delivery

    RRP £115.95 – you save £11.59 (9%)

    A Hardback by Zachary Taylor, Subramanyam Ranganathan

      Publisher: John Wiley & Sons Inc
      Publication Date: 10/12/2013
      ISBN13: 9781118551127
      ISBN10: 1118551125
