Site Reliability Engineering : How Google Runs Production Systems

Bibliographic Details
Other Authors: Beyer, Betsy (Editor), Jones, Chris (Editor), Murphy, Niall Richard (Editor), Petoff, Jennifer (Editor)
Format: Book
Language: English
Edition: First edition
Table of Contents:
  • Part I Introduction
  • 1. Introduction
  • 2. The production environment at Google, from the viewpoint of an SRE
  • Part II Principles
  • 3. Embracing risk
  • 4. Service level objectives
  • 5. Eliminating toil
  • 6. Monitoring distributed systems
  • 7. The evolution of automation at Google
  • 8. Release engineering
  • 9. Simplicity
  • Part III Practices
  • 10. Practical alerting from time-series data
  • 11. Being on call
  • 12. Effective troubleshooting
  • 13. Emergency response
  • 14. Managing incidents
  • 15. Postmortem culture : learning from failure