Site Reliability Engineering : How Google Runs Production Systems
| Other Authors: | , , , |
|---|---|
| Format: | Book |
| Language: | English |
| Edition: | First edition |
Table of Contents:
- Part I Introduction
- 1. Introduction
- 2. The production environment at Google, from the viewpoint of an SRE
- Part II Principles
- 3. Embracing risk
- 4. Service level objectives
- 5. Eliminating toil
- 6. Monitoring distributed systems
- 7. The evolution of automation at Google
- 8. Release engineering
- 9. Simplicity
- Part III Practices
- 10. Practical alerting from time-series data
- 11. Being on call
- 12. Effective troubleshooting
- 13. Emergency response
- 14. Managing incidents
- 15. Postmortem culture : learning from failure