SlideShare a Scribd company logo
1 of 11
Download to read offline
Martin Parm
Infrastructure Engineer, Monitoring team
Monitoring for
Distributed
Operational
Responsibility
Giving operational responsibility back to the feature teams
(read: developers) instead of having a monolithic SRE team
❖ Capacity planning
❖ Configuration management
❖ Monitoring, alerting and on-call
etc....
But we provide the tools and infrastructure for them!
Distributed Operational Responsibility
❖ Organizational Scalability - too frequent changes for a
monolithic SRE team to keep up
❖ Getting The Right Person(tm) on the problem faster
❖ Accountability - making the right people hurt
❖ Autonomy - feature teams make all their own planning
and decisions
So let’s talk about monitoring....
But... but why...????
❖ Developers need training, but not a new education
❖ Developers need autonomy, but will do stupid things
❖ Developers need to care about metrics and analytics,
but not the pipeline
So how does that affect tooling?
Human challenges
Alerting - What developers should care about
Metrics and events
Magic
monitoring
pipeline
Alerting
rules
Alerting - The reality
Apache Kafka
FFWD
Metrics and events
Other
stuff
Even
more
stuff
...but we provide several different abstraction levels
depending on complexity of the task
❖ Script hooks i.e. drop a script in a folder
❖ Python scripts using the Riemann library
❖ Talk directly to FFWD using a supported protocol
Developers collect their own metrics
....but we help them by providing....
❖ Continuous integration with integration tests
❖ Abstractions from externals like PagerDuty
❖ Shared common functionality
Developers write their own alerting rules
❖ We build monitoring as a platform with many levels of
entry
❖ Self-service is king!
❖ We spend a lot of our time teaching and talking rather
than typing
....and that’s a good thing!
Impact on the monitoring team
Distributed Operational Responsibility is work-
in-progress
❖ We don’t know if this will work well
❖ We will run into new problems
❖ We will keep changing the way we work
anyway
......and finally
Martin Parm
email: parmus@spotify.com
twitter: @parmus_dk
FFWD: https://github.com/spotify/ffwd
Thank you for your
time and patience!

More Related Content

Similar to Monitorama PDX - Monitoring for Distributed Operational Responsibility

PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...Puppet
 
DevOpsRoadTrip San Francisco Final Speaking Deck
DevOpsRoadTrip San Francisco Final Speaking Deck DevOpsRoadTrip San Francisco Final Speaking Deck
DevOpsRoadTrip San Francisco Final Speaking Deck VictorOps
 
Testing and DevOps Culture: Lessons Learned
Testing and DevOps Culture: Lessons LearnedTesting and DevOps Culture: Lessons Learned
Testing and DevOps Culture: Lessons LearnedLB Denker
 
Dev Ops for systems of record - Talk at Agile Australia 2015
Dev Ops for systems of record - Talk at Agile Australia 2015Dev Ops for systems of record - Talk at Agile Australia 2015
Dev Ops for systems of record - Talk at Agile Australia 2015Mirco Hering
 
Building & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscapeBuilding & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscapeMeryemElMorabit
 
Introduction to Agile Software Development & Python
Introduction to Agile Software Development & PythonIntroduction to Agile Software Development & Python
Introduction to Agile Software Development & PythonTharindu Weerasinghe
 
Investing in a good software factory and automating the build process
Investing in a good software factory and automating the build processInvesting in a good software factory and automating the build process
Investing in a good software factory and automating the build processNicolas Mas
 
Taking AppSec to 11: AppSec Pipeline, DevOps and Making Things Better
Taking AppSec to 11: AppSec Pipeline, DevOps and Making Things BetterTaking AppSec to 11: AppSec Pipeline, DevOps and Making Things Better
Taking AppSec to 11: AppSec Pipeline, DevOps and Making Things BetterMatt Tesauro
 
Dev ops ci-ap-is-oh-my_security-gone-agile_ut-austin
Dev ops ci-ap-is-oh-my_security-gone-agile_ut-austinDev ops ci-ap-is-oh-my_security-gone-agile_ut-austin
Dev ops ci-ap-is-oh-my_security-gone-agile_ut-austinMatt Tesauro
 
TaskMan-Middleware 2011
TaskMan-Middleware 2011TaskMan-Middleware 2011
TaskMan-Middleware 2011Andrea Tino
 
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeConfoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeSteve Mercier
 
Choosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the EnterpriseChoosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the EnterpriseXebiaLabs
 
Automation and machine learning in the enterprise
Automation and machine learning in the enterpriseAutomation and machine learning in the enterprise
Automation and machine learning in the enterprisealphydan
 
Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016Matt Tesauro
 
Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]
Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]
Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]Websec México, S.C.
 
OpenFest 2014 Aggressive DevOps
OpenFest 2014 Aggressive DevOpsOpenFest 2014 Aggressive DevOps
OpenFest 2014 Aggressive DevOpsIvo Vachkov
 
Application Performance Monitoring
Application Performance MonitoringApplication Performance Monitoring
Application Performance MonitoringOlivier Gérardin
 
Observability für alle
Observability für alleObservability für alle
Observability für alleQAware GmbH
 
DevOps - A Purpose for an Institution.pdf
DevOps - A Purpose for an Institution.pdfDevOps - A Purpose for an Institution.pdf
DevOps - A Purpose for an Institution.pdfVishwas N
 

Similar to Monitorama PDX - Monitoring for Distributed Operational Responsibility (20)

PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat...
 
DevOpsRoadTrip San Francisco Final Speaking Deck
DevOpsRoadTrip San Francisco Final Speaking Deck DevOpsRoadTrip San Francisco Final Speaking Deck
DevOpsRoadTrip San Francisco Final Speaking Deck
 
Testing and DevOps Culture: Lessons Learned
Testing and DevOps Culture: Lessons LearnedTesting and DevOps Culture: Lessons Learned
Testing and DevOps Culture: Lessons Learned
 
Dev Ops for systems of record - Talk at Agile Australia 2015
Dev Ops for systems of record - Talk at Agile Australia 2015Dev Ops for systems of record - Talk at Agile Australia 2015
Dev Ops for systems of record - Talk at Agile Australia 2015
 
Building & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscapeBuilding & sustaining a monitoring team in a multi-application landscape
Building & sustaining a monitoring team in a multi-application landscape
 
Introduction to Agile Software Development & Python
Introduction to Agile Software Development & PythonIntroduction to Agile Software Development & Python
Introduction to Agile Software Development & Python
 
Investing in a good software factory and automating the build process
Investing in a good software factory and automating the build processInvesting in a good software factory and automating the build process
Investing in a good software factory and automating the build process
 
Taking AppSec to 11: AppSec Pipeline, DevOps and Making Things Better
Taking AppSec to 11: AppSec Pipeline, DevOps and Making Things BetterTaking AppSec to 11: AppSec Pipeline, DevOps and Making Things Better
Taking AppSec to 11: AppSec Pipeline, DevOps and Making Things Better
 
Dev ops ci-ap-is-oh-my_security-gone-agile_ut-austin
Dev ops ci-ap-is-oh-my_security-gone-agile_ut-austinDev ops ci-ap-is-oh-my_security-gone-agile_ut-austin
Dev ops ci-ap-is-oh-my_security-gone-agile_ut-austin
 
TaskMan-Middleware 2011
TaskMan-Middleware 2011TaskMan-Middleware 2011
TaskMan-Middleware 2011
 
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as CodeConfoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
Confoo-Montreal-2016: Controlling Your Environments using Infrastructure as Code
 
Choosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the EnterpriseChoosing Automation for DevOps & Continuous Delivery in the Enterprise
Choosing Automation for DevOps & Continuous Delivery in the Enterprise
 
Automation and machine learning in the enterprise
Automation and machine learning in the enterpriseAutomation and machine learning in the enterprise
Automation and machine learning in the enterprise
 
Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016
 
Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]
Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]
Echidna, sistema de respuesta a incidentes open source [GuadalajaraCON 2013]
 
OpenFest 2014 Aggressive DevOps
OpenFest 2014 Aggressive DevOpsOpenFest 2014 Aggressive DevOps
OpenFest 2014 Aggressive DevOps
 
Application Performance Monitoring
Application Performance MonitoringApplication Performance Monitoring
Application Performance Monitoring
 
NIPUN_RESUME
NIPUN_RESUMENIPUN_RESUME
NIPUN_RESUME
 
Observability für alle
Observability für alleObservability für alle
Observability für alle
 
DevOps - A Purpose for an Institution.pdf
DevOps - A Purpose for an Institution.pdfDevOps - A Purpose for an Institution.pdf
DevOps - A Purpose for an Institution.pdf
 

Recently uploaded

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Monitorama PDX - Monitoring for Distributed Operational Responsibility

  • 1. Martin Parm Infrastructure Engineer, Monitoring team Monitoring for Distributed Operational Responsibility
  • 2. Giving operational responsibility back to the feature teams (read: developers) instead of having a monolithic SRE team ❖ Capacity planning ❖ Configuration management ❖ Monitoring, alerting and on-call etc.... But we provide the tools and infrastructure for them! Distributed Operational Responsibility
  • 3. ❖ Organizational Scalability - too frequent changes for a monolithic SRE team to keep up ❖ Getting The Right Person(tm) on the problem faster ❖ Accountability - making the right people hurt ❖ Autonomy - feature teams make all their own planning and decisions So let’s talk about monitoring.... But... but why...????
  • 4. ❖ Developers need training, but not a new education ❖ Developers need autonomy, but will do stupid things ❖ Developers need to care about metrics and analytics, but not the pipeline So how does that affect tooling? Human challenges
  • 5. Alerting - What developers should care about Metrics and events Magic monitoring pipeline Alerting rules
  • 6. Alerting - The reality Apache Kafka FFWD Metrics and events Other stuff Even more stuff
  • 7. ...but we provide several different abstraction levels depending on complexity of the task ❖ Script hooks i.e. drop a script in a folder ❖ Python scripts using the Riemann library ❖ Talk directly to FFWD using a supported protocol Developers collect their own metrics
  • 8. ....but we help them by providing.... ❖ Continuous integration with integration tests ❖ Abstractions from externals like PagerDuty ❖ Shared common functionality Developers write their own alerting rules
  • 9. ❖ We build monitoring as a platform with many levels of entry ❖ Self-service is king! ❖ We spend a lot of our time teaching and talking rather than typing ....and that’s a good thing! Impact on the monitoring team
  • 10. Distributed Operational Responsibility is work- in-progress ❖ We don’t know if this will work well ❖ We will run into new problems ❖ We will keep changing the way we work anyway ......and finally
  • 11. Martin Parm email: parmus@spotify.com twitter: @parmus_dk FFWD: https://github.com/spotify/ffwd Thank you for your time and patience!