SITE RELIABILITY ENGINEERING©2014 LinkedIn Corporation. All Rights Reserved.
Enterprise Kafka
Why Am I Here?
 You want to find out what this “Kafka” thing is
 You’re running Kafka, but you want to go big
 You’re looking for some neat whizbangs
Clark Haskins
Todd Palino
Who Are We?
 Kafka SRE at LinkedIn
 Site Reliability Engineering
– Administrators
– Architects
– Developers
 Keep the site running, always
Kafka Overview
What Is Kafka?
What Is Kafka?
[Diagram: Producer, Broker (partitions A P0, A P1), Consumer, Zookeeper]
Attributes of a Kafka Cluster
 Disk Based
 Durable
 Scalable
 Low Latency
 Finite Retention
 NOT Idempotent (yet)
Kafka At LinkedIn
 Multiple Datacenters, Multiple Clusters
 Mirroring between clusters
 Message Types
– Metrics
– Tracking
– Queuing
 Data transport from applications to Hadoop, and back
Kafka At LinkedIn
Kafka At LinkedIn
 300+ Kafka brokers
 Over 18,000 topics
 140,000+ Partitions
 220 Billion messages per day
 40 Terabytes In
 160 Terabytes Out
 Peak Load
– 3.25 Million messages per second
– 5.5 Gigabits/sec Inbound
– 18 Gigabits/sec Outbound
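To put the peak figures in context, the daily totals can be converted to averages. The sketch below assumes the message and terabyte figures are per-day totals and that terabytes are decimal (10**12 bytes); the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(total_per_day: float) -> float:
    """Spread a daily total evenly over one day."""
    return total_per_day / SECONDS_PER_DAY


def terabytes_per_day_to_gigabits_per_second(terabytes: float) -> float:
    """Convert TB/day to Gbit/s, assuming 1 TB = 10**12 bytes."""
    bits_per_day = terabytes * 10**12 * 8
    return average_per_second(bits_per_day) / 10**9


avg_messages = average_per_second(220e9)                # 220 billion messages per day
avg_in = terabytes_per_day_to_gigabits_per_second(40)   # 40 TB in
avg_out = terabytes_per_day_to_gigabits_per_second(160) # 160 TB out

print(f"average messages/sec: {avg_messages:,.0f}")
print(f"average inbound:  {avg_in:.2f} Gbit/s")
print(f"average outbound: {avg_out:.2f} Gbit/s")
print(f"read fan-out (out/in): {avg_out / avg_in:.1f}x")
```

Under those assumptions the averages come out a little below the stated peaks, and each byte written is read about four times.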
Challenges We Have Overcome
Solutions
 Kafka is young… we influenced development
 Operations wizardry…
Hyper Growth
 Need to expand clusters to keep up with site traffic, and then balance them.
Adding brokers
[Diagram: Producers, Brokers, and Consumers; broker partitions labeled A P0-P7, B P0-P7, and C P0-P3]
Adding a broker (with broker leveling)
[Diagram: Producers, Brokers, and Consumers; broker partitions labeled A P0-P7, B P0-P7, and C P0-P3]
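The idea of leveling brokers after an expansion can be sketched as a greedy move of partitions from the fullest broker to the emptiest one. This is only an illustration under simplifying assumptions: it counts partitions and ignores replica placement, partition size, and traffic, and `level_brokers` and the broker names are hypothetical, not part of Kafka's tooling.

```python
def level_brokers(
    assignment: dict[str, list[str]],
) -> tuple[list[tuple[str, str, str]], dict[str, list[str]]]:
    """Move partitions until broker partition counts differ by at most one.

    Returns the list of moves as (partition, from_broker, to_broker)
    and the resulting assignment. The input is not modified.
    """
    brokers = {broker: list(parts) for broker, parts in assignment.items()}
    moves = []
    while True:
        fullest = max(brokers, key=lambda b: len(brokers[b]))
        emptiest = min(brokers, key=lambda b: len(brokers[b]))
        if len(brokers[fullest]) - len(brokers[emptiest]) <= 1:
            return moves, brokers
        partition = brokers[fullest].pop()
        brokers[emptiest].append(partition)
        moves.append((partition, fullest, emptiest))


# Two loaded brokers plus one newly added, empty broker.
before = {
    "broker1": ["A-P0", "A-P1", "A-P2", "A-P3"],
    "broker2": ["B-P0", "B-P1", "B-P2", "B-P3"],
    "broker3": [],
}
moves, after = level_brokers(before)
for partition, source, target in moves:
    print(f"move {partition}: {source} -> {target}")
```

A real rebalance would also have to keep replicas of the same partition on different brokers and throttle the data movement.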
Logs vs. Metrics
 Logging data killed the metrics cluster
Quality of Service with Kafka
[Diagram: Producers, Brokers, and Consumers; broker partitions labeled A P0-P7, B P0-P7, and C P0-P3]
Deployment Nightmares
 Parallel deployment wasn’t possible so…
 Babysitting sequential deployments
Easy deployments
 Kafka 0.8.1 makes sure the cluster is in a good state before shutting down a broker
– If any brokers in the cluster have under-replicated partitions, Kafka will not shut down
– Kafka ensures that only one broker is in the shutdown sequence at a time
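The two conditions above amount to a simple gate in front of each shutdown. A minimal sketch, assuming the cluster state is available as two counters; `ClusterState` and `may_shut_down` are hypothetical names, not Kafka APIs.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    under_replicated_partitions: int  # summed over all brokers
    brokers_shutting_down: int        # brokers currently in the shutdown sequence


def may_shut_down(state: ClusterState) -> bool:
    """Allow a shutdown only when the cluster is fully replicated
    and no other broker is already shutting down."""
    return (
        state.under_replicated_partitions == 0
        and state.brokers_shutting_down == 0
    )


print(may_shut_down(ClusterState(0, 0)))
print(may_shut_down(ClusterState(3, 0)))
print(may_shut_down(ClusterState(0, 1)))
```

With a gate like this in place, a deployment can be started on every broker at once and still proceed one broker at a time.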
Killing Zookeeper
 Consumer offset management done within Zookeeper
 Every consumer committing offsets every minute for every partition makes ZK very unhappy.
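A rough way to see why this hurts is to estimate the resulting write rate. The consumer and partition counts below are invented for illustration and do not come from the slides; only the one-minute commit interval does.

```python
def offset_commits_per_second(
    consumers: int,
    partitions_per_consumer: int,
    commit_interval_seconds: float = 60.0,
) -> float:
    """Average Zookeeper writes per second if every consumer commits
    one offset per partition once per commit interval."""
    return consumers * partitions_per_consumer / commit_interval_seconds


# Hypothetical fleet: 1,000 consumers, each reading 120 partitions.
rate = offset_commits_per_second(consumers=1000, partitions_per_consumer=120)
print(f"{rate:,.0f} offset writes per second")
```

The rate grows with the product of consumers and partitions, which is why it becomes a problem as clusters and consumer fleets grow together.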
Zookeeper on SSD
Monitoring
Kafka Is Broken!
Kafka Is Broken!
 Everything is Kafka’s fault first
 What is lag?
 Consumer Problems
– Application problems
– Kafka client problems
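The slide leaves "What is lag?" as a question. The usual definition is the distance between the newest offset in a partition and the offset the consumer has committed. A minimal sketch with made-up offsets; `consumer_lag` is a hypothetical helper, not a Kafka client call.

```python
def consumer_lag(
    log_end_offsets: dict[int, int],
    committed_offsets: dict[int, int],
) -> dict[int, int]:
    """Per-partition lag: messages written to the log that the
    consumer has not yet committed."""
    return {
        partition: max(end - committed_offsets[partition], 0)
        for partition, end in log_end_offsets.items()
    }


# Made-up offsets for a two-partition topic.
lag = consumer_lag(
    log_end_offsets={0: 1500, 1: 980},
    committed_offsets={0: 1450, 1: 980},
)
print(lag)
print("total lag:", sum(lag.values()))
```

Lag that keeps growing usually means the consumer is slower than the producers, which is a consumer-side problem rather than a broker one.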
How Do We Sleep At Night?
 Educating Users
– Why lag is their fault
 Monitoring the Ecosystem
– Kafka Brokers
– Zookeeper
– Mirror Makers
– Audit
– REST Interfaces
 Week Over Week
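One plausible reading of "Week Over Week" is comparing a metric with its value at the same time one week earlier, so that normal daily and weekly traffic cycles do not trigger alerts. The sketch below assumes that reading; the 25% tolerance and the function names are illustrative only.

```python
def week_over_week_change(current: float, week_ago: float) -> float:
    """Fractional change relative to the same time one week earlier."""
    if week_ago == 0:
        raise ValueError("no baseline from a week ago")
    return (current - week_ago) / week_ago


def is_anomalous(current: float, week_ago: float, tolerance: float = 0.25) -> bool:
    """Flag a value that moved more than `tolerance` from last week's value."""
    return abs(week_over_week_change(current, week_ago)) > tolerance


print(is_anomalous(current=130.0, week_ago=100.0))
print(is_anomalous(current=110.0, week_ago=100.0))
```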
Cluster Health and Utilization
 Under-replicated partitions
 Offline partitions
 Broker partition count
 Data size on disk
 Leader partition count
 Network utilization
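Several of these metrics translate directly into alert conditions. The following is a sketch rather than LinkedIn's actual monitoring: the `BrokerMetrics` fields and the disk and network thresholds are assumptions, and partition and leader counts are left out because they are mostly useful for comparing brokers against each other.

```python
from dataclasses import dataclass


@dataclass
class BrokerMetrics:
    under_replicated_partitions: int
    offline_partitions: int
    disk_used_fraction: float     # 0.0 to 1.0
    network_used_fraction: float  # 0.0 to 1.0


def health_alerts(
    metrics: BrokerMetrics,
    max_disk: float = 0.80,
    max_network: float = 0.75,
) -> list[str]:
    """Return a list of alert names for one broker; empty means healthy."""
    alerts = []
    if metrics.offline_partitions > 0:
        alerts.append("offline partitions")
    if metrics.under_replicated_partitions > 0:
        alerts.append("under-replicated partitions")
    if metrics.disk_used_fraction > max_disk:
        alerts.append("disk nearly full")
    if metrics.network_used_fraction > max_network:
        alerts.append("network saturated")
    return alerts


print(health_alerts(BrokerMetrics(0, 0, 0.50, 0.30)))
print(health_alerts(BrokerMetrics(2, 0, 0.90, 0.30)))
```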
Zookeeper
 Ensemble availability
 Latency
 Outstanding requests
Mirror Maker and Audit
 Mirror Maker
– Lag
– Dropped Messages
 Audit Consumer
– Lag
– Completeness check
 Audit UI
[Diagram labels: Producer, Cluster, MM, Cluster, Messages, Message Counts, Audit Consumer, All Messages, Audit State, Audit UI]
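A completeness check can be sketched as comparing the message counts that producers report against what an audit consumer actually saw for the same time bucket. The bucket labels, counts, and the `completeness` helper are assumptions for illustration; the slides do not describe the audit format.

```python
def completeness(
    produced_counts: dict[str, int],
    consumed_counts: dict[str, int],
) -> dict[str, float]:
    """Fraction of produced messages seen by the audit consumer,
    per time bucket. Buckets with nothing produced are skipped."""
    return {
        bucket: consumed_counts.get(bucket, 0) / produced
        for bucket, produced in produced_counts.items()
        if produced > 0
    }


# Made-up counts for two ten-minute buckets.
ratios = completeness(
    produced_counts={"12:00": 1000, "12:10": 1200},
    consumed_counts={"12:00": 1000, "12:10": 1140},
)
for bucket, ratio in ratios.items():
    print(f"{bucket}: {ratio:.1%} complete")
```

A bucket that stays below 100% after the consumers have caught up points at messages lost somewhere between producer and destination cluster.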
Audit UI
Audit UI
Tuning
Hardware and OS
 Kernel Tuning
– Swapping is Death
– Allow more dirty pages
– Allow less dirty cache
 Disk throughput
– More spindles
– Longer commit interval
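On Linux, the kernel bullets plausibly map to the `vm.swappiness`, `vm.dirty_ratio`, and `vm.dirty_background_ratio` sysctls. That mapping is an interpretation of the slide, and the values below are illustrative placeholders, not the settings used at LinkedIn. The sketch renders them in sysctl.conf form.

```python
# Illustrative values only; tune against your own workload.
KERNEL_SETTINGS = {
    "vm.swappiness": 1,              # avoid swapping almost entirely
    "vm.dirty_background_ratio": 5,  # start background flushing early
    "vm.dirty_ratio": 60,            # allow many dirty pages before writers block
}


def render_sysctl(settings: dict[str, int]) -> str:
    """Format settings as lines suitable for a sysctl.conf file."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


print(render_sysctl(KERNEL_SETTINGS))
```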
Java Virtual Machine
Garbage Collection
Garbage Collection
 Java 7, update 51
 Garbage First (G1) Collector
– Set the heap size
– Specify a target GC pause time
– Don’t set the New size
 GC Times
– Less than 15ms per second in GC
– Steady 20-22ms GC intervals
– Almost no full GC cycles (and only 200-400ms when they do occur)
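The G1 advice above corresponds to a short set of standard JVM flags: fixed heap size, G1 enabled, a pause-time target, and no explicit new-generation size. The sketch below builds such a flag list; the 4 GB heap and 20 ms target are placeholder values, not the ones from the talk.

```python
def g1_jvm_options(heap_gb: int, pause_target_ms: int) -> list[str]:
    """Build JVM flags for G1 with a fixed heap and a pause-time target.

    No -Xmn or NewSize flag is emitted, so G1 can size the young
    generation itself.
    """
    return [
        f"-Xms{heap_gb}g",
        f"-Xmx{heap_gb}g",
        "-XX:+UseG1GC",
        f"-XX:MaxGCPauseMillis={pause_target_ms}",
    ]


options = g1_jvm_options(heap_gb=4, pause_target_ms=20)  # placeholder values
print(" ".join(options))

# "Less than 15ms per second in GC" expressed as an overhead fraction.
gc_overhead = 15 / 1000
print(f"GC overhead: {gc_overhead:.1%}")
```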
Closing
What’s Coming in 0.8.2
 Consumer offsets in the broker
 Delete topic
 Further down the road
– New producer
– Improved producer API
Upcoming Operational Work
 Learning to share
 Shrinking a cluster
 Cluster comparison
 Advanced monitoring
How Can You Get Involved?
 http://kafka.apache.org
 Join the mailing lists
– users@kafka.apache.org
 irc.freenode.net - #apache-kafka
 Contribute tools
Talk To Us
 Kafka SREs at LinkedIn
– Clark Haskins
 https://www.linkedin.com/in/clarkhaskins
 chaskins@linkedin.com
– Todd Palino
 https://www.linkedin.com/in/toddpalino
 tpalino@linkedin.com
Questions
Enterprise Kafka: Kafka as a Service

More Related Content

What's hot

Open Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesOpen Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesAll Things Open
 
Nano Server - the future of Windows Server - Thomas Maurer
Nano Server - the future of Windows Server - Thomas MaurerNano Server - the future of Windows Server - Thomas Maurer
Nano Server - the future of Windows Server - Thomas MaurerITCamp
 
Do You Need A Service Mesh?
Do You Need A Service Mesh?Do You Need A Service Mesh?
Do You Need A Service Mesh?NGINX, Inc.
 
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...Jonghyun Lee
 
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...Natan Silnitsky
 
From a monolith to microservices + REST: The evolution of LinkedIn's architec...
From a monolith to microservices + REST: The evolution of LinkedIn's architec...From a monolith to microservices + REST: The evolution of LinkedIn's architec...
From a monolith to microservices + REST: The evolution of LinkedIn's architec...Karan Parikh
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveWalid Shaari
 
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, ConfluentMaking Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, ConfluentHostedbyConfluent
 
Cloud-Native Operations with Kubernetes and CI/CD
Cloud-Native Operations with Kubernetes and CI/CDCloud-Native Operations with Kubernetes and CI/CD
Cloud-Native Operations with Kubernetes and CI/CDVMware Tanzu
 
Architecting for now & the future with NGINX London April 19
Architecting for now & the future with NGINX London April 19Architecting for now & the future with NGINX London April 19
Architecting for now & the future with NGINX London April 19NGINX, Inc.
 
What's New in Hyper-V 2016 - Thomas Maurer
What's New in Hyper-V 2016 - Thomas MaurerWhat's New in Hyper-V 2016 - Thomas Maurer
What's New in Hyper-V 2016 - Thomas MaurerITCamp
 
Comparison of Current Service Mesh Architectures
Comparison of Current Service Mesh ArchitecturesComparison of Current Service Mesh Architectures
Comparison of Current Service Mesh ArchitecturesMirantis
 
Travelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian WideraTravelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian WideraITCamp
 
Building and Evolving a Dependency-Graph Based Microservice Architecture (La...
 Building and Evolving a Dependency-Graph Based Microservice Architecture (La... Building and Evolving a Dependency-Graph Based Microservice Architecture (La...
Building and Evolving a Dependency-Graph Based Microservice Architecture (La...confluent
 
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...StreamNative
 
Kafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice OrchestrationKafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice Orchestrationlarsfrancke
 
Ocs F5 Bigip Bestpractices
Ocs F5 Bigip BestpracticesOcs F5 Bigip Bestpractices
Ocs F5 Bigip BestpracticesThiago Gutierri
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
Message Driven and Event Sourcing
Message Driven and Event SourcingMessage Driven and Event Sourcing
Message Driven and Event SourcingPaolo Castagna
 

What's hot (20)

Open Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesOpen Source Applied - Real World Use Cases
Open Source Applied - Real World Use Cases
 
Nano Server - the future of Windows Server - Thomas Maurer
Nano Server - the future of Windows Server - Thomas MaurerNano Server - the future of Windows Server - Thomas Maurer
Nano Server - the future of Windows Server - Thomas Maurer
 
Do You Need A Service Mesh?
Do You Need A Service Mesh?Do You Need A Service Mesh?
Do You Need A Service Mesh?
 
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
 
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
 
From a monolith to microservices + REST: The evolution of LinkedIn's architec...
From a monolith to microservices + REST: The evolution of LinkedIn's architec...From a monolith to microservices + REST: The evolution of LinkedIn's architec...
From a monolith to microservices + REST: The evolution of LinkedIn's architec...
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspective
 
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, ConfluentMaking Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
 
Cloud-Native Operations with Kubernetes and CI/CD
Cloud-Native Operations with Kubernetes and CI/CDCloud-Native Operations with Kubernetes and CI/CD
Cloud-Native Operations with Kubernetes and CI/CD
 
Architecting for now & the future with NGINX London April 19
Architecting for now & the future with NGINX London April 19Architecting for now & the future with NGINX London April 19
Architecting for now & the future with NGINX London April 19
 
What's New in Hyper-V 2016 - Thomas Maurer
What's New in Hyper-V 2016 - Thomas MaurerWhat's New in Hyper-V 2016 - Thomas Maurer
What's New in Hyper-V 2016 - Thomas Maurer
 
Comparison of Current Service Mesh Architectures
Comparison of Current Service Mesh ArchitecturesComparison of Current Service Mesh Architectures
Comparison of Current Service Mesh Architectures
 
Travelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian WideraTravelling in time with SQL Server 2016 - Damian Widera
Travelling in time with SQL Server 2016 - Damian Widera
 
Building and Evolving a Dependency-Graph Based Microservice Architecture (La...
 Building and Evolving a Dependency-Graph Based Microservice Architecture (La... Building and Evolving a Dependency-Graph Based Microservice Architecture (La...
Building and Evolving a Dependency-Graph Based Microservice Architecture (La...
 
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...
 
Kafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice OrchestrationKafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice Orchestration
 
Ocs F5 Bigip Bestpractices
Ocs F5 Bigip BestpracticesOcs F5 Bigip Bestpractices
Ocs F5 Bigip Bestpractices
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
Message Driven and Event Sourcing
Message Driven and Event SourcingMessage Driven and Event Sourcing
Message Driven and Event Sourcing
 

Similar to Enterprise Kafka: Kafka as a Service

Kafka overview and use cases
Kafka overview and use casesKafka overview and use cases
Kafka overview and use casesIndrajeet Kumar
 
WebRTC Infrastructure the Hard Parts: Media
WebRTC Infrastructure the Hard Parts: MediaWebRTC Infrastructure the Hard Parts: Media
WebRTC Infrastructure the Hard Parts: MediaDialogic Inc.
 
Linked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafkaLinked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafkaNitin Kumar
 
Web rtc infrastructure the hard parts v4
Web rtc infrastructure the hard parts v4Web rtc infrastructure the hard parts v4
Web rtc infrastructure the hard parts v4Dialogic Inc.
 
Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...
Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...
Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...Cisco Canada
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Continuous Delivery pour vos applications avec Cloud Foundry et Jenkins
Continuous Delivery pour vos applications avec Cloud Foundry et JenkinsContinuous Delivery pour vos applications avec Cloud Foundry et Jenkins
Continuous Delivery pour vos applications avec Cloud Foundry et JenkinsErwan Bornier
 
App-First & Cloud-Native: How InterMiles Boosted CX with AWS & Infostretch
App-First & Cloud-Native: How InterMiles Boosted CX with AWS & InfostretchApp-First & Cloud-Native: How InterMiles Boosted CX with AWS & Infostretch
App-First & Cloud-Native: How InterMiles Boosted CX with AWS & InfostretchInfostretch
 
Cisco Meraki - Simplifying Powerful Technology
Cisco Meraki - Simplifying Powerful TechnologyCisco Meraki - Simplifying Powerful Technology
Cisco Meraki - Simplifying Powerful TechnologyCisco Canada
 
Concevoir et déployer vos applications a base de microservices sur Cloud Foundry
Concevoir et déployer vos applications a base de microservices sur Cloud FoundryConcevoir et déployer vos applications a base de microservices sur Cloud Foundry
Concevoir et déployer vos applications a base de microservices sur Cloud FoundryVMware Tanzu
 
Build your first IoT device - The tricky interface of Product and R&D with Ni...
Build your first IoT device - The tricky interface of Product and R&D with Ni...Build your first IoT device - The tricky interface of Product and R&D with Ni...
Build your first IoT device - The tricky interface of Product and R&D with Ni...Product of Things
 
INTERFACE, by apidays - Design and Build Great Web APIs
INTERFACE, by apidays - Design and Build Great Web APIsINTERFACE, by apidays - Design and Build Great Web APIs
INTERFACE, by apidays - Design and Build Great Web APIsapidays
 
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
 IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a... IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...HelpSystems
 
What does it take to be an architect
What does it take to be an architectWhat does it take to be an architect
What does it take to be an architectConstantine Slisenka
 
What does it take to be architect (for Cjicago JUG)
What does it take to be architect (for Cjicago JUG)What does it take to be architect (for Cjicago JUG)
What does it take to be architect (for Cjicago JUG)Constantine Slisenka
 
Vbrownbag container networking for real workloads
Vbrownbag container networking for real workloadsVbrownbag container networking for real workloads
Vbrownbag container networking for real workloadsCisco DevNet
 
apidays LIVE New York - Building Great Web APIs by Mike Amundsen
apidays LIVE New York - Building Great Web APIs by Mike Amundsenapidays LIVE New York - Building Great Web APIs by Mike Amundsen
apidays LIVE New York - Building Great Web APIs by Mike Amundsenapidays
 
Bringing Partners, Teams & Systems Together through APIs
Bringing Partners, Teams & Systems Together through APIsBringing Partners, Teams & Systems Together through APIs
Bringing Partners, Teams & Systems Together through APIsApigee | Google Cloud
 
Cisco Connect Ottawa 2018 dna assurance shortest path to network innocence
Cisco Connect Ottawa 2018 dna assurance shortest path to network innocenceCisco Connect Ottawa 2018 dna assurance shortest path to network innocence
Cisco Connect Ottawa 2018 dna assurance shortest path to network innocenceCisco Canada
 

Similar to Enterprise Kafka: Kafka as a Service (20)

Kafka overview and use cases
Kafka overview and use casesKafka overview and use cases
Kafka overview and use cases
 
WebRTC Infrastructure the Hard Parts: Media
WebRTC Infrastructure the Hard Parts: MediaWebRTC Infrastructure the Hard Parts: Media
WebRTC Infrastructure the Hard Parts: Media
 
Scribe Online CDK & Connector Development
Scribe Online CDK & Connector DevelopmentScribe Online CDK & Connector Development
Scribe Online CDK & Connector Development
 
Linked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafkaLinked in multi tier, multi-tenant, multi-problem kafka
Linked in multi tier, multi-tenant, multi-problem kafka
 
Web rtc infrastructure the hard parts v4
Web rtc infrastructure the hard parts v4Web rtc infrastructure the hard parts v4
Web rtc infrastructure the hard parts v4
 
Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...
Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...
Cisco Connect Vancouver 2017 - Cisco's Digital Network Architecture - deeper ...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Continuous Delivery pour vos applications avec Cloud Foundry et Jenkins
Continuous Delivery pour vos applications avec Cloud Foundry et JenkinsContinuous Delivery pour vos applications avec Cloud Foundry et Jenkins
Continuous Delivery pour vos applications avec Cloud Foundry et Jenkins
 
App-First & Cloud-Native: How InterMiles Boosted CX with AWS & Infostretch
App-First & Cloud-Native: How InterMiles Boosted CX with AWS & InfostretchApp-First & Cloud-Native: How InterMiles Boosted CX with AWS & Infostretch
App-First & Cloud-Native: How InterMiles Boosted CX with AWS & Infostretch
 
Cisco Meraki - Simplifying Powerful Technology
Cisco Meraki - Simplifying Powerful TechnologyCisco Meraki - Simplifying Powerful Technology
Cisco Meraki - Simplifying Powerful Technology
 
Concevoir et déployer vos applications a base de microservices sur Cloud Foundry
Concevoir et déployer vos applications a base de microservices sur Cloud FoundryConcevoir et déployer vos applications a base de microservices sur Cloud Foundry
Concevoir et déployer vos applications a base de microservices sur Cloud Foundry
 
Build your first IoT device - The tricky interface of Product and R&D with Ni...
Build your first IoT device - The tricky interface of Product and R&D with Ni...Build your first IoT device - The tricky interface of Product and R&D with Ni...
Build your first IoT device - The tricky interface of Product and R&D with Ni...
 
INTERFACE, by apidays - Design and Build Great Web APIs
INTERFACE, by apidays - Design and Build Great Web APIsINTERFACE, by apidays - Design and Build Great Web APIs
INTERFACE, by apidays - Design and Build Great Web APIs
 
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
 IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a... IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
IBM i Development: Increase Accuracy and Efficiency with SEQUEL's ABSTRACT a...
 
What does it take to be an architect
What does it take to be an architectWhat does it take to be an architect
What does it take to be an architect
 
What does it take to be architect (for Cjicago JUG)
What does it take to be architect (for Cjicago JUG)What does it take to be architect (for Cjicago JUG)
What does it take to be architect (for Cjicago JUG)
 
Vbrownbag container networking for real workloads
Vbrownbag container networking for real workloadsVbrownbag container networking for real workloads
Vbrownbag container networking for real workloads
 
apidays LIVE New York - Building Great Web APIs by Mike Amundsen
apidays LIVE New York - Building Great Web APIs by Mike Amundsenapidays LIVE New York - Building Great Web APIs by Mike Amundsen
apidays LIVE New York - Building Great Web APIs by Mike Amundsen
 
Bringing Partners, Teams & Systems Together through APIs
Bringing Partners, Teams & Systems Together through APIsBringing Partners, Teams & Systems Together through APIs
Bringing Partners, Teams & Systems Together through APIs
 
Cisco Connect Ottawa 2018 dna assurance shortest path to network innocence
Cisco Connect Ottawa 2018 dna assurance shortest path to network innocenceCisco Connect Ottawa 2018 dna assurance shortest path to network innocence
Cisco Connect Ottawa 2018 dna assurance shortest path to network innocence
 

More from Todd Palino

Leading Without Managing: Becoming an SRE Technical Leader
Leading Without Managing: Becoming an SRE Technical LeaderLeading Without Managing: Becoming an SRE Technical Leader
Leading Without Managing: Becoming an SRE Technical LeaderTodd Palino
 
From Operations to Site Reliability in Five Easy Steps
From Operations to Site Reliability in Five Easy StepsFrom Operations to Site Reliability in Five Easy Steps
From Operations to Site Reliability in Five Easy StepsTodd Palino
 
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart WayCode Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart WayTodd Palino
 
Why Does (My) Monitoring Suck?
Why Does (My) Monitoring Suck?Why Does (My) Monitoring Suck?
Why Does (My) Monitoring Suck?Todd Palino
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowTodd Palino
 
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...Todd Palino
 
Running Kafka for Maximum Pain
Running Kafka for Maximum PainRunning Kafka for Maximum Pain
Running Kafka for Maximum PainTodd Palino
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into OverdriveTodd Palino
 
Tuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTodd Palino
 

More from Todd Palino (9)

Leading Without Managing: Becoming an SRE Technical Leader
Leading Without Managing: Becoming an SRE Technical LeaderLeading Without Managing: Becoming an SRE Technical Leader
Leading Without Managing: Becoming an SRE Technical Leader
 
From Operations to Site Reliability in Five Easy Steps
From Operations to Site Reliability in Five Easy StepsFrom Operations to Site Reliability in Five Easy Steps
From Operations to Site Reliability in Five Easy Steps
 
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart WayCode Yellow: Helping Operations Top-Heavy Teams the Smart Way
Code Yellow: Helping Operations Top-Heavy Teams the Smart Way
 
Why Does (My) Monitoring Suck?
Why Does (My) Monitoring Suck?Why Does (My) Monitoring Suck?
Why Does (My) Monitoring Suck?
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
 
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
Redefine Operations in a DevOps World: The New Role for Site Reliability Eng...
 
Running Kafka for Maximum Pain
Running Kafka for Maximum PainRunning Kafka for Maximum Pain
Running Kafka for Maximum Pain
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into Overdrive
 
Tuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTuning Kafka for Fun and Profit
Tuning Kafka for Fun and Profit
 

Enterprise Kafka: Kafka as a Service

  • 1. SITE RELIABILITY ENGINEERING©2014 LinkedIn Corporation. All Rights Reserved. Enterprise Kafka
  • 2. Why Am I Here?
      • You want to find out what this “Kafka” thing is
      • You’re running Kafka, but you want to go big
      • You’re looking for some neat whizbangs
  • 3. Clark Haskins, Todd Palino
  • 4. Who Are We?
      • Kafka SRE at LinkedIn
      • Site Reliability Engineering
          – Administrators
          – Architects
          – Developers
      • Keep the site running, always
  • 5. Kafka Overview
  • 6. What Is Kafka?
  • 7. What Is Kafka? [Diagram labels: Producer, Consumer, Broker (topic A, partitions P0 and P1), Zookeeper]
  • 8. Attributes of a Kafka Cluster
      • Disk Based
      • Durable
      • Scalable
      • Low Latency
      • Finite Retention
      • NOT Idempotent (yet)
  • 9. Kafka At LinkedIn
      • Multiple Datacenters, Multiple Clusters
      • Mirroring between clusters
      • Message Types
          – Metrics
          – Tracking
          – Queuing
      • Data transport from applications to Hadoop, and back
  • 10. Kafka At LinkedIn
  • 11. Kafka At LinkedIn
      • 300+ Kafka brokers
      • Over 18,000 topics
      • 140,000+ Partitions
      • 220 Billion messages per day
      • 40 Terabytes In
      • 160 Terabytes Out
      • Peak Load
          – 3.25 Million messages per second
          – 5.5 Gigabits/sec Inbound
          – 18 Gigabits/sec Outbound
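The slide 11 figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the terabyte totals are per-day values in decimal terabytes (10**12 bytes), which the slide does not state, and the function names are made up for this example.

```python
# Figures quoted on slide 11. Treating the terabyte totals as per-day,
# decimal (10**12 bytes) values is an assumption, not something the slide says.
MESSAGES_PER_DAY = 220e9
TERABYTES_IN_PER_DAY = 40
TERABYTES_OUT_PER_DAY = 160
PEAK_MESSAGES_PER_SEC = 3.25e6

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TERABYTE = 10**12


def average_messages_per_sec() -> float:
    """Daily message count spread evenly over one day."""
    return MESSAGES_PER_DAY / SECONDS_PER_DAY


def read_fan_out() -> float:
    """How many bytes leave the clusters for every byte that enters."""
    return TERABYTES_OUT_PER_DAY / TERABYTES_IN_PER_DAY


def average_message_bytes() -> float:
    """Mean inbound message size implied by the daily totals."""
    return TERABYTES_IN_PER_DAY * BYTES_PER_TERABYTE / MESSAGES_PER_DAY


def peak_to_average_ratio() -> float:
    """Quoted peak message rate relative to the daily average rate."""
    return PEAK_MESSAGES_PER_SEC / average_messages_per_sec()


if __name__ == "__main__":
    print(f"average rate:  {average_messages_per_sec():,.0f} msg/s")
    print(f"read fan-out:  {read_fan_out():.1f}x")
    print(f"mean message:  {average_message_bytes():.0f} bytes")
    print(f"peak/average:  {peak_to_average_ratio():.2f}")
```

Under these assumptions the daily average is roughly 2.5 million messages per second, so the quoted peak is only about 1.3 times the average, and each inbound byte is read about four times.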
  • 12. Challenges We Have Overcome
  • 13. Solutions
      • Kafka is young… we influenced development
      • Operations wizardry…
  • 14. Hyper Growth
      • Need to expand clusters to keep up with site traffic, and then balance them.
  • 15. Adding brokers [Diagram labels: Producers, Brokers, Consumers; partitions P0 to P7 of topics A and B and P0 to P3 of topic C placed across the brokers]
  • 16. Adding a broker (with broker leveling) [Diagram labels: Producers, Brokers, Consumers; partitions P0 to P7 of topics A and B and P0 to P3 of topic C placed across the brokers]
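The idea behind "broker leveling" on slide 16 can be illustrated with a greedy rebalance. This is a minimal sketch, not Kafka's reassignment tool and not LinkedIn's procedure: it only evens out partition counts, ignores replicas, leaders, and data size, and the function name `level_brokers` is hypothetical.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, int, int]  # (partition, from_broker, to_broker)


def level_brokers(
    assignment: Dict[int, List[str]],
) -> Tuple[List[Move], Dict[int, List[str]]]:
    """Greedily move partitions from the fullest broker to the emptiest one
    until no two brokers differ by more than one partition.

    Works on a copy, so the caller's assignment is left untouched.
    """
    state = {broker: list(parts) for broker, parts in assignment.items()}
    moves: List[Move] = []
    while True:
        busiest = max(state, key=lambda b: len(state[b]))
        idlest = min(state, key=lambda b: len(state[b]))
        if len(state[busiest]) - len(state[idlest]) <= 1:
            break
        partition = state[busiest].pop()
        state[idlest].append(partition)
        moves.append((partition, busiest, idlest))
    return moves, state


if __name__ == "__main__":
    # Two loaded brokers plus a newly added, empty broker 3.
    before = {
        1: ["A-0", "A-1", "B-0", "B-1"],
        2: ["A-2", "A-3", "B-2", "B-3"],
        3: [],
    }
    moves, after = level_brokers(before)
    for partition, src, dst in moves:
        print(f"move {partition}: broker {src} -> broker {dst}")
    print({broker: len(parts) for broker, parts in after.items()})
```

Without a step like this, a newly added broker holds no existing partitions and takes no share of the current load.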
  • 17. Logs vs. Metrics
      • Logging data killed the metrics cluster
  • 18. Quality of Service with Kafka [Diagram labels: Producers, Brokers, Consumers; partitions P0 to P7 of topics A and B and P0 to P3 of topic C placed across the brokers]
  • 19. Deployment Nightmares
      • Parallel deployment wasn’t possible so…
      • Babysitting sequential deployments
  • 20. Easy deployments
      • Kafka 0.8.1 makes sure the cluster is in a good state before shutting down
          – If any brokers in the cluster have under-replicated partitions, Kafka will not shut down
          – Kafka ensures that only 1 broker is in shutdown sequence at a time.
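The two shutdown rules on slide 20 amount to a simple gate. The sketch below models only the rule as stated on the slide; it is not Kafka's controlled-shutdown code, and the function name and inputs (per-broker under-replicated partition counts, the set of brokers already shutting down) are hypothetical.

```python
from typing import Dict, Set


def may_shut_down(
    broker_id: int,
    under_replicated: Dict[int, int],
    shutting_down: Set[int],
) -> bool:
    """Return True only if the cluster is healthy enough to stop broker_id.

    under_replicated maps broker id -> number of under-replicated partitions.
    shutting_down is the set of brokers currently in their shutdown sequence.
    """
    # Rule 2: only one broker may be in its shutdown sequence at a time.
    if shutting_down - {broker_id}:
        return False
    # Rule 1: no broker in the cluster may have under-replicated partitions.
    return all(count == 0 for count in under_replicated.values())


if __name__ == "__main__":
    healthy = {1: 0, 2: 0, 3: 0}
    print(may_shut_down(1, healthy, set()))             # cluster is clean
    print(may_shut_down(1, {1: 0, 2: 5, 3: 0}, set()))  # broker 2 is catching up
    print(may_shut_down(1, healthy, {2}))               # broker 2 already stopping
```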
  • 21. Killing Zookeeper
      • Consumer offset management done within Zookeeper
      • Every consumer committing offsets every minute for every partition makes ZK very unhappy.
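A back-of-the-envelope estimate shows why per-partition offset commits strain Zookeeper. The sketch assumes each consumer group writes one offset per partition per commit interval; the number of consumer groups per partition is a hypothetical input, since the slides do not give it. The partition count is the one quoted on slide 11.

```python
def zk_offset_writes_per_sec(
    partitions: int,
    consumer_groups_per_partition: float,
    commit_interval_sec: float = 60.0,
) -> float:
    """Estimated Zookeeper writes per second caused by offset commits,
    assuming one write per (consumer group, partition) per interval."""
    return partitions * consumer_groups_per_partition / commit_interval_sec


if __name__ == "__main__":
    partitions = 140_000  # figure from slide 11
    for groups in (1, 2, 4):  # hypothetical numbers of consumer groups
        rate = zk_offset_writes_per_sec(partitions, groups)
        print(f"{groups} group(s) per partition: {rate:,.0f} writes/s")
```

Even with a single consumer group per partition, a one-minute commit interval already implies a sustained load of a few thousand writes per second.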
  • 22. Zookeeper on SSD
  • 23. Monitoring
  • 24. Kafka Is Broken!
  • 25. Kafka Is Broken!
      • Everything is Kafka’s fault first
      • What is lag?
      • Consumer Problems
          – Application problems
          – Kafka client problems
  • 26. How Do We Sleep At Night?
      • Educating Users
          – Why lag is their fault
      • Monitoring the Ecosystem
          – Kafka Brokers
          – Zookeeper
          – Mirror Makers
          – Audit
          – REST Interfaces
      • Week Over Week
  • 27. Cluster Health and Utilization
      • Under replicated partitions
      • Offline partitions
      • Broker partition count
      • Data size on disk
      • Leader partition count
      • Network utilization
  • 28. Zookeeper
      • Ensemble availability
      • Latency
      • Outstanding requests
  • 29. Mirror Maker and Audit
      • Mirror Maker
          – Lag
          – Dropped Messages
      • Audit Consumer
          – Lag
          – Completeness check
      • Audit UI
      [Diagram labels: Producer, Cluster, MM, Cluster, Messages, Message Counts, Audit Consumer, All Messages, Audit State, Audit UI]
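The "completeness check" on slide 29 can be pictured as comparing the message counts reported on the producing side with the counts an audit consumer actually sees. The slides do not describe the audit data model, so everything below is an assumption made for illustration: counts keyed by (topic, time window), the 99.9% threshold, and the function names.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (topic, start of time window), a hypothetical key


def completeness(
    produced: Dict[Key, int], consumed: Dict[Key, int]
) -> Dict[Key, float]:
    """Fraction of produced messages that were seen downstream, per key.

    A key with nothing produced is treated as fully complete.
    """
    return {
        key: (consumed.get(key, 0) / count if count else 1.0)
        for key, count in produced.items()
    }


def incomplete_keys(
    produced: Dict[Key, int], consumed: Dict[Key, int], threshold: float = 0.999
) -> List[Key]:
    """Keys whose completeness falls below the threshold, sorted for display."""
    ratios = completeness(produced, consumed)
    return sorted(key for key, ratio in ratios.items() if ratio < threshold)


if __name__ == "__main__":
    produced = {("tracking", 0): 1000, ("tracking", 600): 1000, ("metrics", 0): 500}
    consumed = {("tracking", 0): 1000, ("tracking", 600): 990, ("metrics", 0): 500}
    print(incomplete_keys(produced, consumed))
```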
  • 30. Audit UI
  • 31. Audit UI
  • 32. Tuning
  • 33. Hardware and OS
      • Kernel Tuning
          – Swapping is Death
          – Allow more dirty pages
          – Allow less dirty cache
      • Disk throughput
          – More spindles
          – Longer commit interval
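One plausible reading of the kernel tuning bullets on slide 33 maps them onto three Linux `vm.*` sysctl keys: `vm.swappiness` (avoid swapping), `vm.dirty_ratio` (allow more dirty pages before writers block), and `vm.dirty_background_ratio` (start background flushing earlier). That mapping is an interpretation, and the values below are placeholders rather than settings from the talk. The sketch only renders them as sysctl.conf lines.

```python
from typing import Dict

# Placeholder values for illustration only; the slide gives no numbers.
HYPOTHETICAL_SETTINGS: Dict[str, int] = {
    "vm.swappiness": 1,              # "Swapping is Death"
    "vm.dirty_ratio": 60,            # "Allow more dirty pages"
    "vm.dirty_background_ratio": 5,  # "Allow less dirty cache"
}


def render_sysctl(settings: Dict[str, int]) -> str:
    """Render settings as 'key = value' lines in sysctl.conf style,
    sorted by key so the output is stable."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    print(render_sysctl(HYPOTHETICAL_SETTINGS))
```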
  • 34. Java Virtual Machine
  • 35. Garbage Collection
  • 36. Garbage Collection
      • Java 7, update 51
      • Garbage First (G1) Collector
          – Set the heap size
          – Specify a target GC pause time
          – Don’t set the New size
      • GC Times
          – Less than 15ms per second in GC
          – Steady 20-22ms GC intervals
          – Almost no full GC cycles (and only 200-400ms when they occur)
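The G1 advice on slide 36 corresponds to a small set of standard HotSpot options: a fixed heap (`-Xms`/`-Xmx`), `-XX:+UseG1GC`, and `-XX:MaxGCPauseMillis`, with no explicit new-generation size. The sketch below builds that option list and converts "15ms per second in GC" into an overhead fraction. The heap size and pause target in the usage example are hypothetical, since the slide gives no values.

```python
from typing import List


def g1_jvm_flags(heap_gb: int, pause_target_ms: int) -> List[str]:
    """JVM options for G1 with a fixed heap and a pause-time goal.

    The new-generation size (-Xmn / -XX:NewSize) is deliberately left unset,
    following the slide's "Don't set the New size".
    """
    return [
        f"-Xms{heap_gb}g",
        f"-Xmx{heap_gb}g",
        "-XX:+UseG1GC",
        f"-XX:MaxGCPauseMillis={pause_target_ms}",
    ]


def gc_overhead_fraction(gc_ms_per_sec: float) -> float:
    """Share of wall-clock time spent in GC, given milliseconds of GC per second."""
    return gc_ms_per_sec / 1000.0


if __name__ == "__main__":
    # Hypothetical values: a 4 GB heap and a 20 ms pause target.
    print(" ".join(g1_jvm_flags(heap_gb=4, pause_target_ms=20)))
    print(f"GC overhead at 15 ms/s: {gc_overhead_fraction(15):.1%}")
```

Spending less than 15ms of every second in GC means the brokers lose under 1.5% of wall-clock time to collection.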
  • 37. Closing
  • 38. What’s Coming in 0.8.2
      • Consumer offsets in the broker
      • Delete topic
      • Further down the road
          – New producer
          – Improved producer API
  • 39. Upcoming Operational Work
      • Learning to share
      • Shrinking a cluster
      • Cluster comparison
      • Advanced monitoring
  • 40. How Can You Get Involved?
      • http://kafka.apache.org
      • Join the mailing lists
          – users@kafka.apache.org
      • irc.freenode.net - #apache-kafka
      • Contribute tools
  • 41. Talk To Us
      • Kafka SREs at LinkedIn
          – Clark Haskins
              • https://www.linkedin.com/in/clarkhaskins
              • chaskins@linkedin.com
          – Todd Palino
              • https://www.linkedin.com/in/toddpalino
              • tpalino@linkedin.com
  • 42. Questions