SlideShare a Scribd company logo
1 of 36
Download to read offline
Storm at
Cleaning up fraudulent traffic on the internet
http://xkcd.com/570/
Ashley Brown
Chief Architect
Using Storm since
September 2011
Based in the West End
Founded in early 2010
Focused on (fighting)
advertising fraud since
2011
What I'll cover
This is a case study: how Storm fits into our
architecture and business
NO CODE
NO WORKER-QUEUE DIAGRAMS1
I assume you've seen them all before
Our Rules
Why did we pick Storm in the first place?
Use the right tool for the job
If a piece of software
isn't helping the
business, chuck it out
Only jump on a
bandwagon if it's
going in your direction
Douglas de Jager, CEO
Don't write it if you can download it
If there's an open
source project that
does what's needed,
use that
Sometimes this
means throwing out
home-grown code
when a new project is
released (e.g. Storm)
Ben Hodgson, Software Engineer
Our goals
Why do we turn up in the morning?
Find fraudulent website traffic
Collect billions of
server and client-side
signals per month
Sift, sort, analyse and
condense
Identify automated,
fraudulent and
suspicious traffic
Joe Tallett, Software Engineer
Protect against it
Give our customers
the information they
need to protect their
businesses from bad
traffic
This means: give
them clean data to
make business
decisions
Simon Overell, Chief Scientist
Expose it
Work with partners to
reveal the scale of the
problem
Drive people to
solutions which can
eliminate fraud
Cut it off
Eliminate bad traffic
sources by cutting off
their revenue
Build reactive
solutions that stop
fraud being profitable
Vegard Johnsen, CCO
Storm & our timeline
Storm solved a scaling problem
impressions/
month 60 million
60 millionsignals/month
60 million
240 million
1.5 billion
6 billion
Pre-history
(Summer 2010)
storm released present day
RabbitMQ queues
Python workers
Hypertable:
API datastore
Hadoop batch
analysis/
RabbitMQ queues
Python workers
VoltDB:
API datastore +
real-time joins
No batch analysis
(this setup was
pretty reliable!)
Custom cluster
management
+ worker scaling
RabbitMQ queues
Storm Topologies:
in-memory joins
HBase:
API datastore
Cascading for post-
failure restores
(too much data to do
without!)
billions and billions
and billions
billions and billions
and billions
Logging via CDN
Cascading for data
analysis
Hive for aggregations
High-level aggregates
in MySQL
1 2 3
Key moments
● Enter the advertising anti-fraud market
○ 30x increase in impression volume
○ Bursty traffic
○ Existing queue + worker system not robust enough
○ I can do without the 2am wake up calls
● Enter Storm.
1
RabbitMQ
Cluster
(Scalable)
Python worker
cluster2
(Scalable)
Server- and
client-side
signals
1) I lied about the queue/worker diagrams.
2) we have a bar in the office; our workers are happy.
VoltDB
Scalable at launch-
time only
What's wrong with that?
● Queue/worker scaling system relied on our
code; it worked, but:
○ Only 1, maybe 2 pairs of eyes had ever looked at it
○ Code becomes a maintenance liability as soon as it
is written
○ Writing infrastructure software not one of our goals
● Maintaining in-memory database for the
volume we wanted was not cost-effective
(mainly due to AWS memory costs)
● Couldn't scale dynamically: full DB cluster
restart required
The Solution
● Migrate our internal event-stream-based
workers to Storm
○ Whole community to check and maintain code
● Move to HBase for long-term API datastore
○ Keep data for longer - better trend decisions
● VoltDB joins → in-memory joins & HBase
○ small in-memory join window, then flushed
○ full 15-minute join achieved by reading from HBase
○ Trident solves this now - wasn't around then
RabbitMQ
Cluster
(Scalable)
Storm cluster
(Scalable)
Server- and
client-side
signals
HBase
(Scalable)
Cascading on
Amazon Elastic
MapReduce
From logs EMERGENCY
RESTORE
How long (Storm migration)?
● 17 Sept 2011: Released
● 21 Sept 2011: Test cluster processing
● 29 Sept 2011: Substantial implementation of
core workers
● 30 Sept 2011: Python workers running
under Storm control
● Total engineers: 1
The Results (redacted)
● Classifications
available within 15
minutes
● Dashboard
provides overview
of 'legitimate' vs
other traffic
● Better data on
which to make
business decisions
Lessons
● Storm is easy to install & run
● First iteration: use Storm for control and
scaling of existing queue+worker systems
● Second iteration: use Storm to provide
redundancy via acking/replays
● Third iteration: remove intermediate
queues to realise performance benefits
A Quick Aside on DRPC
● Our initial API implementation in HBase was
slow
● Large number of partial aggregates to
consume, all handled by a single process
● Storm's DRPC provided a 10x speedup -
machines across the cluster pulled partials
from HBase, generated 'mega-partials'; final
step as a reducer => final totals.
Storm solved a scaling problem
impressions/
month 60 million
60 millionsignals/month
60 million
240 million
1.5 billion
6 billion
Pre-history
(Summer 2010)
storm released present day
RabbitMQ queues
Python workers
Hypertable:
API datastore
Hadoop batch
analysis/
RabbitMQ queues
Python workers
VoltDB:
API datastore +
real-time joins
No batch analysis
(this setup was
pretty reliable!)
Custom cluster
management
+ worker scaling
RabbitMQ queues
Storm Topologies:
in-memory joins
HBase:
API datastore
Cascading for post-
failure restores
(too much data to do
without!)
billions and billions
and billions
billions and billions
and billions
Logging via CDN
Cascading for data
analysis
Hive for aggregations
High-level aggregates
in MySQL
1 2 3
Key moments
● Enabled across substantial internet ad
inventory
○ 10x increase in impression volume
○ Low-latency, always-up requirements
○ Competitive marketplace
● Exit Storm.
2
What happened?
● Stopped upgrading at 0.6.2
○ Big customers unable to use real-time data at time
○ An unnecessary cost
○ Batch options provided better resiliency and cost
profile
● Too expensive to provide very low-latency
data collection in a compatible way
● Legacy systems continue to run...
How reliable?
● Legacy topologies still running:
CDN
RabbitMQ
Cluster
(Scalable)
Storm cluster
(Scalable)
Server- and
client-side
signals
HBase
(Scalable)
Cascading on
Amazon
Elastic
MapReduce
From logs
LEGACY
Hive on EMR
(Aggregate
Generation)
Bulk
Export
The Results
● Identification of a botnet cluster attracting
international press
● Many other sources of fraud under active
investigation
● Using Amazon EC2 spot instances for batch
analysis when cheapest - not paying for
always-up
The Results
Lessons
● Benefit of real-time processing is a business
decision - batch may be more cost effective
● Storm is easy and reliable to use, but you
need supporting infrastructure around it (e.g.
queue servers)
● It may be the supporting infrastructure that
gives you problems...
Storm solved a scaling problem
impressions/
month 60 million
60 millionsignals/month
60 million
240 million
1.5 billion
6 billion
Pre-history
(Summer 2010)
storm released present day
RabbitMQ queues
Python workers
Hypertable:
API datastore
Hadoop batch
analysis/
RabbitMQ queues
Python workers
VoltDB:
API datastore +
real-time joins
No batch analysis
(this setup was
pretty reliable!)
Custom cluster
management
+ worker scaling
RabbitMQ queues
Storm Topologies:
in-memory joins
HBase:
API datastore
Cascading for post-
failure restores
(too much data to do
without!)
billions and billions
and billions
billions and billions
and billions
Logging via CDN
Cascading for data
analysis
Hive for aggregations
High-level aggregates
in MySQL
1 2 3
Key moments
● Arms race begins
○ Fraudsters in control of large botnets able to
respond quickly
○ Source and signatures of fraud will change faster
and faster in the future, as we close off more
avenues
○ Growing demand for more immediate classifications
than provided by batch-only
● Welcome Back Storm.
3
What now?
● Returning to Storm, paired with Mahout
● Crunching billions and billions of impressions
using Cascading + Mahout
● Real-time response using Trident + Mahout
○ Known-bad signatures identify new botnet IPs,
suspect publishers
○ Online learning adapts models to emerging threats
Lessons
● As your business changes, your architecture
must change
● Choose Storm if:
○ you have existing ad-hoc event streaming systems
that could use more resiliency
○ your business needs a new real-time analysis
component that fits an event-streaming model
○ you're happy to run appropriate infrastructure around
it
● Don't choose Storm if:
○ you have no use for real-time data
○ you only want to use it because it's cool
More Lessons
● Using Cascading for Hadoop jobs and Storm
for real-time is REALLY handy
○ Retains event-streaming paradigm
○ No need to completely re-think implementation when
switching between them
○ In some circumstances can share code
○ We have a library which provides common analysis
components for both implementations
● A reasonably managed Storm cluster will
stay up for ages.
http://xkcd.com/749/

More Related Content

What's hot

Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Adrianos Dadis
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014StampedeCon
 
Getting more out of your big data
Getting more out of your big dataGetting more out of your big data
Getting more out of your big dataNathan Bijnens
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...Nathan Bijnens
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaSpark Summit
 
Spark Streaming and Expert Systems
Spark Streaming and Expert SystemsSpark Streaming and Expert Systems
Spark Streaming and Expert SystemsJim Haughwout
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoTJim Haughwout
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Grier Johnson
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ IndixRajesh Muppalla
 
Strata lightening-talk
Strata lightening-talkStrata lightening-talk
Strata lightening-talkDanny Yuan
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERShuyi Chen
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsMonal Daxini
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in PracticeC4Media
 
Volta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceVolta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceLN Renganarayana
 
netflix-real-time-data-strata-talk
netflix-real-time-data-strata-talknetflix-real-time-data-strata-talk
netflix-real-time-data-strata-talkDanny Yuan
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 

What's hot (20)

Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
 
Getting more out of your big data
Getting more out of your big dataGetting more out of your big data
Getting more out of your big data
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Spark Streaming and Expert Systems
Spark Streaming and Expert SystemsSpark Streaming and Expert Systems
Spark Streaming and Expert Systems
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Strata lightening-talk
Strata lightening-talkStrata lightening-talk
Strata lightening-talk
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data Problems
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in Practice
 
IoT Austin CUG talk
IoT Austin CUG talkIoT Austin CUG talk
IoT Austin CUG talk
 
Volta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceVolta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a Service
 
netflix-real-time-data-strata-talk
netflix-real-time-data-strata-talknetflix-real-time-data-strata-talk
netflix-real-time-data-strata-talk
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 

Viewers also liked

лекция1
лекция1лекция1
лекция1GARCH
 
June 2014 cpi leveraging your brand online anne pryor june 2014
June 2014 cpi leveraging your brand online anne pryor june 2014June 2014 cpi leveraging your brand online anne pryor june 2014
June 2014 cpi leveraging your brand online anne pryor june 2014ANNE PRYOR, MA
 
Than thoai hy lap 599
Than thoai hy lap 599Than thoai hy lap 599
Than thoai hy lap 599Tun Heo
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 

Viewers also liked (12)

лекция1
лекция1лекция1
лекция1
 
June 2014 cpi leveraging your brand online anne pryor june 2014
June 2014 cpi leveraging your brand online anne pryor june 2014June 2014 cpi leveraging your brand online anne pryor june 2014
June 2014 cpi leveraging your brand online anne pryor june 2014
 
Anatomias
AnatomiasAnatomias
Anatomias
 
Wellness Webinar
Wellness WebinarWellness Webinar
Wellness Webinar
 
Than thoai hy lap 599
Than thoai hy lap 599Than thoai hy lap 599
Than thoai hy lap 599
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 

Similar to Storm at spider.io - London Storm Meetup 2013-06-18

Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processJampp
 
HIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha OehlHIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha OehlSascha Oehl
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchEdward Capriolo
 
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...Dataconomy Media
 
Big Data Berlin - Criteo
Big Data Berlin - CriteoBig Data Berlin - Criteo
Big Data Berlin - CriteoSofian Djamaa
 
Nanog66 vicente de luca fast netmon
Nanog66 vicente de luca fast netmonNanog66 vicente de luca fast netmon
Nanog66 vicente de luca fast netmonPavel Odintsov
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Storyvanphp
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerFederico Palladoro
 
umeng analytical arch
umeng analytical archumeng analytical arch
umeng analytical archYan Zhang
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive MicroservicesRick Hightower
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineTrieu Nguyen
 
About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014Michal Harish
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopHazelcast
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...InfluxData
 
Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...
Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...
Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...João Parreira
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataStylight
 

Similar to Storm at spider.io - London Storm Meetup 2013-06-18 (20)

Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the process
 
HIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha OehlHIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha Oehl
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batch
 
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
From 6 hours to 1 minute... in 2 days! How we managed to stream our (long) Ha...
 
Big Data Berlin - Criteo
Big Data Berlin - CriteoBig Data Berlin - Criteo
Big Data Berlin - Criteo
 
Nanog66 vicente de luca fast netmon
Nanog66 vicente de luca fast netmonNanog66 vicente de luca fast netmon
Nanog66 vicente de luca fast netmon
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
umeng analytical arch
umeng analytical archumeng analytical arch
umeng analytical arch
 
Big Trends in Big Data
Big Trends in Big DataBig Trends in Big Data
Big Trends in Big Data
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
 
Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...
Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...
Building collaborative HTML5 apps using a backend-as-a-service (HTML5DevConf ...
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 

Storm at spider.io - London Storm Meetup 2013-06-18

  • 1. Storm at Cleaning up fraudulent traffic on the internet http://xkcd.com/570/
  • 2. Ashley Brown Chief Architect Using Storm since September 2011 Based in the West End Founded in early 2010 Focused on (fighting) advertising fraud since 2011
  • 3. What I'll cover This is a case study: how Storm fits into our architecture and business NO CODE NO WORKER-QUEUE DIAGRAMS1 I assume you've seen them all before
  • 4. Our Rules Why did we pick Storm in the first place?
  • 5. Use the right tool for the job If a piece of software isn't helping the business, chuck it out Only jump on a bandwagon if it's going in your direction Douglas de Jager, CEO
  • 6. Don't write it if you can download it If there's an open source project that does what's needed, use that Sometimes this means throwing out home-grown code when a new project is released (e.g. Storm) Ben Hodgson, Software Engineer
  • 7. Our goals Why do we turn up in the morning?
  • 8. Find fraudulent website traffic Collect billions of server and client-side signals per month Sift, sort, analyse and condense Identify automated, fraudulent and suspicious traffic Joe Tallett, Software Engineer
  • 9. Protect against it Give our customers the information they need to protect their businesses from bad traffic This means: give them clean data to make business decisions Simon Overell, Chief Scientist
  • 10. Expose it Work with partners to reveal the scale of the problem Drive people to solutions which can eliminate fraud
  • 11. Cut it off Eliminate bad traffic sources by cutting off their revenue Build reactive solutions that stop fraud being profitable Vegard Johnsen, CCO
  • 12. Storm & our timeline
  • 13. Storm solved a scaling problem impressions/ month 60 million 60 millionsignals/month 60 million 240 million 1.5 billion 6 billion Pre-history (Summer 2010) storm released present day RabbitMQ queues Python workers Hypertable: API datastore Hadoop batch analysis/ RabbitMQ queues Python workers VoltDB: API datastore + real-time joins No batch analysis (this setup was pretty reliable!) Custom cluster management + worker scaling RabbitMQ queues Storm Topologies: in-memory joins HBase: API datastore Cascading for post- failure restores (too much data to do without!) billions and billions and billions billions and billions and billions Logging via CDN Cascading for data analysis Hive for aggregations High-level aggregates in MySQL 1 2 3
  • 14. Key moments ● Enter the advertising anti-fraud market ○ 30x increase in impression volume ○ Bursty traffic ○ Existing queue + worker system not robust enough ○ I can do without the 2am wake up calls ● Enter Storm. 1
  • 15. RabbitMQ Cluster (Scalable) Python worker cluster2 (Scalable) Server- and client-side signals 1) I lied about the queue/worker diagrams. 2) we have a bar in the office; our workers are happy. VoltDB Scalable at launch- time only
  • 16. What's wrong with that? ● Queue/worker scaling system relied on our code; it worked, but: ○ Only 1, maybe 2 pairs of eyes had ever looked at it ○ Code becomes a maintenance liability as soon as it is written ○ Writing infrastructure software not one of our goals ● Maintaining in-memory database for the volume we wanted was not cost-effective (mainly due to AWS memory costs) ● Couldn't scale dynamically: full DB cluster restart required
  • 17. The Solution ● Migrate our internal event-stream-based workers to Storm ○ Whole community to check and maintain code ● Move to HBase for long-term API datastore ○ Keep data for longer - better trend decisions ● VoltDB joins → in-memory joins & HBase ○ small in-memory join window, then flushed ○ full 15-minute join achieved by reading from HBase ○ Trident solves this now - wasn't around then
  • 19. How long (Storm migration)? ● 17 Sept 2011: Released ● 21 Sept 2011: Test cluster processing ● 29 Sept 2011: Substantial implementation of core workers ● 30 Sept 2011: Python workers running under Storm control ● Total engineers: 1
  • 20. The Results (redacted) ● Classifications available within 15 minutes ● Dashboard provides overview of 'legitimate' vs other traffic ● Better data on which to make business decisions
  • 21. Lessons ● Storm is easy to install & run ● First iteration: use Storm for control and scaling of existing queue+worker systems ● Second iteration: use Storm to provide redundancy via acking/replays ● Third iteration: remove intermediate queues to realise performance benefits
  • 22. A Quick Aside on DRPC ● Our initial API implementation in HBase was slow ● Large number of partial aggregates to consume, all handled by a single process ● Storm's DRPC provided a 10x speedup - machines across the cluster pulled partials from HBase, generated 'mega-partials'; final step as a reducer => final totals.
  • 23. Storm solved a scaling problem impressions/ month 60 million 60 millionsignals/month 60 million 240 million 1.5 billion 6 billion Pre-history (Summer 2010) storm released present day RabbitMQ queues Python workers Hypertable: API datastore Hadoop batch analysis/ RabbitMQ queues Python workers VoltDB: API datastore + real-time joins No batch analysis (this setup was pretty reliable!) Custom cluster management + worker scaling RabbitMQ queues Storm Topologies: in-memory joins HBase: API datastore Cascading for post- failure restores (too much data to do without!) billions and billions and billions billions and billions and billions Logging via CDN Cascading for data analysis Hive for aggregations High-level aggregates in MySQL 1 2 3
  • 24. Key moments ● Enabled across substantial internet ad inventory ○ 10x increase in impression volume ○ Low-latency, always-up requirements ○ Competitive marketplace ● Exit Storm. 2
  • 25. What happened? ● Stopped upgrading at 0.6.2 ○ Big customers unable to use real-time data at time ○ An unnecessary cost ○ Batch options provided better resiliency and cost profile ● Too expensive to provide very low-latency data collection in a compatible way ● Legacy systems continue to run...
  • 26. How reliable? ● Legacy topologies still running:
  • 27. CDN RabbitMQ Cluster (Scalable) Storm cluster (Scalable) Server- and client-side signals HBase (Scalable) Cascading on Amazon Elastic MapReduce From logs LEGACY Hive on EMR (Aggregate Generation) Bulk Export
  • 28. The Results ● Identification of a botnet cluster attracting international press ● Many other sources of fraud under active investigation ● Using Amazon EC2 spot instances for batch analysis when cheapest - not paying for always-up
  • 30. Lessons ● Benefit of real-time processing is a business decision - batch may be more cost effective ● Storm is easy and reliable to use, but you need supporting infrastructure around it (e.g. queue servers) ● It may be the supporting infrastructure that gives you problems...
  • 31. Storm solved a scaling problem impressions/ month 60 million 60 millionsignals/month 60 million 240 million 1.5 billion 6 billion Pre-history (Summer 2010) storm released present day RabbitMQ queues Python workers Hypertable: API datastore Hadoop batch analysis/ RabbitMQ queues Python workers VoltDB: API datastore + real-time joins No batch analysis (this setup was pretty reliable!) Custom cluster management + worker scaling RabbitMQ queues Storm Topologies: in-memory joins HBase: API datastore Cascading for post- failure restores (too much data to do without!) billions and billions and billions billions and billions and billions Logging via CDN Cascading for data analysis Hive for aggregations High-level aggregates in MySQL 1 2 3
  • 32. Key moments ● Arms race begins ○ Fraudsters in control of large botnets able to respond quickly ○ Source and signatures of fraud will change faster and faster in the future, as we close off more avenues ○ Growing demand for more immediate classifications than provided by batch-only ● Welcome Back Storm. 3
  • 33. What now? ● Returning to Storm, paired with Mahout ● Crunching billions and billions of impressions using Cascading + Mahout ● Real-time response using Trident + Mahout ○ Known-bad signatures identify new botnet IPs, suspect publishers ○ Online learning adapts models to emerging threats
  • 34. Lessons ● As your business changes, your architecture must change ● Choose Storm if: ○ you have existing ad-hoc event streaming systems that could use more resiliency ○ your business needs a new real-time analysis component that fits an event-streaming model ○ you're happy to run appropriate infrastructure around it ● Don't choose Storm if: ○ you have no use for real-time data ○ you only want to use it because it's cool
  • 35. More Lessons ● Using Cascading for Hadoop jobs and Storm for real-time is REALLY handy ○ Retains event-streaming paradigm ○ No need to completely re-think implementation when switching between them ○ In some circumstances can share code ○ We have a library which provides common analysis components for both implementations ● A reasonably managed Storm cluster will stay up for ages.