Where Django Caching
Busts at the Seams
DjangoCon US 2012
concentricsky.com
@concentricsky // wiggins / rimkus / biglan
Introductions




 Mike Biglan, CTO (@concentricsky)
 Wiggins, Architect (@coffindragger)
 Kyle Rimkus, Senior Developer (@kylerimkus)
Areas of Concentric Sky




  Web/Django      Mobile   Enterprise
(Some) Technologies We Use

 • Chef           • Backbone.js
 • Jenkins        • Jinja2
 • AWS            • Celery
 • Sphinx         • DebugToolbar
 • Memcached      • jQuery / jQueryUI
 • South          • Compass / Sass
 • git            • Mongo & MySQL
Released CSky Creations



 • djenesis – Easy Django Bootstrapping
 • breakdown - Lightweight jinja2 template
   prototyping server

 • django-cachemodel – Automatic caching
   for django models
CSky Teams


 50+ staff
  • Django Team: 10
  • Design Team: 5
  • PM Team: 9
  • System Admins: 3
Cached Joke



 Two hard things in computer science:
  1. cache invalidation

  2. naming things,

  3. and off-by-one errors
Journey of Optimization




 • Like Life
  • Focus on the journey, not the destination
 • No Premature Optimization
Continual Process of Optimization


 • In response to a changing environment
  • (hopefully huge traffic)
  •   (hopefully not DDOS)

 • Subject to constraints of time
  • With infinite time, you’d build it perfectly from the start
 • Ideally you can do small tricks up front to prepare
Cycle of Optimization



 1. Measure - Traffic, latency, bottlenecks, profile components

 2. Plan

 3. Implement

 4. Measure – So, did it work?
Journey of Caching



 "There is no one right way to do it...the best
 approach is very application dependent; one-size
 does not fit all."
        Yann Malet, Lincoln Loop
Web Caching Defined




 "Storing expensive data for more
 immediate, future retrieval"
       Modified from Noah Silas, C.R.E.A.M. Talk PyCon 2012
Goals for Caching



 • Faster response times
 • Scalability
 • Cache unreliable external services
  • Assume all are unreliable...
Beware, Caching Can Lead to


 • More points of failure, especially if you are relying on
   the cache.

 • Complexity & Invalidation hell
 • Thundering herd & Warming Solutions
 • Elegant, Scalable, Performant Applications
Potential Cache Elements




 • Focus on each potential data cache
  • E.g. key-value variable pair
Types of Caching Strategies

• What are the data to cache?
• Where will it be stored?
   •   Memoize in python (e.g. via decorator; see the sketch after this list)

   •   Backend server (e.g. memcached or redis)

• Where will it be rendered?
   •   In-app (query, view, template)

   •   Outside app: client-side, edge-side (a.k.a. upstream)

• Do updates happen inside the request-response cycle or
  outside? Or both?
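
A minimal sketch of the in-process memoize option from the list above (not from the deck): a decorator that stores results in a per-process dict, with no shared storage and no invalidation beyond a restart, so it only suits cheap-to-store, rarely-changing lookups. The fib function is just a stand-in for any pure, expensive computation.

import functools

def memoize(func):
    results = {}
    @functools.wraps(func)
    def wrapper(*args):
        # hashable positional args only; a fuller version would handle kwargs
        if args not in results:
            results[args] = func(*args)
        return results[args]
    return wrapper

@memoize
def fib(n):
    # stand-in for any pure, expensive computation
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))   # instant with the memo; unmemoized this would take ages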
Madlib of Cache


 I want to cache _____ data so that it stores it _____
 and when needed returns it into the [template/view/
 query] via a request from [in-app/AJAX-call].

 Updates to this occur [in/outside/both] the request-
 response cycle and when they occur it [triggers an
 update job/invalidates it and related entries].
How Important to Cache




• If the data is not in the cache, how problematic is that?
  • That is, how reliant would you be on caching?
Properties of Cached Data

 1. Sensitivity to it returning old data

 2. Cost of (re)creating cache element (e.g. time, RAM)

 3. Expense/Size of storing cache result

 4. How often (likely to be) used?

 5. How often (likely to) change?

 6. Additional complexity caching adds
Cached Data – Lists/Arrays



 7. N-dimensional array of data (e.g. list of objects)

  • Constant order/filter (e.g. client-side use)
  • Changing order/filter, beware of caching
    •   Denormalize via external tool or data structure. E.g. in-
        database, sphinx, datacube
Cached Data – Relationality


 8. Relationality of these data with others

  • The more these data depend on other data, the more complex
     invalidation becomes

  • Graph of dependencies (specifically a DAG; toy sketch below)
  • Computed result and what input data it depends on
    •   Joins are one example
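
To make the dependency-graph idea concrete, here is a toy sketch (not from the deck): record which input keys each computed cache entry was built from, and delete the dependents when an input changes. A per-process dict stands in for the dependency store; a real implementation would keep the edges somewhere shared and walk the full DAG rather than one level.

from collections import defaultdict
from django.core.cache import cache

# input key -> set of computed keys built from it (the DAG edges, reversed)
DEPENDENTS = defaultdict(set)

def cache_computed(key, value, depends_on, timeout=900):
    cache.set(key, value, timeout)
    for input_key in depends_on:
        DEPENDENTS[input_key].add(key)

def invalidate_input(input_key):
    # one level of the graph; recurse for chains of computed values
    cache.delete_many(list(DEPENDENTS.pop(input_key, set())))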
Caching, the simple solution




 • Who is using Memcache?
 • Django’s basic per-site caching?
 • Noah Silas: “Cache Rules Everything Around Me?”
Cash Rules Everything Around Me
Cash Rules Everything Around Me
Caching, the simple solution




 • Caching ?= Memcache
Caching, the simple solution




Fake example website concept for purposes of caching talk
Caching, the simple solution




 • So you’re saying we have a site.
Caching, the simple solution

 • Per-site Caching
Caching, the simple solution
Caching inline


 • Expensive queries
context['items'] = HugeListOfItems.objects.all()



 • Wrap the lookup
def get_cached_items():
    items = cache.get("my_cached_items")
    if items is None:
        items = HugeListOfItems.objects.all()
        cache.set("my_cached_items", items, 900)
    return items
Caching inline




            PROS     CONS

             quick   dirty
Thundering Herd




• A cache miss means many requests will trigger
  database reads until the cache can be filled.
Thundering Herd

 Solutions?
• Do not let values expire from the cache
• When an object is stale, refresh the value in the
  background and continue to provide the cached
  value to the rest of the herd until the value can be
  updated (sketch below)
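
A minimal sketch (not the deck's exact code) of that second solution: store a soft deadline next to the value, keep the hard memcached timeout long, and let only the first request past the deadline queue a refresh while everyone else keeps getting the old value. `rebuild` and `schedule_rebuild` are assumed helpers: the first returns fresh data synchronously, the second queues the same work in the background (e.g. a celery task).

import time
from django.core.cache import cache

SOFT_TTL = 900             # consider the value stale after 15 minutes
HARD_TTL = 60 * 60 * 24    # but keep serving it for up to a day

def get_items(rebuild, schedule_rebuild):
    entry = cache.get("my_cached_items")
    if entry is None:
        # true miss: nothing to serve yet, so build it inline this one time
        value = rebuild()
        cache.set("my_cached_items", (value, time.time() + SOFT_TTL), HARD_TTL)
        return value
    value, fresh_until = entry
    if time.time() > fresh_until and cache.add("my_cached_items:refreshing", 1, 60):
        # stale: only the request that wins the add() queues a refresh;
        # the rest of the herd (and this request) still get the old value
        schedule_rebuild()
    return value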
Large data sets don't cache well


 Problems?
 • Requests for rarely used data
  • Address searching on a map
  • Mail clients
Gotchas when deploying caching



• Versioning your caches.
• Separating your caches based on use.
• Using consistent hashing algorithms.
• Deploying memcached with Elasticache.
Version your cache
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'VERSION': 1,
    },
}

 • You need to push a code change that alters the
   way a cache value is generated.

 • You need to preserve two different copies of the
   cache, based on the version.

 • Consider using the git sha1 hash (settings sketch below).
   $ git rev-parse HEAD | cut -b 1-7
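
One way (not prescribed by the deck) to wire that hash into settings.py: shell out to git at import time and use the short sha as the cache VERSION. The speaker notes suggest baking the hash in at deploy time (e.g. with fabric) instead of shelling out on every process start; this inline form is just the simplest sketch.

import subprocess

def _git_cache_version():
    # mirrors the one-liner above: git rev-parse HEAD | cut -b 1-7
    try:
        sha = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
        return sha.decode('ascii').strip()[:7]
    except (OSError, subprocess.CalledProcessError):
        return 1   # not in a git checkout: fall back to a fixed version

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'VERSION': _git_cache_version(),   # any value that str()s cleanly works
    },
}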
Sessions in the cache



 • Django documentation urges using the
   memcached session backend.

 • But if you need to bump your cache, you will log
   out all your users.

 • So Django 1.3 added multiple named caches...
Easier said than done...
memcached_backend = 'django.core.cache.backends.memcached.MemcachedCache'

CACHES = {
    'default': {
        'BACKEND': memcached_backend,
        'LOCATION': '127.0.0.1:11211',
        'KEY_PREFIX': 'default',
        'VERSION': 1,
    },
    'session': {
        'BACKEND': memcached_backend,
        'LOCATION': '127.0.0.1:11211',
        'KEY_PREFIX': 'session',
        'VERSION': 1,
    }
}
Named caches are great, but...



 • Django’s memcached session backend does not
   have a way to choose a particular named cache...

 • So, you’d have to write your own SessionBackend (sketch below).
 • OR, specify a different cache for everything else.
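
A minimal sketch of the "write your own SessionBackend" option, using the Django 1.3/1.4-era get_cache API and the named 'session' cache from the previous slide; the module path is hypothetical. Point SESSION_ENGINE at this module and the stock cache-backed session store will use the dedicated cache instead of 'default'.

# myproject/session_backend.py (hypothetical module path)
from django.core.cache import get_cache
from django.contrib.sessions.backends.cache import SessionStore as CacheSessionStore

class SessionStore(CacheSessionStore):
    def __init__(self, session_key=None):
        super(SessionStore, self).__init__(session_key)
        # the stock backend pins itself to the default cache; repoint it
        self._cache = get_cache('session')

# settings.py
# SESSION_ENGINE = 'myproject.session_backend'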
Consistent Hashing


server_idx = hash(key) % serverlist.length;
server = serverlist[server_idx]




• If serverlist.length changes, most keys suddenly map to a different server
• That means that if a node goes down or you need to add
  a new node, you will invalidate most of the keys in your
  cache (illustration below).
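
A quick, self-contained illustration of that claim: drop one of four servers under naive modulo assignment and count how many of 10,000 keys land on a different server afterwards. Roughly three quarters of them move.

keys = ['croupon:%d' % i for i in range(10000)]

def assign(key, n_servers):
    return hash(key) % n_servers

moved = sum(1 for k in keys if assign(k, 4) != assign(k, 3))
print('%.0f%% of keys moved' % (100.0 * moved / len(keys)))   # ~75%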
The Ketama Algorithm
http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients



     •   Take your list of servers (eg: 1.2.3.4:11211, 5.6.7.8:11211, 9.8.7.6:11211)

     •   Hash each server string to several (100-200) unsigned ints

     •   Conceptually, these numbers are placed on a circle called the
         continuum. (imagine a clock face that goes from 0 to 2^32)

     •   Each number links to the server it was hashed from, so servers appear
         at several points on the continuum, by each of the numbers they
         hashed to.

     •   To map a key->server, hash your key to a single unsigned int, and find
         the next biggest number on the continuum. The server linked to that
         number is the correct server for that key (sketch after this slide).


                                           TL;DR
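
An illustrative sketch of that continuum in a few lines of Python (not the real libketama, which uses a specific md5-based point layout): hash each server to many points on the ring, then map a key to the next point clockwise. Removing a server only reassigns the keys that sat on its points.

import bisect
import hashlib

def _point(value):
    # a 32-bit position on the continuum (md5 here purely for illustration)
    return int(hashlib.md5(value.encode('utf-8')).hexdigest()[:8], 16)

def build_continuum(servers, points_per_server=150):
    ring = []
    for server in servers:
        for i in range(points_per_server):
            ring.append((_point('%s-%d' % (server, i)), server))
    ring.sort()
    return ring

def server_for_key(ring, key):
    points = [p for p, _ in ring]
    idx = bisect.bisect(points, _point(key))
    return ring[idx % len(ring)][1]   # wrap around past the top of the clock

ring = build_continuum(['1.2.3.4:11211', '5.6.7.8:11211', '9.8.7.6:11211'])
print(server_for_key(ring, 'croupons:popular_in_city:eugene'))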
But who cares...

             • Just use pylibmc and django-pylibmc
               https://github.com/jbalogh/django-pylibmc

# pip install django-pylibmc

CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': 'localhost:11211',
        'TIMEOUT': 0,
        'OPTIONS': {
            'ketama': True
        }
    }
}

              • Note that timeout=0 now caches forever
Elasticache



 • Elasticache is memcache as an AWS service.
 • It just works.
 • It does cost more than doing it yourself...
 • but deployment is hard, so think about it.
Elasticache is quick and easy!
Well... until you get here...
Elasticache Security Groups

 • EC2 Security Groups define incoming firewall rules.
 • You should create an EC2 security group for each project.
 • Add an EC2 security group rule to allow access to the
   memcached port.

 • Then create an Elasticache Security Group for the
   project.

 • Authorize the EC2 Security Group on the Elasticache
   Security Group.
Optimizing your cache




 • What should be cached?
 • Cache Warming
 • Cache Frameworks
How do I know what I should cache?

 • Use debug-toolbar to reduce query counts and
   monitor cache hits/misses.
   https://github.com/django-debug-toolbar/django-debug-toolbar


 • Use django-memcache-status to monitor
   memcache stats in the admin.
   https://github.com/bartTC/django-memcache-status


 • NewRelic is a great tool for profiling and
   monitoring.
   http://newrelic.com
Cache Warming
Cache Warming




 • What? Asynchronous cache filling
 • Why? Allows web requests to remain fast
 • Don’t make users do the work for you
Cache Warming




• manage.py command (sketch below)
• Use celery or cron to trigger
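
A sketch of the manage.py approach, reusing the inline example from earlier; the command path and model import are assumptions for illustration. Point cron at `manage.py warm_cache` (or wrap the body in a celery task) so the expensive query never runs inside a user's request.

# croupon/management/commands/warm_cache.py (hypothetical path)
from django.core.cache import cache
from django.core.management.base import BaseCommand

from croupon.models import HugeListOfItems   # model from the inline-caching slide

class Command(BaseCommand):
    help = "Asynchronously fill expensive cache entries."

    def handle(self, *args, **options):
        items = list(HugeListOfItems.objects.all())   # force evaluation now
        cache.set("my_cached_items", items, 900)
        self.stdout.write("warmed %d items\n" % len(items))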
Caching Frameworks


• Johnny-Cache
• Cache Machine
• CacheModel
• Auto Cache
• djangopackages.com
Caching Frameworks




• Cache Machine
• http://jbalogh.me/projects/cache-machine/
Caching Frameworks




• Johnny Cache
• http://readthedocs.org/projects/johnny-cache/
Caching Frameworks




• Cache Model
• https://github.com/concentricsky/django-
  cachemodel
Caching Frameworks




• Auto Cache
• https://github.com/noah256/django-autocache
 MyModel.cache.get() vs. MyModel.objects.get()
The Publish Model



• See “Cache Rules Everything Around Me”
  --Noah Silas
  http://pyvideo.org/video/679


• Also see “Django Doesn't Scale”
  --Jacob Kaplan-Moss
  https://speakerdeck.com/u/jacobian/p/django-doesnt-scale
C.R.E.A.M. get the money.

 • Cache everything “forever”.
   One month is basically forever right?

 • If you hit the cache, and the data is stale,
   immediately return it to the user anyway, and
   update it in the background.

 • Making users wait for results appears broken.
 • Stale results give you perceived performance.
Views should never block on the db...




Shamelessly ripped from “Django Doesn't Scale” by Jacob Kaplan-Moss
KK. But How?!


 • Build a cache key file.
 • Associate cache keys with functions that publish to
   the cache.

 • Use celery tasks to trigger cache warming.
 • Pre-warm caches.
Yay! Code! Now you can tune out...
               croupon/cachekeys.py
import croupon.tasks

cachekeys = {
    'PopularInCity': (
        'croupons:popular_in_city:%(city)s',
        croupon.tasks.top5_croupons_in_city
    ),
    'SomeOtherStuff': (
        'croupons:popular',
        croupon.tasks.top_croupons
    )
}

                  croupon/tasks.py
@celery.task
def top5_croupons_in_city(city):
    return Croupon.objects.filter(city=city).order_by('popularity')[:5]
That’s right, another slide of code!
                croupon/models.py
import croupon.cachekeys
from publish_model.utils import publish_to_cache

class Croupon(models.Model):
    ...
    def save(self, *args, **kwargs):
        super(Croupon, self).save(*args, **kwargs)
        # something has changed,
        # so update the cache in the background
        publish_to_cache.apply_async(
            args=(croupon.cachekeys.cachekeys, 'PopularInCity'),
            kwargs={'city': self.city})


                 croupon/views.py
context['popular_croupons'] = consume_from_cache(
    croupon.cachekeys.cachekeys, 'PopularInCity', city=city)
...
More code? Really?!
               publish_model/utils.py
import celery
from django.core.cache import cache

FOREVER_TIMEOUT = 60 * 60 * 24 * 30   # one month is basically forever, right?


@celery.task   # registered as a task so models.py can .apply_async() it
def publish_to_cache(keydict, key_name, **kwargs):
    key_fmt, data_fun = keydict[key_name]
    key = key_fmt % kwargs
    data = data_fun(**kwargs)
    cache.set(key, data, FOREVER_TIMEOUT)
    return data


def consume_from_cache(keydict, key_name, **kwargs):
    key_fmt, data_fun = keydict[key_name]
    key = key_fmt % kwargs
    data = cache.get(key)
    if data is None:
        # there was a cache miss, so fall back to the thundering herd
        # and build the value inline this one time
        data = publish_to_cache(keydict, key_name, **kwargs)
    return data
Where Django Caching
Busts at the Seams
DjangoCon US 2012
concentricsky.com
@concentricsky // wiggins / rimkus / biglan


Editor's Notes

  1. Try to focus on middle-size applications: strategies, tools, implementation details, warnings, and ways of thinking about the problem space. You've written a mid-sized Django app; maybe you already have some caching.
  13. Apologies, promised: stale cache from previous talks.
  22. Planning and implementing is what we focus on.
  31. What to cache: more on this. Where it is rendered could be multiple spots. Updates outside: asynchronous, e.g. cron, sphinx; asynchronous updates may be triggered by the request-response cycle. You can mix and match, but beware dependencies, even of the same type, e.g. two-phase template rendering.
  35. Next up: strategies for this, by Wiggins/Kyle.
  37. Twitter sidebar asynchronously updated every 5 minutes: fine. User does X, X replaces Y, user asks for X and gets Y: problem. If the data changes constantly, caching has no value.
  44. !!! “string theory” - helpful to imagine reaching into a db (or persistent layer) with strings attached\n\n!!! reach into database with a join. cache in query. \n\nhigher up you cache, likely the more \n\nstrategy next\n
  45. - Who in the crowd is currently using memcache in their Django applications?\n- What about Django's basic per-site caching?\n- Has anyone seen Noah Silas talk about Cache Rules Everything Around Me?\n- in that talk, he briefly mentions the source of his title, but doesn’t want to talk about it\n\n>> We ARE going to talk about Wu Tang Clan\n\n
  46. - but only for this slide\n- they had the hit Cash Rules Everything Around Me\n\n
  47. - If you want to discuss more, see me after the talk\n\n
  48. - Back to Django\n\n- So does caching in Django just mean using memcache?\n\n- Well, it is commonly used and easy to set up\n- A lot of cache frameworks have documentation along the lines of "... developed specifically with the use of memcached in mind. "\n\nNext:\n- We'd like to talk about caching in the context of an imaginary website. \n- We will walk through the growth of this site and what issues we'd hit along the way\n\n\n
  49. - As many of you may have experienced, we have a great idea, but we don't have the time or money to build a huge, scaling site from the very start\n- We shouldn't completely ignore scalability, but with limited resources, a site that launches is better than one that is prematurely optimized.\n- Also, marketing or management might have constraints on the project outside of development's control\n- We aren't here to wag our finger at you and tell you how you should have done things all along, we're here to walk through some real issues and present solutions\n
  50. - So we have our site live\n- Although there isn't much data on it\n- And not many users\n- Perhaps a significant amount of the site traffic is from the development team testing the site.\n- it happens\n- Perhaps all of the actual business is from family and friends of the team.\n- But now that the site is live, the dev team has time to focus on the next phase of features.\n- Let's talk about ways that we can improve the site, hopefully to help prepare it for growth\n
  51. \n- Of course, some of the primary concerns of the Django developers on the team are the responsiveness of our application, and the load it creates on the web server.\n\n- Lucky for us developers that like to fly by the seat of our pants, Django provides some extremely quick ways of adding basic per-site caching to every page. This basically means that if a page has been requested within the cache timeout, it will be served to any other users from the cache instead of the database\n\n- This is close to upstream caching like Varnish, but it’s still aware of things like the request user’s authentication status.\n\n- Hopefully everyone here is already past this level, but we will talk more about deploying memcache later.\n
  52. \n- Of course, some of the primary concerns of the Django developers on the team are the responsiveness of our application, and the load it creates on the web server.\n\n- Lucky for us developers that like to fly by the seat of our pants, Django provides some extremely quick ways of adding basic per-site caching to every page. This basically means that if a page has been requested within the cache timeout, it will be served to any other users from the cache instead of the database\n\n- This is close to upstream caching like Varnish, but it’s still aware of things like the request user’s authentication status.\n\n- Hopefully everyone here is already past this level, but we will talk more about deploying memcache later.\n
  53. \n- So then it happens. Your management lands a deal with a rival company and suddenly you have way more data than you anticipated.\n\n- The per-site caching helps with a lot of the site, but there are still some expensive queries happening, and probably on the index page\n
  54. So what is a solution?\n- perhaps there are only a few expensive queries to make.\n- Just wrap a cache read/write around them, and they will be cached for subsequent\n\n- don’t name your cache key “my_cached_items”\n\n- this will result in keys generated as-needed, sprinkled througout the code\n\n
  55. So what is a solution?\n- perhaps there are only a few expensive queries to make.\n- Just wrap a cache read/write around them, and they will be cached for subsequent\n\n- don’t name your cache key “my_cached_items”\n\n- this will result in keys generated as-needed, sprinkled througout the code\n\n
  56. So what is a solution?\n- perhaps there are only a few expensive queries to make.\n- Just wrap a cache read/write around them, and they will be cached for subsequent\n\n- don’t name your cache key “my_cached_items”\n\n- this will result in keys generated as-needed, sprinkled througout the code\n\n
  57. - Pros: quick\n- Cons: dirty\n
  58. - One of the first issues this site is going to run into is what happens when there is a cache miss?\n\n- Pretending that our new expanded dataset is actually enticing customers, In the case of heavy traffic, a cache miss means many requests will trigger database reads until the cache can be filled. This stampede of requests is commonly referred to as a ‘Thundering Herd’.\n\n- Even though a small percentage of requests trigger a read, they are occurring at the same time. \n
59.
- A solution to the Thundering Herd is to not drop values completely from the cache, but to refresh the value when the first miss occurs, or when an object is deemed "stale", and continue to serve the cached value to the rest of the herd until the value can be updated. We refer to this as the Publish Model, and we will talk more about it later.
- One issue is that the rest of the herd will be getting the stale value, but that is probably fine for the short time involved.
- The new value will probably take seconds, if that, to calculate and refresh the cache, and only one read is sent to the database.
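One way to sketch that "serve stale, refresh once" idea is a soft expiry stored alongside the value, with a short-lived lock so only one request pays for the rebuild (key names and timeouts are illustrative; this is a simplified stand-in for the publish model discussed later):

    import time

    from django.core.cache import cache

    SOFT_TTL = 60 * 15        # when the value is considered stale
    HARD_TTL = 60 * 60 * 24   # memcached still holds it well past staleness

    def get_with_soft_expiry(key, rebuild, soft_ttl=SOFT_TTL):
        entry = cache.get(key)
        if entry is not None:
            value, stale_at = entry
            if time.time() < stale_at:
                return value
            # Stale: let exactly one request claim the rebuild; everyone
            # else in the herd keeps getting the stale value meanwhile.
            if not cache.add(key + ':refreshing', 1, 30):
                return value
        # First read ever, or we won the refresh lock: rebuild and republish.
        value = rebuild()
        cache.set(key, (value, time.time() + soft_ttl), HARD_TTL)
        cache.delete(key + ':refreshing')
        return value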
60. So even with caching in place, some workloads are still hard to cache:
- Often, requests are spread out evenly across a large dataset.
- Sometimes, users request something that hasn't been seen in a long time, or data that is unique to that user.
- This doesn't apply to our croupon site, but it would for address lookups on a map, mail clients, etc.
- If you are running into these sorts of issues, then your site is moving along the spectrum from content site toward app. Remember the slide from Adrian's keynote yesterday on the sweet spot Django aims to serve; it doesn't reach all the way to the "app" end of the spectrum.
- Luckily Croupon is solidly toward the content end, and is pretty well suited for the solutions we are talking about.

PASS TO WIGGINS to talk about deploying caching
61.
- Versioning your caches
- Separating your caches based on use
- Using consistent hashing algorithms
- Deploying memcached with ElastiCache
62-64.
- What happens when you need to incrementally push a code change that alters the way a cache value is generated?
- You need a way to have the new code write to a different key so that the old values are preserved (e.g. staging vs. production).
- Versioning according to the git commit hash is nice, but that means settings needs to know the hash before deployment; fabric is a good solution for this.
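A sketch of per-deploy versioning, assuming the deploy process (a fabric task, say) writes the current git commit hash to a VERSION file next to settings.py:

    # settings.py -- old and new code write to different keys after a deploy
    import os

    PROJECT_DIR = os.path.dirname(os.path.abspath(__file__))

    def _deploy_version():
        try:
            with open(os.path.join(PROJECT_DIR, 'VERSION')) as f:
                return f.read().strip()[:8]   # short git hash
        except IOError:
            return 'dev'

    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
            'KEY_PREFIX': _deploy_version(),  # e.g. 'a1b2c3d4'
        }
    }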
65-67.
- Use a separate cache for sessions
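A sketch of splitting sessions into their own cache so flushing (or losing) the data cache doesn't log everyone out (hostnames are illustrative; SESSION_CACHE_ALIAS requires a Django version that supports it):

    # settings.py
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': 'cache1.example.com:11211',
        },
        'sessions': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': 'sessions.example.com:11211',
        },
    }

    SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
    SESSION_CACHE_ALIAS = 'sessions'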
68-77. (no notes)
78. MOVING ON TO: ElastiCache
79-82. PASS TO KYLE to talk about optimizing cache
83-84. (no notes)
85-89. NOW ONTO PROFILING
90. (no notes)
91-93.
- I'm going to talk about three topics related to optimizing your cache
- This is going to be very high-level
94-96.
- Everyone should be using debug toolbar
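The usual setup, for anyone who isn't (paths match the django-debug-toolbar releases of this era):

    # settings.py -- development only
    if DEBUG:
        INSTALLED_APPS += ('debug_toolbar',)
        MIDDLEWARE_CLASSES += ('debug_toolbar.middleware.DebugToolbarMiddleware',)
        INTERNAL_IPS = ('127.0.0.1',)

The cache panel then shows every cache get/set per request, right alongside the SQL panel.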
97.
- Croupon marketing decides that we need croupon counts peppered around the site to let customers know just how many croupons are available in their area for each category.
- Unfortunately, calculating these counts is very expensive. The site may be lightly used, but each index page would generate a lot of joins and filters, and very expensive database lookups.
- This is what I see in my head.
- The default behavior of the cache is to let the first read populate the cache.
- That's fine for a static page that takes two seconds to load, where a cached version gets it down to 100 milliseconds.
- But it won't work in this instance, because you don't want any users to have to wait for these expensive lookups, especially since the underlying data doesn't change often.
98.
- So what is cache warming? It is filling the cache in the background, asynchronously from web requests.
- The solution is to warm the cache manually, not passively: give the user what they requested, THEN do the work. Don't make a user suffer to make your job easier.
- Before the cache expires, refresh the data and reset the expiration.
- If the data in the cache is a list of items, it is easy to iterate over each item in a queryset.
- For more complicated queries, write a custom manage.py command.
- All of this allows the synchronous web requests to remain fast for users.
99.
- A manage.py command is simplest (a sketch follows below).
- Run the command just slightly more often than your cache timeout (e.g. a 16-minute timeout warmed every 15 minutes).
- Some caching frameworks want you to register a warming task.
- That can be a quick solution, but it is better to be explicit than implicit.
- If a different developer down the road wants to see what is being warmed, it is hard to trace back if you aren't calling a function that explicitly lists what is being done.
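A sketch of such a command for the croupon counts (the app, model, and key names are hypothetical), scheduled from cron slightly more often than the cache timeout:

    # deals/management/commands/warm_caches.py
    from django.core.cache import cache
    from django.core.management.base import BaseCommand

    from deals.models import Category   # hypothetical model

    CACHE_TIMEOUT = 60 * 16   # cron runs the command every 15 minutes

    class Command(BaseCommand):
        help = "Recompute expensive croupon counts and publish them to the cache."

        def handle(self, *args, **options):
            for category in Category.objects.all():
                count = category.deal_set.filter(active=True).count()
                cache.set('deal_count:%s' % category.slug, count, CACHE_TIMEOUT)
                self.stdout.write('warmed %s\n' % category.slug)

    # crontab: */15 * * * * /path/to/python manage.py warm_caches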
100.
- Here are some of the more popular frameworks.
- Here's a shameless plug for our own CacheModel.
- You can find the entire grid on Django Packages.
101.
- You use Cache Machine by inheriting from its CachingMixin and setting its CachingManager as your model's default manager.
- It uses "flush lists" to keep track of the cached queries an object belongs to; it iterates over that list and flushes each entry when an object is saved or deleted.
- It also follows foreign keys and flushes them as well, so that's nice.
- Cache Machine also includes a Jinja2 extension to cache template fragments based on the querysets used inside the fragment. That tag gets added to the flush lists, so it is invalidated just like normal querysets.
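Model setup is roughly this (following the django-cache-machine docs of this era; the Deal model is our example, not theirs):

    from django.db import models

    from caching.base import CachingManager, CachingMixin

    class Deal(CachingMixin, models.Model):
        title = models.CharField(max_length=200)

        objects = CachingManager()   # cached default manager

    # Deal.objects.filter(...) results are now cached and tracked on flush
    # lists, so saving or deleting a Deal invalidates the queries it was in.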
102.
- This is a popular framework and is updated fairly often, so, of course, it works with Django 1.4.
- "Johnny provides a number of backends, all of which are subclassed versions of django builtins that cache 'forever' when passed a 0 timeout."
- Wiggins will talk about how this relates to the publish model.
- The important thing is that the framework gives you the flexibility to use various backends with the same code, much like the Django ORM does with various databases.
103.
- Drop-in caching
- Your model can just inherit from CacheModel and you get caching on your default manager
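A sketch of that drop-in style (the import path is an assumption; check the django-cachemodel source for the exact module):

    from django.db import models

    from cachemodel.models import CacheModel   # import path assumed

    class Merchant(CacheModel):
        name = models.CharField(max_length=200)

    # Per the description above, lookups through the default manager can now
    # be served from the cache, e.g. Merchant.objects.get(pk=1).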
104.
- Written by Noah Silas, who we've mentioned.
- It provides a controller that you set on a model and call instead of the default manager: MyModel.cache.get() instead of MyModel.objects.get().

PASS TO WIGGINS to talk about the Publish Model
105. The Publish Model - 10min (wiggins)
- What is the publish model?
- How do you implement it?
- cachemodel
106-107. (no notes)
108-111.
- Well-defined keys in one DRY place
- Define functions that celery can use to publish to the cache
- You need to pre-warm, since the cache will be empty initially
112-114.
- This is an example of how we would implement the publish model on croupon.
- cachekeys is a dictionary of tuples: (key_format_string, publish_function).
- Publish functions are decorated as celery tasks so they can be run async or inline.
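A sketch of that registry (the model names, key formats, and celery import style are assumptions, not the exact code from the slides):

    # croupon/caching.py
    from celery.task import task          # celery 2.x/3.0-era import
    from django.core.cache import cache

    FOREVER_TIMEOUT = 60 * 60 * 24 * 365  # "one year"; see the pylibmc note later

    @task
    def publish_deal_count(city=None):
        from deals.models import Deal     # hypothetical model
        count = Deal.objects.filter(city__slug=city, active=True).count()
        cache.set('deal_count:%(city)s' % {'city': city}, count, FOREVER_TIMEOUT)
        return count

    @task
    def publish_featured_deals(city=None):
        from deals.models import Deal
        deals = list(Deal.objects.filter(city__slug=city, featured=True))
        cache.set('featured_deals:%(city)s' % {'city': city}, deals, FOREVER_TIMEOUT)
        return deals

    # One DRY place mapping a name to (key_format_string, publish_function)
    cachekeys = {
        'deal_count': ('deal_count:%(city)s', publish_deal_count),
        'featured_deals': ('featured_deals:%(city)s', publish_featured_deals),
    }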
115-118.
- Before we get into the weeds of the functions, here is how they are used.
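In the views, consuming a published value looks something like this (cached_get is the read helper sketched below; module, template, and key names are made up):

    # views.py
    from django.shortcuts import render

    from croupon.caching import cached_get, cachekeys

    def index(request, city_slug):
        context = {
            'deal_count': cached_get('deal_count', city=city_slug),
            'featured_deals': cached_get('featured_deals', city=city_slug),
        }
        return render(request, 'index.html', context)

    # Publishing happens outside the request/response cycle, e.g. from a
    # post_save signal handler or a periodic task:
    #     cachekeys['deal_count'][1].delay(city='portland')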
119-134.
- key_fmt % kwargs is naive; you will need to do better cleaning/hashing of keys based on the arguments.
- FOREVER_TIMEOUT can be 0 if you are using pylibmc; otherwise set it to one year...
- We fall back to the thundering herd if there is a cache miss, by running the publish task inline with .apply().
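The read side of that, as a sketch living next to the cachekeys dictionary above (again illustrative, not the slide's exact code):

    def cached_get(name, **kwargs):
        key_fmt, publish_fn = cachekeys[name]
        key = key_fmt % kwargs   # naive -- sanitize/hash real-world arguments
        value = cache.get(key)
        if value is None:
            # Cache miss: run the publish task inline with .apply() so this
            # request still gets an answer; the task also refills the cache.
            # This is where the thundering herd can come back.
            value = publish_fn.apply(kwargs=kwargs).result
        return value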
135. (no notes)