Web Performance Daybook


Web Performance Daybook, Volume 2
Edited by Stoyan Stefanov

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Copyright © 2012 Stoyan Stefanov. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Mary Treseler
Production Editor: Melanie Yarbrough
Proofreader: Nancy Reinhardt
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

June 2012: First Edition.

Revision History for the First Edition:
2012-06-15 First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449332914 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Web Performance Daybook Volume 2, the cover image of a sugar squirrel biak glider, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-33291-4
[LSI]
1339598947

Table of Contents

Foreword, by Steve Souders
From the Editor, by Stoyan Stefanov
About the Authors
Preface

1. WebPagetest Internals, by Patrick Meenan
    Function Interception
    Code Injection
    Resulting Browser Architecture
    Get the Code
    Browser Advancements
2. localStorage Read Performance, by Nicholas Zakas
    The Benchmark
    What’s Going On?
    Optimization Strategy
    Follow Up
3. Why Inlining Everything Is NOT the Answer, by Guy Podjarny
    No Browser Caching
    No Edge Caching
    No Loading On-Demand
    Invalidates Browser Look-Ahead
    Flawed Solution: Inline Everything only on First Visit
    Summary and Recommendations
4. The Art and Craft of the Async Snippet, by Stoyan Stefanov
    The Facebook Plug-ins JS SDK
    Design Goals
    The Snippet
    Appending Alternatives
    Whew!
    What’s Missing?
    First Parties
    Parting Words: On the Shoulders of Giants
5. Carrier Networks: Down the Rabbit Hole, by Tim Kadlec
    Variability
    Latency
    Transcoding
    Gold in Them There Hills
    4G Won’t Save Us
    Where Do We Go from Here?
    Light at the End of the Tunnel
6. The Need for Parallelism in HTTP, by Brian Pane
    Introduction: Falling Down the Stairs
    Current Best Practices: Working around HTTP
    Experiment: Mining the HTTP Archive
    Results: Serialization Abounds
    Recommendations: Time to Fix the Protocols
7. Automating Website Performance, by Josh Fraser
8. Frontend SPOF in Beijing, by Steve Souders
    Business Insider
    CNET
    O’Reilly Radar
    The Cause of Frontend SPOF
    Avoiding Frontend SPOF
    Call to Action
9. All about YSlow, by Betty Tso
10. Secrets of High Performance Native Mobile Applications, by Israel Nir
    Keep an Eye on Your Waterfalls
    Compress Those Resources
    Don’t Download the Same Content Twice
    Can Too Much Adriana Lima Slow You Down?
    Epilogue
11. Pure CSS3 Images? Hmm, Maybe Later, by Marcel Duran
    The Challenge
    Getting My Hands Dirty with CSS3 Cooking
    Cross-Browser Results
    Benchmarking
    Payload
    Rendering
    Are We There Yet?
    Appendix: Code Listings
    HTML
    CSS
12. Useless Downloads of Background Images in Android, by Éric Daspet
    The Android Problem
    And the Lack of Solution
13. Timing the Web, by Alois Reitbauer
    Conclusion
14. I See HTTP, by Stoyan Stefanov
    icy
    Some details
    Walkthrough
    Todos
    The Road Ahead
    All I Want for Christmas…
15. Using Intelligent Caching to Avoid the Bot Performance Tax, by Matthew Prince
16. A Practical Guide to the Navigation Timing API, by Buddy Brewer
    Why You Should Care
    Collecting Navigation Timing Timestamps and Turning Them into Useful Measurements
    Using Google Analytics as a Performance Data Warehouse
    Reporting on Performance in Google Analytics
    Limitations
    Final Thoughts
17. How Response Times Impact Business, by Alexander Podelko
18. Mobile UI Performance Considerations, by Estelle Weyl
    Battery Life
    Latency
    Embedding CSS and JS: A Best Practice?
    Memory
    Optimize Images
    Weigh the Benefits of CSS
    GPU Benefits and Pitfalls
    Viewport: Out of Sight Does Not Mean Out of Mind
    Minimize the DOM
    UI Responsiveness
    Summary
19. Stop Wasting Your Time Using the Google Analytics Site Speed Report, by Aaron Peters
    Problem: A Bug in Firefox Implementation of the Navigation Timing API
    Solution: Filter Out the Firefox Timings in Google Analytics
    Good News: The Bug Was Fixed in Firefox 9
    Closing Remark
20. Beyond Web Developer Tools: Strace, by Tony Gentilcore
    What About Other Platforms?
    Getting Started
    Zeroing In
    Example: Local Storage
    We’ve Only Scratched the Surface
21. Introducing mod_spdy: A SPDY Module for the Apache HTTP Server, by Bryan McQuade and Matthew Steele
    Getting Started with mod_spdy
    SPDY and Apache
    Help to Improve mod_spdy
22. Lazy Evaluation of CommonJS Modules, by Tobie Langel
    Close Encounters of the Text/JavaScript Type
    Lazy Loading
    Lazy Evaluation to the Rescue
    Building Lazy Evaluation into CommonJS Modules
23. Advice on Trusting Advice, by Billy Hoffman
24. Why You’re Probably Reading Your Performance Measurement Results Wrong (At Least You’re in Good Company), by Joshua Bixby
    The Methodology
    The Results
    Conclusions
    Why Does This Matter?
    Takeaways
25. Lossy Image Compression, by Sergey Chernyshev
    Lossy Compression
26. Performance Testing with Selenium and JavaScript, by JP Castro
    Recording Data
    Collecting and Analyzing the Data
    Sample Results
    Benefits
    Closing Words
    Credits
27. A Simple Way to Measure Website Performance, by Pavel Paulau
    Concept
    Advantages
    Limitation
    Conclusion
28. Beyond Bandwidth: UI Performance, by David Calhoun
    Introduction
    After the Page Loads: The UI Layer
    UI Profilers
    CSS Stress Test
    CSS Profilers
    CSS Lint
    DOM Monster
    Perception of Speed
    Tidbits
    Call for a Focus on UI Performance
29. CSS Selector Performance Has Changed! (For the Better), by Nicole Sullivan
    Style Sharing
    Rule Hashes
    Ancestor Filters
    Fast Path
    What Is Still Slow?
30. Losing Your Head with PhantomJS and confess.js, by James Pearce
    Performance Summaries
    App Cache Manifest
    Onward and Upward
31. Measure Twice, Cut Once, by Tom Hughes-Croucher
    Identifying Pages/Sections
    Identifying Features
    Optimizing
32. When Good Backends Go Bad, by Patrick Meenan
    What Is a Good Backend Time?
    Figuring Out What Is Going On
    Fixing It
    Finally
33. Web Font Performance: Weighing @font-face Options and Alternatives, by Dave Artz
    Font Hosting Services Versus Rolling Your Own
    What the FOUT?
    Removing Excess Font Glyphs
    JavaScript Font Loaders
    Introducing Boot.getFont: A Fast and Tiny Web Font Loader
    Gentlefonts, Start Your Engines!
    My Observations
    Final Thoughts

Foreword

In your hands is the largest collection of web performance articles ever published.
It includes performance topics such as open source tools, caching, mobile networks and applications, automation, improving the user experience, HTML5, JavaScript, CSS3, metrics, ROI, and network protocols. The collection of authors is diverse, ranging from employees of the world’s largest web companies to independent consultants. At least seven web performance startups are represented among the contributors: Blaze, CloudFlare, Log Normal, Strangeloop, Torbit, Turbobytes, and Zoompf.

The range of topics and contributors is impressive. But what really impresses me is that, in addition to their day jobs, every contributor also runs one or more open source projects, blogs, writes books, speaks at conferences, organizes meetups, or runs a non-profit. Some do all of these.

After a full day of taming JavaScript across a dozen major browsers or tracking down the regression that made page load times spike, what compels these people to contribute back to the web performance community during their “spare time”? Here are some of the responses I’ve received when asking this question:

Lack of Formal Training
    Many of us working on the Web learned our craft on the job. Web stuff either wasn’t in our college curriculum or what we did learn isn’t applicable to what we do now. This on-the-job training is a long process involving a lot of trial and error. Sharing best practices raises the group IQ and lets new people entering the field come up to speed more quickly.

Avoid Repeating the Same Mistakes
    Mistakes happen during this trial-and-error process. Sometimes a lot of mistakes happen. We have all experienced banging our heads against a problem in the wee hours of the morning or for days on end, often stumbling on the solution only after a long process of elimination. Thankfully, our sense of community doesn’t allow us to stand by mutely while we watch our peers heading for the same pitfalls. Sharing the solutions we found lets others avoid the same mistakes we made.
Obsessed with Optimization
    By their nature, developers are drawn to optimization. We all strive to make our code the fastest, our algorithms the most efficient, and our architectures the most resilient. This obsession doesn’t just stop with our website; we want every website to be optimized. The best way to do that is to share what we know.

Like to Help
    Finally, some people just really like to help others. That look on someone’s face when they realize they just saved a week of work or made their site twice as fast makes us feel like we’ve helped the community grow.

As a testimony to this sense of sharing, the authors have dedicated all royalties of this book to the WPO Foundation, a non-profit organization that supports the web performance community. Thus, you can enjoy the chapters that lie ahead not only because they are some of the best web performance advice on the planet, but also because it was given to the web performance community selflessly.

Enjoy!

Steve Souders

From the Editor

In the spirit of true high-performance, non-blocking asynchronous delivery, you now have the Web Performance Daybook, Volume 2 published before Volume 1. I hope you'll enjoy reading the book as much as I enjoyed working on it and rubbing (virtual) shoulders with some of the brightest people in our industry.

Back in December 2009, I wanted to give an overview of the web performance optimization (WPO) discipline. I decided on a self-imposed deadline of an article a day from December 1 to 24: the format of an advent calendar similar to http://www.24ways.org. As it turned out, 24 articles in a row was quite a challenge, so I was happy and grateful to accept the offers of help from a few friends from the industry: Christian Heilmann (Mozilla), Eric Goldsmith (AOL), and two posts from Ara Pehlivanian (Yahoo!).
The articles were warmly received by the community, and the following year, in December 2010, the calendar was already something people were looking forward to reading. The calendar also got a new home at http://calendar.perfplanet.com as a subdomain of the “Planet Performance” feed aggregator. And this time around more people were willing to help. Developers from all around our industry were willing to contribute their time, to share and spread their knowledge, and to announce new tools, and in this way create a much better set of 24 articles than a single person could. This is what will soon become Volume 1 of the series of Daybooks.

Then came December 2011, and we had so much good content and enthusiasm that we kept going past December 24, all the way to December 31, even publishing two articles on the last day. This is the content that you have in your hands in book format as Web Performance Daybook, Volume 2.

Our WPO community is young and small but growing, and in need of nourishment in the form of community-building events such as the advent calendar. That's why it was exciting to have the opportunity to collaborate on this title with O'Reilly and all 32 authors. I'm really happy with the result and I know that both volumes will serve as a reference and introduction to performance tools, research, techniques, and approaches for years to come. There’s always a risk of outdated content in offline technical publications, but I see references to the calendar articles in the latest conferences all the time, so I'm confident this knowledge will remain fresh for quite a while, and some of it is even destined to become timeless.

Enjoy the book, prepare to learn from the brightest in the industry and, most of all, be ready to make the Web a better place for all of us!
Stoyan Stefanov

About the Authors

Patrick Meenan

Patrick Meenan (http://blog.patrickmeenan.com/) (@patmeenan) created WebPagetest (http://www.webpagetest.org/) while working at AOL and now works at Google with the team that is working to make the Web faster (http://code.google.com/speed/).

Nicholas Zakas

Nicholas C. Zakas (http://www.nczonline.net/) (@slicknet) is chief architect of WellFurnished, a site dedicated to helping you find beautiful home decor. Prior to that, he worked at Yahoo! for almost five years, where he was a presentation architect, frontend lead for the Yahoo! homepage, and a contributor to the YUI library. He is the author of Maintainable JavaScript (O’Reilly, 2012), Professional JavaScript for Web Developers (Wrox, 2012), Professional Ajax (Wrox, 2007), and High Performance JavaScript (O’Reilly, 2010). Nicholas is a strong advocate for development best practices including progressive enhancement, accessibility, performance, scalability, and maintainability. He blogs regularly at http://www.nczonline.net/.

Guy Podjarny

Guy Podjarny (http://blaze.io/) (@guypod) is a Web Performance and Security expert, specializing in Mobile Web Performance, and CTO at Blaze. Guy spent the decade prior to Blaze as a Software Architect and Web Application Security expert, driving the IBM Rational AppScan product line from inception to being the leading Web Application Security assessment tool. Guy has filed over 15 patents, presented at numerous conferences, and published several professional papers.

Stoyan Stefanov

Stoyan Stefanov (http://phpied.com/) (@stoyanstefanov) is a Facebook engineer, former Yahoo!, writer (“JavaScript Patterns”, “Object-Oriented JavaScript”), speaker (JSConf, Velocity, Fronteers), toolmaker (Smush.it, YSlow 2.0), and a Guitar Hero wannabe (http://givepngachance.com/).

Tim Kadlec

Tim Kadlec (http://timkadlec.com) (@tkadlec) is a web developer living and working in northern Wisconsin.
His diverse background working with small companies to large publishers and industrial corporations has allowed him to see how the careful application of web technologies can impact businesses of all sizes. Tim organizes Breaking Development (http://bdconf.com), a biannual conference dedicated to web design and development for mobile devices. He is currently writing a book entitled Implementing Responsive Design: Building Sites for an Anywhere, Everywhere Web (http://responsiveenhancement.com), due out in the fall of 2012.

Brian Pane

Brian Pane (http://www.brianp.net/) (@brianpane) is an Internet technology and product generalist. He has worked at companies including Disney, CNET, F5, and Facebook, and all along the way he’s jumped at any opportunity to make software faster.

Josh Fraser

Josh Fraser (http://onlineaspect.com/) (@joshfraser) is the co-founder and CEO of Torbit, a company that automates front-end optimizations that are proven to increase the speed of your website. Josh graduated from Clemson University with a BS in computer science and previously founded a company called EventVue. He currently lives in Mountain View and is obsessed with speed.

Steve Souders

Steve Souders (http://stevesouders.com/) (@souders) works at Google (http://www.google.com/) on web performance and open source initiatives. His book, High Performance Web Sites, explains his best practices for performance; it was #1 in Amazon’s Computer and Internet bestsellers. His follow-up book, Even Faster Web Sites, provides performance tips for today’s Web 2.0 applications. Steve is the creator of YSlow, the performance analysis extension to Firebug, with over 2 million downloads. He also created Cuzillion, SpriteMe, and Browserscope. He serves as co-chair of Velocity, the web performance and operations conference from O’Reilly, and is co-founder of the Firebug Working Group.
He taught CS193H: High Performance Web Sites at Stanford, and frequently speaks at conferences including OSCON, The Ajax Experience, SXSW, and Web 2.0 Expo.

Betty Tso

Betty is a Software Development Manager at Amazon. Prior to that, she led the Exceptional Performance Engineering team at Yahoo! and drove the engineering execution and development for Yahoo!'s top Web Performance products like YSlow and Roundtrip. Betty is also an evangelist in the Web Performance Optimization domain. She has spoken at Velocity Conferences, the Yahoo! Frontend Summit, and universities such as Georgia Tech, Duke, UIUC, University of Texas at Austin, and UCSD. She was also co-President of Yahoo! Women-in-Tech, a 600+ member organization that empowers women to succeed in their careers, fosters employee growth, and inspires young girls to pursue technical careers.

Israel Nir

Israel Nir (@shunra) likes to create stuff, break other stuff apart, code, the number 0x17, and playing the ukulele. He also works as a team leader at Shunra, where he builds tools to make applications run faster.

Marcel Duran

Marcel Duran (http://javascriptrules.com/) is currently a Front End Engineer at Twitter, Inc. Prior to that, he worked on web performance optimization for high-traffic sites on the Yahoo! Front Page and Search teams, where he applied and researched web performance best practices to make pages even faster. In his last role as the Front End Lead for Yahoo!'s Exceptional Performance Team, he was dedicated to YSlow (now his personal open source project) and to the development of other performance tools, research, and evangelism.

Éric Daspet

Éric Daspet (http://eric.daspet.name/) (@edasfr) is a web consultant in France. He wrote about PHP, founded the Paris-Web conferences to promote web quality, and is now pushing performance with a local user group and a future book.
Alois Reitbauer

Alois Reitbauer (http://blog.dynatrace.com/) (@aloisreitbauer) works as Technology Strategist for dynaTrace software and heads the dynaTrace Center of Excellence. As a major contributor to dynaTrace Labs technology, he influences the company’s future technological direction. Besides his engineering work, he supports Fortune 500 companies in implementing successful performance management.

Matthew Prince

Matthew Prince (http://www.cloudflare.com/) (@eastdakota) is the co-founder & CEO of CloudFlare. Matthew wrote his first computer program when he was 7, and hasn’t been able to shake the bug since. After attending the University of Chicago Law School, he worked as an attorney for one day before jumping at the opportunity to be a founding member of a tech startup. He hasn’t looked back. CloudFlare is Matthew’s third entrepreneurial venture. On the side, Matthew teaches Internet law as an adjunct professor, and is a certified ski instructor and a regular attendee of the Sundance Film Festival.

Buddy Brewer

Buddy Brewer (@bbrewer) is a co-founder of Log Normal, a company that shows you exactly how much time real people spend waiting on your website. He has worked on web performance issues in various roles for almost ten years.

Alexander Podelko

For the last fourteen years, Alex Podelko (http://alexanderpodelko.com/blog/) (@apodelko) has worked as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Hyperion products. Alex currently serves as a director for the Computer Measurement Group (CMG). He maintains a collection of performance-related links and documents.

Estelle Weyl

Estelle Weyl (http://www.standardista.com/) (@estellevw) started her professional life in architecture, then managed teen health programs. In 2000, she took the natural step of becoming a web standardista.
She has consulted for Kodakgallery, Yahoo! and Apple, among others. Estelle provides tutorials and detailed grids of CSS3 and HTML5 browser support in her blog. She is the author of Mobile HTML5 (O’Reilly, Oct. 2011) and HTML5 and CSS3 for the Real World (Sitepoint, May 2011). When not coding, she works in construction, de-hippifying her 1960s throwback abode.

Aaron Peters

Aaron Peters (http://www.aaronpeters.nl/en/) (@aaronpeters) is an independent web performance consultant based in The Netherlands. He is a Red Hot Chili Peppers fan and will kick your butt in a snowboard contest anytime.

Tony Gentilcore

Tony Gentilcore (@tonygentilcore) is a software engineer at Google. He enjoys making the Web faster and has recently added support for Web Timing and async scripts to Google Chrome/WebKit.

Matthew Steele

Matthew Steele is a software engineer at Google, working on making the Web faster. Matthew has worked on Page Speed for Firefox and Chrome, has contributed to mod_pagespeed, and most recently, has led design and development of mod_spdy for Apache.

Bryan McQuade

Bryan McQuade (@bryanmcquade) leads the Page Speed team at Google. He has contributed to various projects that make the Web faster, including Shared Dictionary Compression over HTTP and optimizing web servers to better utilize HTTP.

Tobie Langel

Tobie Langel (http://tobielangel.com/) (@tobie) is a Software engineer at Facebook. He’s also Facebook’s W3C AC Rep. An avid open-source contributor (https://github.com/tobie), he’s mostly known for having co-maintained the Prototype JavaScript Framework. Tobie recently picked up blogging again and rants at blog.tobie.me (http://blog.tobie.me/). In a previous life, he was a professional jazz drummer.

Billy Hoffman

If there is one thing Billy Hoffman believes in, it’s transparency. In fact, he once got sued over it, but that is another story.
Billy continues to push for transparency as founder and CEO of Zoompf, whose products provide visibility into your website’s performance by identifying the specific issues that are slowing your site down. You can follow Zoompf on Twitter (http://twitter.com/zoompf) and read Billy’s performance research on Zoompf’s blog Lickity Split (http://zoompf.com/blog).

Joshua Bixby

Joshua Bixby (@JoshuaBixby) is president of Strangeloop (http://www.strangeloopnetworks.com/), which provides website acceleration solutions to companies like eBay/PayPal, Visa, Petco, Wine.com, and O’Reilly Media. Joshua also maintains the blog Web Performance Today (http://www.webperformancetoday.com/), which explores issues and ideas about site speed, user behavior, and performance optimization.

Sergey Chernyshev

Sergey Chernyshev (http://www.sergeychernyshev.com/) (@sergeyche) organizes the New York Web Performance Meetup and helps other performance enthusiasts around the world start meetups in their cities. Sergey volunteers his time to run the @perfplanet Twitter companion to the PerfPlanet site. He is also an open source developer and author of a few web performance-related tools including ShowSlow, SVN Assets, drop-in .htaccess, and more.

JP Castro

JP Castro (@jphpsf) is a frontend engineer living in San Francisco. He’s passionate about web development and specifically web performance. He blogs at http://blog.jphpsf.com and co-organizes the San Francisco performance meetup. When he’s not talking about performance, he enjoys spending time with his family, being outdoors, sipping craft beers, consuming a full jar of Nutella, and playing video games.

Pavel Paulau

Pavel Paulau (@pavelpaulau) is a performance engineer from Minsk, Belarus. Besides his daily work at Couchbase (http://www.couchbase.com), he tries to spread the importance of speed as co-author of the WebPerformance.ru blog (http://webperformance.ru/).
David Calhoun

David Calhoun (@franksvalli) is an independent frontend developer who has been splitting his time between California and Japan. He’s the community news writer for JSMag and keeps a blog (http://davidbcalhoun.com/) with developer and general life thoughts (hard to put that philosophy degree to use…). David specializes in mobile, frontend performance, and sure enough, mobile performance. He formerly worked for Yahoo! Mobile and CBSi/CNET, occasionally contracts for WebMocha, and is currently contracting at Skybox Imaging, working on interfaces for flying satellites from browsers.

Nicole Sullivan

Nicole Sullivan (http://stubbornella.org/) (@stubbornella) is an evangelist, frontend performance consultant, CSS Ninja, and author. She started the Object-Oriented CSS open source project, which answers the question: how do you scale CSS for millions of visitors or thousands of pages? She also consulted with the W3C for their beta redesign, and is the co-creator of Smush.it, an image optimization service in the cloud. Nicole is passionate about CSS, web standards, and scalable frontend architecture for large commercial websites. She speaks about performance at conferences around the world, most recently at The Ajax Experience, ParisWeb, and Web Directions North. She co-authored Even Faster Websites and blogs at stubbornella.org.

James Pearce

James (http://tripleodeon.com/) (@jamespearce) is Head of Mobile Developer Relations at Facebook. He lives in California and in airports around the world.

Tom Hughes-Croucher

Tom (http://tomhughescroucher.com/) (@sh1mmer) is the principal consultant at Jetpacks for Dinosaurs, which helps make websites really rather fast. Tom consults with clients like Walmart and MySpace, to name a few. An industry veteran, Tom has worked for the likes of Yahoo!, Joyent, NASA, Tesco, and many more. Tom co-authored Up and Running with Node.js and lives in San Francisco, CA.
Dave Artz

David Artz leads the Site Engineering team at AOL. He led AOL’s Optimization team in the past, a team focused on setting standards and developing best practices in frontend engineering, performance, and SEO across the teams he now leads. While managing multiple teams, he has continued to develop script/CSS/font loaders as part of his Boot library (https://github.com/artzstudio/Boot), an AMD loader for jQuery (https://github.com/artzstudio/jQuery-AMD), and a jQuery plug-in called Sonar (https://github.com/artzstudio/jQuery-Sonar) for easily loading content and functionality on demand using special “scrollin” and “scrollout” events.

Preface

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission.
Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Web Performance Daybook, Volume Two edited by Stoyan Stefanov (O’Reilly). Copyright 2012 Stoyan Stefanov, 978-1-449-33291-4.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Safari® Books Online

Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://oreil.ly/web_perf_daybook_v2

To comment or ask technical questions about this book, send email to:

bookquestions@oreilly.com

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

CHAPTER 1
WebPagetest Internals
Patrick Meenan

I thought I’d take the opportunity this year to give a little bit of visibility into how WebPagetest gathers the performance data from browsers. Other tools on Windows use similar techniques, but the information here may not be representative of how other tools work.

First off, it helps to understand the networking stack on Windows from a browser’s perspective (Figure 1-1).

Figure 1-1. Windows networking stack from browser’s perspective

It doesn’t matter what the browser is: if it runs on Windows, the architecture pretty much has to look like the diagram above, where all of the communications go through the Windows socket APIs (for that matter, just about any application that talks TCP/IP on Windows looks like the picture above).

Function Interception

The key to how WebPagetest works is its ability to intercept arbitrary function calls and inspect or alter the request or response before passing it on to the original implementation (or choosing not to pass it on at all).
Luckily, someone else did most of the heavy lifting and provided a nice open source library (http://newgre.net/ncodehook) that can take care of the details for you, but it basically works like this:

• Find the target function in memory (trivial if it is exported from a DLL).
• Copy the first several bytes from the function (making sure to keep x86 instructions intact).
• Overwrite the function entry with a jmp to the new function.
• Provide a replacement function that includes the bytes copied from the original function along with a jmp to the remaining code.

It’s pretty hairy stuff and things tend to go very wrong if you aren’t extremely careful, but with well-defined functions (like all of the Windows APIs), you can pretty much intercept anything you’d like.

One catch is that you can only redirect calls to code running in the same process as the original function, which is fine if you wrote the code but doesn’t help a lot if you are trying to spy on software that you don’t control, which leads us to…

Code Injection

Lucky for me, Windows provides several ways to inject arbitrary code into processes. There is a good overview of several different techniques here: http://www.codeproject.com/KB/threads/winspy.aspx. There are actually more ways to do it than that, but it covers the basics. Some of the techniques insert your code into every process, but I wanted to be a lot more targeted and just instrument the specific browser instances that we are interested in, so after a bunch of experimentation (and horrible failures), I ended up using the CreateRemoteThread/LoadLibrary technique, which essentially lets you force any process to load an arbitrary DLL and execute code in it (assuming you have the necessary rights).
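
The interception steps listed under Function Interception patch native x86 code, which cannot be shown in a few lines. As a loose analogy only, the same wrap-and-forward idea can be sketched in JavaScript: keep a reference to the original function, install a replacement that can inspect or alter the call, and then either forward to the original or decline to. The names below (`intercept`, `fakeSockets`, `connect`) are hypothetical and are not part of WebPagetest or the hooking library mentioned above.

```javascript
// Analogy only: WebPagetest patches machine code inside the browser
// process; this sketch swaps a property on a plain object to show the
// same overall shape. `intercept`, `fakeSockets`, and `connect` are
// hypothetical names invented for this example.

// Replace target[name] with a hook. The hook receives the original
// function plus the call arguments, and decides whether and how to
// pass the call on.
function intercept(target, name, hook) {
  const original = target[name];
  target[name] = function (...args) {
    return hook.call(this, original.bind(this), ...args);
  };
  // Return a function that restores the original implementation.
  return function restore() {
    target[name] = original;
  };
}

// A stand-in for an API we want to observe.
const fakeSockets = {
  connect(host, port) {
    return `connected to ${host}:${port}`;
  },
};

const seen = [];
const restore = intercept(fakeSockets, "connect", (original, host, port) => {
  seen.push({ host, port });        // inspect the request
  if (host === "blocked.example") {
    return "refused by hook";       // choose not to pass it on
  }
  return original(host, port);      // forward to the original
});
```

Callers keep using `fakeSockets.connect(...)` as before and never see the hook, which is the property that makes this style of interception useful for measurement. Calling `restore()` puts the original function back.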

Resulting Browser Architecture

Now that we can intercept arbitrary function calls, it just becomes a matter of identifying the “interesting” functions, preferably ones that are used by all the browsers so you can reuse as much code as possible. In WebPagetest, we intercept all the Winsock calls that have to do with resolving host names, connecting sockets, and reading or writing data (Figure 1-2).

Figure 1-2. Browser architecture

This gives us visibility into all of the browser’s network access, and we essentially just keep track of what the browsers are doing. Other than having to decode the raw byte streams, it is pretty straightforward and gives us a consistent way to do the measurements across all browsers.

SSL does add a bit of a wrinkle, so we also intercept calls to the various SSL libraries that the browsers use so that we can see the unencrypted version of the data. This is a little more difficult for Chrome since the library is compiled into the Chrome code itself, but luckily they make debug symbols available for every build so we can still find the code in memory.

The same technique is used to intercept drawing calls from the browser so we can tell when it paints to the screen (for the start render measurement).

Get the Code

Since WebPagetest is under a BSD license, you are welcome to reuse any of the code for whatever purposes you’d like.
The project lives on Google Code (http://code.google.com/p/webpagetest/), and some of the more interesting files are:

• Winsock API interception code (http://webpagetest.googlecode.com/svn/trunk/agent/wpthook/hook_winsock.cc)
• Code injection (http://webpagetest.googlecode.com/svn/trunk/agent/wpthook/inject.cc)

Browser Advancements

Luckily, browsers are starting to expose more interesting information in standard ways, and as the W3C Resource Timing spec (http://w3c-test.org/webperf/specs/ResourceTiming/) advances, you will be able to access a lot of this information directly from the browser through JavaScript (even from your end users!).

To comment on this chapter, please visit http://calendar.perfplanet.com/2011/webpagetest-internals/. Originally published on Dec 01, 2011.

CHAPTER 2
localStorage Read Performance
Nicholas Zakas

Web Storage (http://dev.w3.org/html5/webstorage/) has quickly become one of the most popular HTML5-related additions to the web developer toolkit. More specifically, localStorage has found a home in the hearts and minds of web developers everywhere, providing very quick and easy client-side data storage that persists across sessions. With a simple key-value interface, we’ve seen sites take advantage of localStorage in unique and interesting ways:

• Disqus (http://www.disqus.com/), the popular feedback management system, uses localStorage to save your comment as you type. So if something horrible happens, you can fire back up the browser and pick up where you left off.
• Google (http://www.google.com/) and Bing (http://www.bing.com/) store JavaScript and CSS in localStorage to improve their mobile site performance (more info: http://www.stevesouders.com/blog/2011/03/28/storager-case-study-bing-google/).

Of the use cases I’ve seen, the Google/Bing approach is one that seems to be gaining in popularity.
This is partly due to the difficulties of working with the HTML5 application cache and partly due to the publicity that this technique has gained from the work of Steve Souders and others. Indeed, the more I talk to people about localStorage and how useful it can be for storing UI-related information, the more people I find who have already started to experiment with this technique.

What I find intriguing about this use of localStorage is that there’s a built-in, and yet unstated, assumption: that reading from localStorage is an inexpensive operation. I had heard anecdotally from other developers about strange performance issues, and so I set out to quantify the performance characteristics of localStorage, to determine the actual cost of reading data.

The Benchmark

Not too long ago, I created and shared a simple benchmark that measured reading a value from localStorage against reading a value from an object property. Several others tweaked the benchmark to arrive at a more reliable version (http://jsperf.com/localstorage-vs-objects/10). The end result: reading from localStorage is orders of magnitude slower in every browser than reading the same data from an object property. Exactly how much slower? Take a look at the chart in Figure 2-1 (higher numbers are better).

Figure 2-1. Benchmark results

You may be confused after looking at this chart because it appears that reading from localStorage isn’t represented. In fact, it is represented; you just can’t see it because the numbers are too low to be visible at this scale. With the exception of Safari 5, whose localStorage readings actually show up, every other browser has such a large difference that there’s no way to see it on this chart. When I change the scale of the Y-axis, you can see how localStorage and object property reads truly compare across browsers (Figure 2-2).
But still, the difference between the two is so vast that it’s almost comical. Why?

What’s Going On?

In order to persist across browser sessions, values in localStorage are written to disk. That means when you’re reading a value from localStorage, you’re actually reading some bytes from the hard drive. Reading from and writing to a hard drive are expensive operations, especially as compared to reading from and writing to memory. In essence, that’s exactly what my benchmark was testing: the speed of reading a value from memory (object property) compared to reading a value from disk (localStorage).

Making matters more interesting is the fact that localStorage data is stored per-origin, which means that it’s possible for two or more tabs in a browser to be accessing the same localStorage data at the same time. This is a big pain for browser implementors who need to figure out how to synchronize access across tabs. When you attempt to read from localStorage, the browser needs to stop and see if any other tab is accessing the same area first. If so, it must wait until the access is finished before the value can be read.

So the delay associated with reading from localStorage is variable: it depends a lot on what else is going on with the browser at that point in time.

Optimization Strategy

Given that there is a cost to reading from localStorage, how does that affect how you would use it? Before coming to a conclusion, I ran another benchmark (http://jsperf.com/localstorage-string-size) to determine the effect of reading different-sized pieces of data from localStorage. The benchmark saves strings of four different sizes (100, 500, 1,000, and 2,000 characters) into localStorage and then reads them out. The results were a little surprising: across all browsers, the amount of data being read did not affect how quickly the read happened.
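
A comparison along those lines could be sketched roughly as follows. This is not the jsPerf test itself: `memoryStorage`, `makeString`, `timeReads`, and `runSizeComparison` are hypothetical helpers, and `memoryStorage` is an in-memory stand-in so that the sketch also runs outside a browser. In a page you would pass `window.localStorage` instead, and the timings are only meaningful there.

```javascript
// Hypothetical in-memory stand-in exposing the two Storage methods used
// below, so the sketch runs without a browser. In a real page, pass
// window.localStorage instead.
function memoryStorage() {
  const data = new Map();
  return {
    setItem(key, value) { data.set(key, String(value)); },
    getItem(key) { return data.has(key) ? data.get(key) : null; },
  };
}

// Build a string of exactly `length` characters.
function makeString(length) {
  return "x".repeat(length);
}

// Read `key` from `storage` `iterations` times. Returns the elapsed
// milliseconds together with the length of the last value read.
function timeReads(storage, key, iterations) {
  let value = null;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    value = storage.getItem(key);
  }
  return { ms: Date.now() - start, length: value === null ? 0 : value.length };
}

// Store one string per size, then time repeated reads of each key.
function runSizeComparison(storage, sizes, iterations) {
  for (const size of sizes) {
    storage.setItem("size-" + size, makeString(size));
  }
  return sizes.map((size) => ({
    size,
    ...timeReads(storage, "size-" + size, iterations),
  }));
}

const results = runSizeComparison(memoryStorage(), [100, 500, 1000, 2000], 10000);
```

If the cost of a read does not depend on the size of the value, the `ms` figures for the four entries in `results` should be close to one another when run against a real `localStorage`.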
I ran the test multiple times and implored my Twitter followers (https://twitter.com/slicknet/status/139475625793699840) to get more information. To be certain, there were definitely a few variances across browsers, but none that were large enough to really make a difference. My conclusion: it doesn’t matter how much data you read from a single localStorage key.

Figure 2-2. Scaled results

I followed up with another benchmark (http://jsperf.com/localstorage-string-size-retrieval) to test my new conclusion that it’s better to do as few reads as possible. The results correlated with the earlier benchmark in that reading 100 characters 10 times was around 90% slower across most browsers than reading 10,000 characters one time.

Given that, the best strategy for reading data from localStorage is to use as few keys as possible to store as much data as possible. Since it takes roughly the same amount of time to read 10 characters as it does to read 2,000 characters, try to put as much data as possible into a single value. You’re getting hit each time you call getItem() (or read from a localStorage property), so make sure that you’re getting the most out of the expense. The faster you get data into memory, either a variable or an object property, the faster all subsequent actions will be.

Follow Up

In the time since I first published this article, there has been a lot of discussion around localStorage performance. It began with a blog post by Mozilla’s Chris Heilmann titled “There’s No Simple Solution for localStorage.” In that post, Chris introduced the idea that localStorage as a whole has performance problems. After several follow-up blog posts by others, including myself, I was finally able to get in touch with Jonas Sicking, one of the engineers responsible for implementing localStorage in Firefox. Indeed, there is a performance issue with localStorage, but it’s not as simple as reads taking a bit longer than reads from a simple object.
The heart of the problem is that localStorage is a synchronous API, which leaves the browser with very few choices as to implementation. All localStorage data is stored in a file on disk. That means in order for you to have access to that data in JavaScript the browser must first read that file into memory. When that read occurs is the performance issue. It could occur with the first access of localStorage, but then the browser would freeze while the read happened. That may not be a big deal when dealing with a small amount of data, but if you’ve used the whole 5 MB limit, there could be a noticeable effect.

Another solution, the one employed by Firefox, is to read the localStorage data file as a page is being loaded. This ensures that later access to localStorage is as fast as possible and has predictable performance. The downside of that approach is that the read from file could adversely affect the loading time of the page.

As I’m writing this, there is still no solution to this particular problem. Some are calling for a completely new API to replace localStorage while others are intent on fixing the existing API. Regardless of what happens, there is likely to be a lot more research done in the area of client-side data storage soon.

To comment on this chapter, please visit http://calendar.perfplanet.com/2011/localstorage-read-performance/. Originally published on Dec 02, 2011.

CHAPTER 3
Why Inlining Everything Is NOT the Answer
Guy Podjarny

Every so often I get asked if the best frontend optimization wouldn’t be to simply inline everything. Inlining everything means embedding all the scripts, styles, and images into the HTML, and serving them as one big package.

This question is a great example of taking a best practice too far. Yes, reducing the number of HTTP requests is a valuable best practice. Yes, inlining everything is the ultimate way to reduce the number of requests (in theory to one).
But NO, it’s not the best way to make your site faster.

While reducing requests is a good practice, it’s not the only aspect that matters. If you inline everything, you fulfill the “Reduce Requests” goal, but you’re missing many others. Here are some of the specific reasons you shouldn’t inline everything.

No Browser Caching

The most obvious problem with inlining everything is the loss of caching. If the HTML holds all the resources, and the HTML is not cacheable by itself, the resources are re-downloaded every time. This means the first page load on a new site may be faster, but subsequent pages or return visitors would experience a slower page load.

For example, let’s look at the repeat visit of the New York Times’ home page (Table 3-1, Figure 3-1). Thanks to caching, the original site loads in 2.7 seconds. If we inline the JavaScript files on that page, the repeat visit load time climbs to 3.2 seconds, and the size doubles. Visually, the negative impact is much greater, due to JavaScript’s impact on rendering.

Table 3-1. www.nyt.com

IE8; DSL; Dulles, VA
Repeat view                  Load time        # Requests   # Bytes
Original Site                2.701 seconds    46           101 KB
Inlined External JS Files    3.159 seconds    36           212 KB

Figure 3-1. www.nyt.com

Even if the HTML is cacheable, the cache duration has to be the shortest duration of all the resources on the page. If your HTML is cacheable for 10 minutes, and a resource in the page is cacheable for a day, you’re effectively reducing the cacheability of the resource to be 10 minutes as well.

No Edge Caching

The traditional value of CDNs is called Edge Caching: caching static resources on the CDN edge. Cached resources are served directly from the edge, and thus delivered much faster than routing all the way to the origin server to get them.

When inlining data, the resources are bundled into the HTML, and from the CDN’s perspective, the whole thing is just one HTTP response. If the HTML is not cacheable, this entire HTTP response isn’t cacheable either.
Therefore, the HTML and all of its resources would need to be fetched from the origin every time a user requests the page, while in the standard case many of the resources could have been served from the Edge Cache.

As a result, even first-time visitors to your site are likely to get a slower experience from a page with inlined resources than from a page with linked resources. This is especially true when the client is browsing from a location far from your server.

For example, let’s take a look at browsing the Apple home page from Brazil, using IE8 and a cable connection (Table 3-2, Figure 3-2). Modifying the site to inline images increased the load time from about 2.4 seconds to about 3.2 seconds, likely because the inlined image data had to be fetched from the origin servers and not the CDN. While the number of requests decreased by almost 30% (from 36 to 26), the page was in fact slower.

Table 3-2. www.apple.com

IE8; Cable; São Paulo, Brazil
First view         Load time        # Requests   # Bytes
Original Site      2.441 seconds    36           363 KB
Inlined Images     3.157 seconds    26           361 KB

Figure 3-2. www.apple.com

No Loading On-Demand

Loading resources on-demand is an important category of performance optimizations, which attempt to load a resource only when it’s actually required. Resources may be referenced, but not actually downloaded and evaluated until the conditions require it.

Browsers offer a built-in loading-on-demand mechanism for CSS images. If a CSS rule references a background image, the browser only downloads it if at least one element on the page matches the rule. Another example is loading images on-demand (http://www.blaze.io/technical/the-impact-of-image-optimization/), which only downloads page images as they scroll into view. The Progressive Enhancement approach to Mobile Web Design uses similar concepts for loading JavaScript and CSS only as needed.
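
As a minimal sketch of loading images on demand, the function below copies a placeholder attribute into `src` once an image is close to the viewport. The `data-src` attribute name, the `isNearViewport` and `loadVisibleImages` functions, and the 200-pixel margin are assumptions made for illustration, not something the linked article prescribes. The image objects only need `getBoundingClientRect`, `getAttribute`, `setAttribute`, and `removeAttribute`, so real image elements work and so do simple stand-ins.

```javascript
// Decide whether a box (as returned by getBoundingClientRect) overlaps
// the viewport, extended by `margin` pixels above and below.
function isNearViewport(rect, viewportHeight, margin) {
  return rect.bottom >= -margin && rect.top <= viewportHeight + margin;
}

// Start loading every image that is near the viewport by moving its URL
// from data-src (an assumed attribute name) to src.
// Returns the images that are still waiting to be loaded.
function loadVisibleImages(images, viewportHeight, margin = 200) {
  const pending = [];
  for (const img of images) {
    const url = img.getAttribute("data-src");
    if (url === null) continue;            // nothing left to load
    if (isNearViewport(img.getBoundingClientRect(), viewportHeight, margin)) {
      img.setAttribute("src", url);        // in a browser, this starts the download
      img.removeAttribute("data-src");
    } else {
      pending.push(img);                   // try again after the next scroll
    }
  }
  return pending;
}
```

In a page, this might be called once on load and again from a throttled scroll handler, passing `window.innerHeight` as the viewport height and feeding the returned list back in on each call. The point of the sketch is that the decision depends on client-side state (scroll position and viewport size), which the server cannot know when it decides what to inline.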

Since inlining resources is a decision made on the server, it doesn’t benefit from loading on-demand. This means all the images (CSS or page images) are embedded, whether they’re needed by the specific client context or not. More often than not, the value gained by inlining is lower than the value lost by not having these other optimizations.

As an example, I took The Sun’s home page and applied two conflicting optimizations to it (Table 3-3, Figure 3-3). The first loads images on demand, and the second inlines all images. When loading images on demand, the page size added up to about 1MB, and load time was around 9 seconds. When inlining images, the page size grew to almost 2MB, and the load time increased to 16 seconds. Either way the page makes many requests, but the load and size differences between inlining images and images on-demand are very noticeable.

Table 3-3. www.thesun.co.uk

IE8; DSL; Dulles, VA
First view                   Load time         # Requests   # Bytes
Loading Images On-Demand     9.038 seconds     194          1,028 KB
Inlined Images               16.190 seconds    228          1,979 KB

Figure 3-3. www.thesun.co.uk

Invalidates Browser Look-Ahead

Modern browsers use smart heuristics to try and prefetch resources at the bottom of the page ahead of time. For instance, if your site references http://www.3rdparty.com/code.js towards the end of the HTML, the browser is likely to resolve the DNS for www.3rdparty.com, and probably even start downloading the file, long before it can actually execute it.

In a standard website, the HTML itself is small, and so the browser only needs to download a few dozen KB before it sees the entire HTML. Once it sees (and parses) the entire HTML, it can start prefetching as it sees fit. If you’re making heavy use of inlining, the HTML itself becomes much bigger, possibly over 0.5MB in size.
While downloading it, the browser can’t see and accelerate the resources further down the page, many of which are third-party tools you couldn’t inline.

Flawed Solution: Inline Everything only on First Visit

A partial solution to the caching problem works as follows:

• The first time a user visits your site, inline everything and set a cookie for the user.
• Once the page loads, download all the resources as individual files.
  - Or store the data in a Scriptable Cache (http://www.blaze.io/technical/browser-cache-2-0-scriptable-cache/).
• If a user visits the page and has the cookie, assume it has the files in the cache, and don’t inline the data.

While better than nothing, the flaw in this solution is that it assumes a page is either entirely cached or entirely not cached. In reality, websites and cache states are extremely volatile. A user’s cache often holds less than a day’s worth of browsing data: an average user browses 88 pages/day (http://blog.newrelic.com/wp-content/uploads/infog_061611.png), an average page weighs 930KB (http://httparchive.org/interesting.php#bytesperpage), and most desktop browsers cache no more than 75MB of data (http://www.blaze.io/mobile/understanding-mobile-cache-sizes/). That works out to roughly 80MB of page data per day (88 × 930KB), which is more than the cache can hold. For mobile, the ratio is even worse.

Cookies, on the other hand, usually live until their defined expiry date. Therefore, using a cookie to predict the cache state becomes pointless very quickly, and then you’re just back to not inlining at all.

One of the biggest problems with this solution is that it demos better than it really is. In synthetic testing, like WebPageTest tests, a page is indeed either fully cached (i.e., all its resources are cached), or it’s not cached at all. These tests therefore make the inline-on-first-visit approach look like the be-all and end-all, which is just plain wrong.

Another significant problem is that not all CDNs support varying cache by a cookie.
Therefore, if some of your pages are cacheable, or if you think you might make them cacheable later, it may be hard to impossible to get the CDN to cache two different versions of your page, and choose the one to serve based on a cookie.

Summary and Recommendations

Our world isn’t black and white. The fact that reducing the number of requests is a good way to accelerate your site doesn’t mean it’s the only solution. If you take it too far, you’ll end up slowing down your site, not speeding it up.

Despite all these limitations, inlining is still a good and important tool in the world of frontend optimization. As such, you should use it, but be careful not to abuse it. Here are some recommendations about when to use inlining, but keep in mind you should verify that they have the right effect on your own site:

Very small files should be inlined. The HTTP overhead of a request and response is often ~1KB, so files smaller than that should definitely be inlined. Our testing shows you should almost never inline files bigger than 4KB.

Page images (i.e., images referenced from the page, not CSS) should rarely be inlined. Page images tend to be big in size, they don’t block other resources in normal use, and they tend to change more frequently than CSS and scripts. To optimize image file loading, load images on-demand instead (http://www.blaze.io/technical/the-impact-of-image-optimization/).

Anything that isn’t critical for the above-the-fold page view should not be inlined. Instead, it should be deferred till after page load, or at least made async.

Be careful with inlining CSS images. Many CSS files are shared across many pages, where each page only uses a third or less of the rules. If that’s the case for your site, there’s a decent chance your site will be faster if you don’t inline those images.

Don’t rely only on synthetic measurements; use RUM (Real User Monitoring).
Tools like WebPageTest are priceless, but they don’t show everything. Measure real world performance and use that information alongside your synthetic test results.

To comment on this chapter, please visit http://calendar.perfplanet.com/2011/why-inlining-everything-is-not-the-answer/. Originally published on Dec 03, 2011.

CHAPTER 4
The Art and Craft of the Async Snippet
Stoyan Stefanov

JavaScript downloads block the loading of other page components. That’s why it’s important (make that critical) to load script files in a nonblocking asynchronous fashion. If this is new to you, you can start with this post on the Yahoo User Interface (YUI) library blog (http://www.yuiblog.com/blog/2008/07/22/non-blocking-scripts/) or the Performance Calendar article (http://calendar.perfplanet.com/2010/the-truth-about-non-blocking-javascript/).

In this post, I’ll examine the topic from the perspective of a third party: when you’re the third party, providing a snippet for other developers to include on their pages, be it an ad, a plug-in, a widget, a visit counter, analytics, or anything else. Let’s see in detail how this issue is addressed in Facebook’s JavaScript SDK.

The Facebook Plug-ins JS SDK

The Facebook JavaScript SDK is a multipurpose piece of code that lets you integrate Facebook services, make API calls, and load social plug-ins such as the Like button (https://developers.facebook.com/docs/reference/plugins/like/).

The task of the SDK when it comes to the Like button and other social plug-ins is to parse the page’s HTML code looking for elements (such as or