D3.js in Action


D3.js in Action
Elijah Meeks
Manning
Shelter Island

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2015 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Susanna Kline
Technical development editor: Valentin Crettaz
Copyeditor: Tara Walsh
Proofreader: Katie Tennant
Technical proofreader: Jon Borgman
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617292118
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – EBM – 20 19 18 17 16 15

brief contents

PART 1 D3.JS FUNDAMENTALS
1 ■ An introduction to D3.js
2 ■ Information visualization data flow
3 ■ Data-driven design and interaction
PART 2 THE PILLARS OF INFORMATION VISUALIZATION
4 ■ Chart components
5 ■ Layouts
6 ■ Network visualization
7 ■ Geospatial information visualization
8 ■ Traditional DOM manipulation with D3
PART 3 ADVANCED TECHNIQUES
9 ■ Composing interactive applications
10 ■ Writing layouts and components
11 ■ Big data visualization
12 ■ D3 on mobile (online only)

contents

preface
acknowledgments
about this book
about the cover illustration

PART 1 D3.JS FUNDAMENTALS
1 An introduction to D3.js
1.1 What is D3.js?
1.2 How D3 works
  Data visualization is more than data visualization ■ D3 is about selecting and binding ■ D3 is about deriving the appearance of web page elements from bound data ■ Web page elements can now be divs, countries, and flowcharts
1.3 Using HTML5
  The DOM ■ Coding in the console ■ SVG ■ CSS ■ JavaScript
1.4 Data standards
  Tabular data ■ Nested data ■ Network data ■ Geographic data ■ Raw data ■ Objects
1.5 Infoviz standards expressed in D3
1.6 Your first D3 app
  Hello world with divs ■ Hello World with circles ■ A conversation with D3
1.7 Summary
2 Information visualization data flow
2.1 Working with data
  Loading data ■ Formatting data ■ Transforming data ■ Measuring data
2.2 Data-binding
  Selections and binding ■ Accessing data with inline functions ■ Integrating scales
2.3 Data presentation style, attributes, and content
  Visualization from loaded data ■ Setting channels ■ Enter, update, and exit
2.4 Summary
3 Data-driven design and interaction
3.1 Project architecture
  Data ■ Resources ■ Images ■ Style sheets ■ External libraries
3.2 Interactive style and DOM
  Events ■ Graphical transitions ■ DOM manipulation ■ Using color wisely
3.3 Pregenerated content
  Images ■ HTML fragments ■ Pregenerated SVG
3.4 Summary

PART 2 THE PILLARS OF INFORMATION VISUALIZATION
4 Chart components
4.1 General charting principles
  Generators ■ Components ■ Layouts
4.2 Creating an axis
  Plotting data ■ Styling axes
4.3 Complex graphical objects
4.4 Line charts and interpolations
  Drawing a line from points ■ Drawing many lines with multiple generators ■ Exploring line interpolators
4.5 Complex accessor functions
4.6 Summary
5 Layouts
5.1 Histograms
5.2 Pie charts
  Drawing the pie layout ■ Creating a ring chart ■ Transitioning
5.3 Pack layouts
5.4 Trees
5.5 Stack layout
5.6 Plugins to add new layouts
  Sankey diagram ■ Word clouds
5.7 Summary
6 Network visualization
6.1 Static network diagrams
  Network data ■ Adjacency matrix ■ Arc diagram
6.2 Force-directed layout
  Creating a force-directed network diagram ■ SVG markers ■ Network measures ■ Force layout settings ■ Updating the network ■ Removing and adding nodes and links ■ Manually positioning nodes ■ Optimization
6.3 Summary
7 Geospatial information visualization
7.1 Basic mapmaking
  Finding data ■ Drawing points on a map ■ Projections and areas ■ Interactivity
7.2 Better mapping
  Graticule ■ Zoom
7.3 Advanced mapping
  Creating and rotating globes ■ Satellite projection
7.4 TopoJSON data and functionality
  TopoJSON the file format ■ Rendering TopoJSON ■ Merging ■ Neighbors
7.5 Tile mapping with d3.geo.tile
7.6 Further reading for web mapping
  Transform zoom ■ Canvas drawing ■ Raster reprojection ■ Hexbins ■ Voronoi diagrams ■ Cartograms
7.7 Summary
8 Traditional DOM manipulation with D3
8.1 Setup
  CSS ■ HTML
8.2 Spreadsheet
  Making a spreadsheet with table ■ Making a spreadsheet with divs ■ Animating our spreadsheet
8.3 Canvas
  Drawing with canvas ■ Drawing and storing many images
8.4 Image gallery
  Interactively highlighting DOM elements ■ Selecting
8.5 Summary

PART 3 ADVANCED TECHNIQUES
9 Composing interactive applications
9.1 One data source, many perspectives
  Data dashboard basics ■ Spreadsheet ■ Bar chart ■ Circle pack ■ Redraw: resizing based on screen size
9.2 Interactivity: hover events
9.3 Brushing
  Creating the brush ■ Making our brush more user friendly ■ Understanding brush events ■ Redrawing components
9.4 Summary
10 Writing layouts and components
10.1 Creating a layout
10.2 Writing your own components
  Loading sample data ■ Linking components to scales ■ Adding component labels
10.3 Summary
11 Big data visualization
11.1 Big geodata
  Creating random geodata ■ Drawing geodata with canvas ■ Mixed-mode rendering techniques
11.2 Big network data
11.3 Optimizing xy data selection with quadtrees
  Generating random xy data ■ xy brushing
11.4 More optimization techniques
  Avoid general opacity ■ Avoid general selections ■ Precalculate positions
11.5 Summary
12 D3 on mobile
  Available online at www.manning.com/D3.jsinAction
index

preface

I’ve always loved making games. Board games, role-playing games, computer games: I just love abstracting things into rules, numbers, and categories. As a natural consequence, I’ve always loved data visualization. Damage represented as a bar, spells represented with icons, territory broken down into hexes, treasure charted out in a variety of ways. But it wasn’t until I started working with maps in grad school that I became aware of the immeasurable time and energy people have invested in understanding how to best represent data.

I started learning D3 after having worked with databases, map data, and network data in a number of different desktop packages, and also coding in Flash. So I was naturally excited when I was introduced to D3, a JavaScript library that deals not only with information visualization generally, but also with the very specific domains of geospatial data and network data. The fact that it lives in the DOM and follows web standards was a bonus, especially because I’d been working with Flash, which wasn’t known for that kind of thing. Since then, I’ve used D3 for everything, including the creation of UI elements that you’d normally associate with jQuery.
When I was approached by Manning to write this book, I thought it would be the perfect opportunity for me to look deeply at D3 and make sure I knew how every little piece of the library worked, while writing a book that didn’t just introduce D3 but really dived into the different pieces of the library that I found so exciting, like mapping and networks, and tied them together. As a result, the book ended up being much longer than I expected and covers everything from the basics of generating lines and areas to using most of the layouts that come to mind when you think of data visualization. It also devotes some space to maps, networks, mobile, and optimization. In the end, I tried to give readers a broad approach to data visualization tools, whether that means maps or networks or pie charts.

acknowledgments

I’d like to thank my wife, Hajra, for giving me the support and inspiration and the keen editorial eye necessary for a book like this.

I’d also like to thank Manning Publications for the chance to write this book. The exercise of writing a book like this serves as a finishing school for learning about a library, and as a result of writing D3.js in Action, I feel more confident with D3 than I would have had I simply created applications. I’d like to especially thank my editor, Susanna Kline, for her patience and hard work at turning my prose into something worth buying. Also, thanks to the production team and everyone else at Manning who worked on the book behind the scenes.

The following reviewers provided feedback on the manuscript at various stages of its development, and I thank them for their time and effort: Prashanth Babu V V, Dwight Barry, Margriet Bruggeman, Nikander Bruggeman, Matthew Faulkner, Jim Frohnhofer, Ntino Krampis, Andrea Mostosi, Arun Noronha, Alvin Raj, Adam Tolley, and Stephen Wakely.
Thanks also to technical editor Valentin Crettaz and technical proofreader Jon Borgman for lending their expertise and making this a much better book.

Finally, I’d like to thank Stanford University Library and all the people there, but especially the head of that library, Mike Keller, for giving me the opportunity to use D3 to create amazing new research and applications in a number of exciting projects.

about this book

People come to data visualization, and D3 particularly, from three different areas. The first is traditional web development, where people often assume D3 is a charting library or, less commonly, a mapping library. The second is more traditional software development, such as Java, where D3 is part of the transition into HTML5 development. The last is statistical analysis using R, Python, or desktop apps.

In each case, D3 represents two major transitions for folks: modern web development and data visualization. I touch on aspects of both that may give readers more grounding in what I expect to be new and strange fields. Someone who’s intimately familiar with JavaScript may find that some of these subjects (like function chaining) are already well understood, and others who know data visualization well may feel the same way about some of the general principles, like graphical primitives.

Although I do provide an introduction to D3, the focus of this book is on a more exhaustive explanation of key principles of the library. Whether you’re just getting started with D3, or you’re looking to develop more advanced skills, this book provides you with the tools you need to create whatever data visualization you can think of.

Roadmap

This book is split into three parts. The first three chapters focus on the fundamentals of D3. You’ll see data-binding, loading data, and creating graphical elements from data in a variety of different ways.
Part 1 also deals with scales, color, and other important aspects of data visualization that you might already know well. Some of the core technologies used by D3, like JavaScript, CSS, and SVG, are explained throughout these chapters.

The next five chapters use D3 in the ways we typically think of. Chapter 4 teaches you how to create simple graphics from data, such as line charts, axes, and boxplots. Chapter 5 gives an in-depth exploration of various traditional data visualization layouts like pie charts, tree layouts, and word clouds. Chapter 6 is devoted to network visualization, which might seem exotic but is being used more and more in a variety of domains. Chapter 7 dives into the rich mapping capabilities in D3, including the use of TopoJSON to do interesting geodata manipulation in the browser. Chapter 8 is devoted to manipulating traditional HTML elements, like paragraphs and lists, to demonstrate that D3 is not tied to SVG.

The last three chapters and chapter 12 (online only) cover topics that dive deeper into D3. I’ve found that each has become an important part of my own practice. These include principles for wiring up your own data dashboard, creating your own D3 layouts and components, optimizing data visualization for large datasets, and writing data visualization for mobile. Even if you don’t think you’ll ever be using D3 in these ways, each of these chapters still touches on key aspects of using D3.

How to use this book

If you’re just getting started with D3, I suggest going through chapters 1 through 4 in order. Each chapter builds on the last and establishes the basic principles not only of D3 but also of data visualization. After that, it depends on what you plan to use D3 for. If your data is mostly geographic, then you can jump to chapter 7, and similarly, if your data is mostly network data, you can jump to chapter 6.
If you’re doing traditional data visualization, then I suggest going to chapter 5 and then on to chapter 9 to start thinking about dashboards, which are a key component of traditional data visualization.

If you’ve been using D3 for a while and want to improve your skills, I suggest skimming the first three chapters. The parts that I think might be of particular interest are in chapter 3, which deals with color and with loading external resources like SVG icons or HTML content. You might also want to review generators and components in chapter 4 to fill in any gaps you might have dealing with these common, but often underexamined, parts of D3. After that, it depends on what you see as your strengths and what you see as your goals for using D3. If you want to maximize traditional data visualization, take a look at chapter 5 to see the layouts, and then look at chapter 9 for dashboards. You’re probably familiar with most of the content there, but these chapters deal with it more exhaustively than you’ve likely seen before. After that, look at chapter 11 and see if there are any optimization techniques you might want to bring into your data visualization, or look at chapter 8 and think about how you might use the D3 tricks you know to build UI elements and otherwise do traditional web development.

Much of the value of this book comes in chapters 6 and 7, which go into great detail about using D3 for two major areas of data visualization: networks and maps. Along those lines, the use of HTML5 canvas in chapters 8 and 11 is an area that even experienced D3 developers might not be familiar with.

Regardless of your level of experience with D3, I recommend you really spend some time with chapter 10, which deals with the structure of layouts and components while showing you how to build your own.
Beginning to build modular, reusable components and layouts will allow you to create not only effective data visualization, but also an effective career in visualizing data.

Chapter 12 is available online only from the publisher’s website at www.manning.com/D3.jsinAction and is a fun read that will expand your horizons.

Online graphics

Most of the graphics in this book were created in color and are meant to be viewed in color. The eBook versions do include color graphics, but the print book is printed in grayscale. To view the color graphics, please refer to the eBook versions in PDF, ePub, and Kindle formats, which are available to pBook owners for free after they register their print book at www.manning.com/D3.jsinAction.

About one-third of the graphics in this book also have an online component. To see the online graphic and the code that was used to generate it, please look for the online-graphic icon in the captions of certain figures. In the eBook versions, clicking on the icon will take you to the interactive graphic online. For print book readers, please go to the publisher’s website at www.manning.com/D3.jsinAction where you will find the interactive graphics listed by figure number. By clicking on the URLs for those figures, you will be able to view the graphics online on your computer or tablet as you read the print book.

Code conventions

Initial code examples in chapters are complete; later code examples that extend an initial example show only the code that has changed. It’s best to use the source code and online examples alongside the text. The line lengths of some of the examples exceed the page width, and in cases like these, the ➥ marker is used to indicate that a line has been wrapped for formatting. All source code in listings or in text is in a fixed-width font like this to separate it from ordinary text. Code annotations accompany many of the listings, highlighting important concepts.
Source code downloads

The source code for the examples in this book is available online from the publisher’s website at www.manning.com/D3.jsinAction, and a list of all interactive versions is hosted on GitHub and can be found at emeeks.github.io/d3ia/.

Software requirements

D3.js requires a browser to run, and you should have a local web server installed on your computer to host your code.

about the cover illustration

The figure on the cover of D3.js in Action is captioned “Habit of a Moorish Pilgrim Returning from Mecca in 1586.” The illustration is taken from Thomas Jefferys’ A Collection of the Dresses of Different Nations, Ancient and Modern (four volumes), London, published between 1757 and 1772. The title page states that these are hand-colored copperplate engravings, heightened with gum arabic.

Thomas Jefferys (1719–1771) was called “Geographer to King George III.” He was an English cartographer who was the leading map supplier of his day. He engraved and printed maps for government and other official bodies and produced a wide range of commercial maps and atlases, especially of North America. His work as a mapmaker sparked an interest in local dress customs of the lands he surveyed and mapped, an interest that is brilliantly displayed in this four-volume collection.

Fascination with faraway lands and travel for pleasure were relatively new phenomena in the late eighteenth century, and collections such as this one were popular, introducing both the tourist as well as the armchair traveler to the inhabitants of other countries. The diversity of the drawings in Jefferys’ volumes speaks vividly of the uniqueness and individuality of the world’s nations some 200 years ago. Dress codes have changed since then, and the diversity by region and country, so rich at the time, has faded away. It is now often hard to tell the inhabitant of one continent from another.
Perhaps, trying to view it optimistically, we have traded a cultural and visual diversity for a more varied personal life, or a more varied and interesting intellectual and technical life.

At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Jefferys’ pictures.

Part 1 D3.js fundamentals

The first three chapters introduce you to the fundamental aspects of D3 and get you started with creating graphical elements in SVG using data. Chapter 1 lays out how D3 relates to the DOM, HTML, CSS, and JavaScript, and provides a few examples of how to use D3 to create elements on a web page. Chapter 2 focuses on loading, measuring, processing, and changing your data in preparation for data visualization using the various functions D3 includes for data manipulation. Chapter 3 turns toward design and explains how you can use D3 color functions for more effective data visualization, as well as how to load external elements such as HTML for modal dialogs or icons in raster and vector formats. In all, part 1 shows you how to load, process, and visually represent data in SVG without relying on built-in layouts or components, which is critical for using and extending those layouts and components.

1 An introduction to D3.js

Note to print book readers: Many graphics in this book are meant to be viewed in color. The eBook versions display the color graphics, so they should be referred to as you read. To get your free eBook in PDF, ePub, and Kindle formats, go to manning.com/D3.jsinAction to register your print book.

D3 stands for data-driven documents. It’s a brand name, but also a class of applications that have been offered on the web in one form or another for years.
For quite some time we’ve been building and working with data-driven documents such as interactive dashboards, rich internet applications, and dynamically driven content. In one sense, the D3.js library is an iterative step in a chain of technologies used for data-driven documents, but in another sense, it’s a radical step.

This chapter covers
■ The basics of HTML, CSS, and the Document Object Model (DOM)
■ The principles of Scalable Vector Graphics (SVG)
■ Data-binding and selections with D3
■ Different data types and their data visualization methods

4 CHAPTER 1 An introduction to D3.js

1.1 What is D3.js?

D3.js was created to fill a pressing need for web-accessible, sophisticated data visualization. Because of the library’s robust design, it does more than make charts. And that’s a good thing, because data visualization no longer refers only to pie charts and line graphs. It now means maps and interactive diagrams and other tools and content integrated into news stories, data dashboards, reports, and everything else you see on the web.

D3.js’s creator, Mike Bostock, helped develop an earlier data visualization library, Protovis, and also developed Polymaps, a JavaScript library that provides vector- and tile-mapping capability in a lightweight form. These earlier endeavors would inform the creation of D3.js, which focuses on modern standards and modern browsers. As Bostock describes it, “This avoids proprietary representation and affords extraordinary flexibility, exposing the full capabilities of web standards such as CSS3, HTML5 and SVG” (http://d3js.org/). This commitment to web standards is what makes D3.js radical. Although it won’t run on Internet Explorer 6, the widespread adoption of standards on modern browsers has finally allowed web developers to deliver dynamic and interactive content seamlessly in the browser. Until recently, you couldn’t build high-performance, rich internet applications in the browser unless you built them in Flash or as a Java applet.
For this reason, Flash and Java are still around on the internet, especially in internal web apps. D3.js provides the same kind of performance while integrating with web standards and the Document Object Model (DOM) at the core of HTML. D3 provides developers with the ability to create rich interactive and animated content based on data and tie that content to existing web page elements. It gives you the tools to create high-performance data dashboards and sophisticated data visualization, and to dynamically update traditional web content.

But D3 isn’t easy for people to pick up, because they often expect it to be a simple charting library. A case in point is the pie chart layout, which you’ll see in chapter 5. D3 doesn’t have one single function to create a pie chart. Rather, it has a function that processes your dataset to compute the necessary angles so that, if you pass the dataset to D3’s arc function, you get the drawing code necessary to represent those angles. And you need yet another step to create the SVG paths that use that drawing code. It’s a much longer process than using dedicated charting libraries, but the D3 process is also its strength. Although other charting libraries conveniently allow you to make line graphs and pie charts, they quickly break down when you want to make something more than that. D3 doesn’t, because it allows you to build whatever data-driven graphics and interactivity you can imagine, and that’s why D3 is behind much of the most innovative and exciting information visualization on the web today.

1.2 How D3 works

Let’s take a look at the principles of data visualization, as well as how D3 works in general. In figure 1.1 you see a rough map of how you might start with data and use D3 to process and represent that data, as well as add interactivity and optimize the data visualization you’ve created.
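The pie chart process described in section 1.1 (first compute angles from the data, then turn those angles into drawing code) can be sketched in plain JavaScript without D3. This is a minimal sketch, not D3’s implementation: pieAngles and arcPath are hypothetical helper names, the layout step keeps the input order and does no sorting or padding, and angles are measured in radians clockwise from 12 o’clock. In D3 version 3, the built-in counterparts of these two steps are d3.layout.pie() and d3.svg.arc().

```javascript
// Hypothetical helpers illustrating the two steps; this is not D3's own code.

// Step 1 (layout): annotate each value with start and end angles in radians.
// Angles run clockwise from 0 (12 o'clock) to 2π, in input order.
// Assumes a positive total; a total of 0 would produce NaN angles.
function pieAngles(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let angle = 0;
  return values.map((value) => {
    const startAngle = angle;
    angle += (value / total) * 2 * Math.PI;
    return { value, startAngle, endAngle: angle };
  });
}

// Step 2 (generator): turn one slice's angles into SVG path data for a
// wedge centered at (0, 0). A point at angle a on a circle of radius r is
// (r * sin a, -r * cos a), so angle 0 points straight up in SVG's y-down space.
// Limitation: a single slice covering the full circle collapses, because its
// start and end points coincide.
function arcPath({ startAngle, endAngle }, radius) {
  const x0 = radius * Math.sin(startAngle);
  const y0 = -radius * Math.cos(startAngle);
  const x1 = radius * Math.sin(endAngle);
  const y1 = -radius * Math.cos(endAngle);
  // SVG needs the large-arc flag set when the wedge spans more than half a turn.
  const largeArc = endAngle - startAngle > Math.PI ? 1 : 0;
  // Move to the center, draw a line out to the start point, arc clockwise
  // (sweep flag 1) to the end point, then close back to the center.
  return `M0,0L${x0},${y0}A${radius},${radius} 0 ${largeArc},1 ${x1},${y1}Z`;
}

// Usage: three values become three wedges with a radius of 100.
const slices = pieAngles([1, 2, 1]);
const paths = slices.map((slice) => arcPath(slice, 100));
```

Each returned string is meant for the d attribute of an SVG path element; creating those path elements is the separate final step the text mentions. Keeping the angle computation apart from the path generation is what lets the same angles feed a ring chart, labels, or transitions.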
In this chapter we’ll start by establishing the principles of how D3 selections and data-binding work and by learning how D3 interacts with SVG and the DOM. Then we’ll look at data types that you’ll commonly encounter. Finally, we’ll use D3 to create simple DOM and SVG elements.

Figure 1.1 A map of how to approach data visualization with D3.js that highlights the approach in this book. Start at the top with data, and then follow the path depending on the type of data and the needs you’re addressing.

1.2.1 Data visualization is more than data visualization

You may think of data visualization as limited to pie charts, line charts, and the variety of charting methods popularized by Tufte and deployed in research. It’s much more than that. One of the core strengths of D3.js is that it allows for the creation of vector graphics for traditional charting, but also the creation of geospatial and network visualizations, as well as traditional HTML elements like tables, lists, and paragraphs. This broad-based approach to data visualization, where a map or a network graph or a table is just another kind of representation of data, is the core of the D3.js library’s appeal for application development.

Figures 1.2 through 1.8 show data visualization pieces that I’ve created with D3.
They include maps and networks, along with more traditional pie charts and completely custom data visualization layouts based on the specific needs of my clients.

Figure 1.2 D3 can be used for simple charts, such as this set of multiple pie charts (explained in chapter 5) used to represent differences in how major US city plans use language about nature (from the City Nature project at citynature.stanford.edu). Each pie shows the ratio of language referring to parks and open space (green) versus habitat (red) in city plans.

Figure 1.3 D3 can also be used to create web maps (see chapter 7), such as this map showing the ethnic makeup of major metropolitan areas in the United States.

Figure 1.4 Maps in D3 aren’t limited to traditional Mercator web maps, and can be interactive globes, like this map of undersea communication cables, or other more unorthodox maps (see chapter 7).

Figure 1.5 D3 also provides robust capabilities to create interactive network visualizations (see chapter 6). Here you see the social and coauthorship network of archaeologists working at the same dig for nearly 25 years.

Figure 1.6 D3 includes a library of common data visualization layouts, such as the dendrogram (explained in chapter 5), that let you represent data such as this word tree.

Figure 1.7 D3 has numerous SVG drawing functions (see chapter 4) so you can create your own custom visualizations, such as this representation of musical scores.

Although the ability to create rich and varied graphics is one of D3’s strong points, more important for modern web development is the ability to embed the high level of interactivity that users expect. With D3, every element of every chart, from a spinning globe to a single, thin slice of a pie chart, is made interactive in the same way.
And because D3 was written by someone well versed in data visualization practice, it includes a number of interactive components and behaviors that are standard in data visualization and web development.

You don’t invest your time learning D3 so that you can deploy Excel-style charts on the web. For that, there are easier, more convenient libraries. You learn D3 because it gives you the ability to implement almost every major data visualization technique. It also gives you the power to create your own data visualization techniques, something a more general library can’t do.

For more examples of the variety of data visualization techniques realized with D3, take a look at Christophe Viau’s gallery of over 2,000 D3 examples: http://christopheviau.com/d3list/gallery.html.

By requiring a break with the practice of supporting long-obsolete browsers, D3.js lets developers make not only richly interactive applications but also applications that are styled and served like traditional web content. This makes them more portable, more amenable to the growing, linked data web, and more easily maintained by large teams.

Figure 1.8 You can combine these layouts and functions to create a data dashboard like we’ll do in chapter 9. You can also use the drawing functions to make your bar charts look distinctive, such as this “sketchy” style.

The decision on Bostock’s part to deal broadly with data, and to create a library capable of presenting maps as easily as charts, as easily as networks, as easily as ordered lists, also means that a developer doesn’t need to understand the abstractions and syntax of one library for maps, another for dynamic text content, and yet another for data visualization. Instead, the code for running an interactive, force-directed network layout is very close to pure JavaScript and also similar to the code representing dynamic points of interest (POIs) on a D3.js map.
Not only are the methods the same, but the very data could be the same, formulated in one way for lists, paragraphs, and spans, and in another way for geospatial representation. The class of data-driven documents is already broad and becomes even more all-encompassing when you also treat images and text as data.

1.2.2 D3 is about selecting and binding

Throughout this chapter, you’ll see code snippets that you can run in your browser to make changes to the graphical appearance of elements on your website. At the end of the chapter is an application written in D3 that explains the basics of the code we’re running in JavaScript. But before that we’ll explore the principles of web development using D3, and you’ll see this pattern of code over and over again: selecting.

Imagine we have a set of data, such as the price and size of a few houses, and a set of web page elements, whether graphics or traditional
elements, and that we want to represent the dataset, whether with text or through size and color. A selection is the group of all of them together, and we perform actions on the elements in the group, such as moving them, changing their color, or updating the values in the data. We work with the data and the web page elements separately, but the real power of D3 comes from using selections to combine data and web page elements. Here’s a selection without any data:
d3.selectAll("circle.a").style("fill", "red").attr("cx", 100);
This takes every circle on our page with the class "a", turns it red, and moves it so that its center is 100 pixels to the right of the left side of our canvas. Likewise, this code turns every div on our web page red and changes its class to "b":
d3.selectAll("div").style("background", "red").attr("class", "b");
But before we can change our circles and divs, we need to create them, and before we do that, it’s best to understand what’s happening in this pattern. The first part of that line of code, d3.selectAll(), is part of the core functionality necessary for understanding D3: selections. Selections can be made with d3.select(), which selects the first matching element, but more often you’ll use d3.selectAll(), which selects every matching element. A selection is a group of one or more web page elements that may be associated with a set of data, like the following code, which binds the elements in the array [1,5,11,3] to div
elements with the class "market":
d3.selectAll("div.market").data([1,5,11,3]);
This association is known in D3 as binding data, and you can think of a selection as a set of web page elements and a corresponding, associated set of data. Sometimes there are more data elements than DOM elements, or vice versa, in which case D3 has functions designed to create or remove elements that you can use to generate content. We’ll cover selections and data-binding in detail in chapter 2. Selections don’t have to include any data-binding, and most of the examples in this chapter won’t, but data-binding is what makes D3’s powerful information visualization techniques possible. You can make a selection on any elements in a web page, including items in a list, circles, or even regions on a map of Africa. Just as the elements can take a number of shapes, the data associated with those elements (where applicable) can take many forms.
1.2.3 D3 is about deriving the appearance of web page elements from bound data
After you have a selection, you can use D3 to modify the appearance of web page elements to reflect differences in the data. You may want to make the length of a line equal to the value of the data, or change its color to one that corresponds to a class of data. You may want to hide or show elements as they correspond to a user’s navigation of a dataset. As you can see in figure 1.9, after the page has loaded, you use D3 to select elements and bind data for the purpose of creating, removing, or changing DOM elements. You continue to use this process in response to user interaction. You modify the appearance of elements by using selections to reference the data bound to an element in a selection.
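The pairing described above can be modeled in a few lines of plain JavaScript. This is a toy sketch, not D3’s implementation. The hypothetical joinByIndex() helper mimics D3’s default behavior of pairing data with elements by position. It splits the result into matched pairs, leftover data (what D3 exposes through .enter()), and leftover elements (what D3 exposes through .exit()). The last step applies one accessor function to every pair, the way a function passed to .attr() or .style() is called with each element’s datum. The action is the same for every pair, but the results differ.

```javascript
// Toy model of D3's default, index-based data join. joinByIndex() and the
// plain-object "elements" are hypothetical; D3 works on real DOM nodes.
function joinByIndex(elements, data) {
  const update = [];
  const enter = [];
  const exit = [];
  const n = Math.max(elements.length, data.length);
  for (let i = 0; i < n; i++) {
    if (i < elements.length && i < data.length) {
      update.push({ element: elements[i], datum: data[i] }); // matched pair
    } else if (i < data.length) {
      enter.push({ datum: data[i] });      // datum with no element yet
    } else {
      exit.push({ element: elements[i] }); // element with no datum
    }
  }
  return { update, enter, exit };
}

// Two existing div.market elements, four data values.
const markets = [{ className: "market" }, { className: "market" }];
const joined = joinByIndex(markets, [1, 5, 11, 3]);

// Same action for every pair, different result per datum,
// like .style("width", function(d, i) { ... }) in D3.
const widthFromDatum = (d, i) => d * 10 + "px";
joined.update.forEach((pair, i) => {
  pair.element.width = widthFromDatum(pair.datum, i);
});

console.log(joined.update.map(p => p.element.width)); // [ '10px', '50px' ]
console.log(joined.enter.map(p => p.datum));          // [ 11, 3 ]
console.log(joined.exit.length);                      // 0
```

In D3 itself, the two extra values would be turned into new elements with .enter().append(). With more elements than data, the surplus elements would be removed with .exit().remove().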
D3 iterates through the elements in your selection and performs the same action using the bound data, which results in different graphical effects. Although the action you perform is the same, the effect is different because it’s based on the variation in the data. You’ll see data-binding first at the end of this chapter, and in much more detail throughout this book.
Figure 1.9 A page utilizing D3 is typically built in such a way that the page loads with styles, data, and content as defined in traditional HTML development (1), with its initial display using D3 selections of HTML elements (2), either with data-binding (3) or without it, to modify the structure and appearance of the page (4). The changes in structure prompt user interaction (5), which causes new selections with and without data-binding to further alter the page. Step 1 is shown differently because it only happens once (when you load the page), whereas every other step may happen multiple times, depending on user interaction.
1.2.4 Web page elements can now be divs, countries, and flowcharts
We’ve grown accustomed to thinking of web pages as consisting of text elements with containers for pictures, videos, or embedded applications. But as you grow more familiar with D3, you’ll begin to recognize that every element on the page can be treated with the same high-level abstractions. The most basic element on a web page, a div
that represents a rectangle into which you can drop paragraphs, lists, and tables, can be selected and modified in the same way you can select and modify a country on a web map, or the individual circles and lines that make up a complex data visualization. To be able to select items on a web page, you have to ensure that they’re built as part of the traditional structure of a web page. You can’t select items in a Java applet or a Flash runtime, nor can you select the labels on an embedded Google map, but if you create these elements so that they exist as elements in your web page, you give yourself tremendous flexibility. To get a taste of this, look at chapter 7, where we’ll build robust mapping applications in D3 and use the d3.select() syntax to update the appearance of a map in the same manner as it’s used here and elsewhere to create and move circles or div
elements.
1.3 Using HTML5
We’ve come a long way from the days when animated GIFs and frames were the pinnacle of dynamic content on the web. In figure 1.10, you can see why GIFs never caught on for robust data visualization on the web. GIFs, like the infoviz libraries designed to use VML, are still necessary for earlier browsers, but D3 is designed for modern browsers that don’t need the helper libraries necessary for backward compatibility. D3 development isn’t for everyone. But if your audience can be assumed to have access to a modern web browser, D3 significantly reduces your costs: you no longer need to code for older browsers, or to learn and keep up with the various libraries that support backward compatibility with them. A modern browser typically not only displays SVG graphics and obeys CSS3 rules but also performs well. Along with Cascading Style Sheets (CSS) and Scalable Vector Graphics (SVG), we can break down HTML5 into the DOM and JavaScript. The following sections treat each of them and include code you can run to see how D3 uses their functionality to create interactive and dynamic web content.
1.3.1 The DOM
A web page is structured according to the DOM. You need a passing familiarity with the DOM to do web development, so we’ll take a quick look at DOM elements and structure in a simple web page in your browser and touch on the basics of the DOM. To get started, you’ll need a web server that you can access from the computer that you’re using to code. With that in place, you can download the D3 library from d3js.org (d3.js, or d3.min.js for the minified version) and place it in the directory where you’ll make your web page. Create a page called d3ia.html in your text editor with the following contents.
Basic HTML like this follows the DOM. It defines a set of nested elements, starting with an html element with all its child elements and their child elements and so on. In this example, the
Or you can use the minified script, which shouldn’t have any UTF-8 characters in it:
NOTE You’ll see the console in this first chapter, but in chapter 2, once you’re familiar with it, I’ll show only the output.
The element inspector allows you to look at the elements that make up your web page by navigating through the DOM (represented as nested text, where each child element is shown indented). You can also select an element onscreen graphically with the inspector tool, typically represented as a magnifying glass or cursor icon.
Figure 1.11 The developer tools in Chrome place the JavaScript console on the rightmost tab, labeled “Console,” with the element inspector available using the hourglass on the bottom left or by browsing the DOM in the Elements tab.
Figure 1.12 You can run JavaScript code in the console and also call global variables or declare new ones as necessary. Any code you write in the console and changes made to the web page are lost as soon as you reload the page.
The other screen you’ll want to use quite often is the console (figure 1.12), which allows you to write and run JavaScript code right on your web page. The examples in this book use Google Chrome and its developer console, but you could use Safari’s developer tools, Firebug in Firefox, or whatever developer console you’re most comfortable with. You can see and manipulate DOM elements such as the div on your page by clicking on the element inspector or by looking at the DOM as represented in HTML. You can click one of these elements and change its appearance by modifying it in the console. You can even delete elements in the console. Give it a try: select the div either in the DOM or visually, and press Delete. Now your web page is very lonely. Press Refresh so that your page reloads the HTML and your div comes back. You can adjust the size and color of your div by adding new styles or changing the existing one. For example, you can increase the width of the border and make it dashed by changing the border style to black 5px dashed. You can add content to the div in the form of other elements, or you can add text by right-clicking on the element and selecting Edit as HTML, as shown in figures 1.13 and 1.14. You can then write whatever you’d like between the opening and closing tags. Any changes you make, regardless of whether they’re well structured or not, will be reflected on the web page. In figure 1.15 you see the results of modifying the HTML, which is rendered immediately on your page. In this way, you could slowly and painstakingly create a web page in the console. We’re not going to do that. Instead, we’ll use D3 to create elements on the fly with size, position, shape, and content based on our data.
Figure 1.13 Rather than adding or modifying individual styles and attributes, you have the ability to rewrite the HTML code as you would in a text editor. As with any changes, these only last until you reload the page.
Figure 1.14 Changing the content of a DOM element is as simple as adding text between the opening and closing tags of the element.
Figure 1.15 The page is updated as soon as you finish making your changes. Writing HTML manually in this way is only useful for planning how you might want to dynamically update the content.
1.3.2 Coding in the console
You’ll do a lot of your coding in the IDE of your choice, but one of the great things about web development is that you can test JavaScript code changes by using your console. Later you’ll focus on writing JavaScript, but for now, to demonstrate how the console works, copy the following code into your console and press Enter:
d3.select("div").style("background", "lightblue").style("border", "solid black 1px").html("Something else maybe");
You should see the effect shown in figure 1.16. You’ll see a few more uses of traditional HTML elements in this chapter, and then again in chapter 3, but then you won’t see traditional DOM elements again until chapter 8, where we’ll use D3 to create complex, data-driven spreadsheets and galleries using
traditional HTML elements.
The table in modal.html has a "Statistics" header followed by one labeled row each for Team Name, Region, Wins, Losses, Draws, Points, Goals For, Goals Against, Clean Sheets, Yellow Cards, and Red Cards.
Listing 3.4 modal.html
And now we’ll add CSS rules for the table and the div that we want to put it in. As you see in the following listing, we can use the position and z-index CSS styles because this is a traditional DOM element.
Listing 3.5 Update to d3ia.css
#modal {
  position: fixed;
  left: 150px;
  top: 20px;
  z-index: 1;
  background: white;
  border: 1px black solid;
  box-shadow: 10px 10px 5px #888888;
}
tr {
  border: 1px gray solid;
}
td {
  font-size: 10px;
}
td.data {
  font-weight: 900;
}
Now that we have the table, all we need to do is add a click listener and an associated function to populate this dialog. We also need a function that creates a div with ID "modal" and adds the loaded HTML code to it using the .html() function:
// Creates a new div with an id corresponding to one in our CSS,
// and populates it with HTML content from modal.html
d3.text("resources/modal.html", function(data) {
  d3.select("body").append("div").attr("id", "modal").html(data);
});
teamG.on("click", teamClick);
function teamClick(d) {
  // Selects and updates the td.data elements with the values of the team clicked
  d3.selectAll("td.data").data(d3.values(d))
    .html(function(p) {
      return p;
    });
}
The results are immediately apparent when you reload the page. A div with the table defined in modal.html is created. When you click a team, the div is populated with values from the data bound to the element you clicked (figure 3.19). We used d3.text() in this case because when working with HTML, it can be more convenient to load the raw HTML code like this and drop it into the .html() function of a selected element that you’ve created. If you use d3.html(), you get HTML nodes that allow more sophisticated manipulation, which you’ll see now as we work with pregenerated SVG.
Figure 3.19 The modal dialog is styled based on the style defined in CSS. It’s created by loading the HTML data from modal.html and adding it to the content of a newly created div.
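teamClick() works only because the values returned by d3.values(d) come out in the same order as the td.data cells in modal.html. Here is a minimal plain-JavaScript sketch of that pairing. It uses Object.values() as a stand-in for d3.values(). The team record, its field names, and its numbers are hypothetical, and only four of the table’s rows are shown.

```javascript
// Hypothetical team record, shaped like one row parsed from a CSV file.
const team = { team: "Team A", region: "UEFA", win: "4", loss: "1" };

// Labels of the first four rows in the modal table.
const rowLabels = ["Team Name", "Region", "Wins", "Losses"];

// Object.values() returns a plain object's values in property order
// (for non-numeric keys, the order they were added), so the i-th value
// lands next to the i-th row label.
const values = Object.values(team);
const rows = rowLabels.map((label, i) => label + ": " + values[i]);

console.log(rows.join("\n"));
// Team Name: Team A
// Region: UEFA
// Wins: 4
// Losses: 1
```

If the CSV columns were reordered, the values would silently land in the wrong rows. The column order of the data and the row order of the HTML table have to be kept in sync.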
3.3.3 Pregenerated SVG
SVG has been around for a while, and there are, not surprisingly, robust tools for drawing SVG, like Adobe Illustrator and the open source tool Inkscape. You’ll likely want pregenerated SVG for icons, interface elements, and other components of your work. If you’re interested in icons, The Noun Project (http://thenounproject.com/) has an extensive repository of SVG icons, including the football in figure 3.20. When you download an icon from The Noun Project, you get it in two forms: SVG and PNG. You’ve already learned how to reference images, and you can do the same with SVG by pointing the xlink:href attribute of an image element at an SVG file. But loading SVG directly into the DOM lets you manipulate it like any SVG elements that you create in the browser with D3. Let’s say we decide to replace our boring circles with balls, and we don’t want them to be static images because we want to be able to modify their color and shape like other SVG. In that case, we’ll need to find a suitable ball icon and download it. In the case of downloads from The Noun Project, this means we’ll need to go through the hassle of creating an account, and we’ll need to properly attribute the creator of the icon or pay a fee to use the icon without attribution. Regardless of where we get our icon, we might need to modify it before using it in our data visualization. In the case of the football icon in this example, we need to make it smaller and center the icon on the 0,0 point of the canvas. This kind of preparation is going to be different for every icon, depending on how it was originally drawn and saved.
Figure 3.20 An icon for a football created by James Zamyslianskyj and available at http://thenounproject.com/term/football/1907/ from The Noun Project
With the modal table we used earlier, we assumed that we wanted all the code found in modal.html, so we could bring it in using d3.text() and drop the raw HTML as text into the .html() function of a selection. But in the case of SVG, especially SVG that you’ve downloaded, you often want to ignore the verbose settings in the document. These include the file’s own svg canvas as well as any g elements that have been not-so-helpfully added. You probably want to deal only with the graphical elements. With our soccer ball, we want to get only the path elements. If we load the file using d3.html(), the results are DOM nodes loaded into a document fragment that we can access and move around using D3 selection syntax. Using d3.html() is the same as using any of the other loading functions: you designate the file to be loaded and the callback. You can see the results of this command in figure 3.21:
d3.html("resources/icon_1907.svg", function(data) {console.log(data);});
Figure 3.21 An SVG loaded using d3.html() that was created in Inkscape. It consists not only of the graphical elements that make up the SVG (what we want) but also of much data that’s often extraneous (what we don’t want).
After we load the SVG into the fragment, we can easily loop through it to get all the paths, using the .empty() function of a selection. The .empty() function checks whether a selection contains no elements, so it returns true once we’ve moved all the paths out of the fragment into our main SVG. By including .empty() in a while statement, we can move all the path elements out of the document fragment and load them directly onto the SVG canvas.
// The loaded document fragment is automatically passed to loadSVG()
d3.html("resources/icon_1907.svg", loadSVG);
function loadSVG(svgData) {
  while (!d3.select(svgData).selectAll("path").empty()) {
    d3.select("svg").node().appendChild(
      d3.select(svgData).select("path").node());
  }
  d3.selectAll("path").attr("transform", "translate(50,50)");
}
Notice how we’ve added a transform attribute to offset the paths so that they won’t be clipped in the top-left corner. Instead, you clearly see a football in the top-left corner of your canvas. Document fragments aren’t a normal part of your DOM, so you don’t have to worry about accidentally selecting the svg canvas in the document fragment, or any other elements.
A while loop like this is sometimes necessary, but typically the best and most efficient method is to use .each() with your selection. Remember, .each() runs the same code on every element of a selection. In this case, we want to select our canvas and append each path to that canvas.
function loadSVG(svgData) {
  d3.select(svgData).selectAll("path").each(function() {
    d3.select("svg").node().appendChild(this);
  });
  d3.selectAll("path").attr("transform", "translate(50,50)");
}
We end up with a football floating in the top-left corner of our canvas, as shown in figure 3.22.
Figure 3.22 A hand-drawn football icon is loaded onto the canvas, along with the other SVG and HTML elements we created in our code.
Loading elements from external data sources like this is useful if you want to move individual nodes out of your loaded document fragment, but if you want to bind the externally loaded SVG elements to data, it’s an added step that you can skip. We can’t set the .html() of a g element to the text of our incoming path elements like we did with the div
when we populated it with the contents of modal.html. That’s because SVG doesn’t have a corresponding property to innerHTML, and therefore the .html() function on a selection of SVG elements has no effect. Instead, we have to clone the paths and append them to each g element representing our teams:
d3.html("resources/icon_1907.svg", loadSVG);
function loadSVG(svgData) {
  d3.selectAll("g").each(function() {
    var gParent = this;
    d3.select(svgData).selectAll("path").each(function() {
      gParent.appendChild(this.cloneNode(true));
    });
  });
}
It may seem backwards to select each g and then select each loaded path, until you think about how .cloneNode() and .appendChild() work. We need to take each g element and go through the cloning process for every path in the loaded icon. That means we use nested .each() statements: one for each g element in our DOM and one for each path element in the icon. By setting gParent to the actual g node (the this variable), we can then append a cloned version of each path in order. The results are soccer balls for each team, as shown in figure 3.23. We can easily do the same thing using the syntax from the first example in this section, but with our SVG elements individually added to each g. And now we can style them in the same way as any path element. We could use the national colors for each ball, but we’ll settle for making them dark red, with the results shown in figure 3.24.
d3.selectAll("path").style("fill", "darkred")
  .style("stroke", "black").style("stroke-width", "1px");
Figure 3.23 Each g element has its own set of paths cloned as child nodes, resulting in football icons overlaid on each element.
Figure 3.24 Football icons with a fill and stroke set by D3
One drawback of this method is that the paths can’t take advantage of the D3 .insert() method’s ability to place the elements behind the labels or other visual elements.
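The while loop terminates, and each g needs its own clone, because of one DOM rule: appendChild() moves a node that already has a parent rather than copying it. The sketch below models that rule with a hypothetical MockNode class in plain JavaScript. Real DOM nodes are far richer; only appendChild() and cloneNode() are imitated here.

```javascript
// Hypothetical stand-in for DOM nodes; it imitates only appendChild()
// and cloneNode(), which is all these examples need.
class MockNode {
  constructor(name) {
    this.name = name;
    this.parent = null;
    this.children = [];
  }
  appendChild(child) {
    // As in the DOM, a node that already has a parent is moved, not copied.
    if (child.parent) {
      const siblings = child.parent.children;
      siblings.splice(siblings.indexOf(child), 1);
    }
    child.parent = this;
    this.children.push(child);
    return child;
  }
  cloneNode(deep) {
    // The copy starts detached; a deep clone also copies the descendants.
    const copy = new MockNode(this.name);
    if (deep) this.children.forEach(c => copy.appendChild(c.cloneNode(true)));
    return copy;
  }
}

// A document fragment holding the two paths of a loaded icon.
function makeIcon() {
  const fragment = new MockNode("#document-fragment");
  fragment.appendChild(new MockNode("path"));
  fragment.appendChild(new MockNode("path"));
  return fragment;
}

// 1. Moving: every appendChild() shrinks the fragment, so a loop that
//    runs "while paths remain" eventually stops.
const svg = new MockNode("svg");
const icon = makeIcon();
while (icon.children.length > 0) {
  svg.appendChild(icon.children[0]);
}
console.log(svg.children.length, icon.children.length); // 2 0

// 2. Cloning: each team's g gets its own copies; the source stays intact.
const source = makeIcon();
const teams = [new MockNode("g"), new MockNode("g"), new MockNode("g")];
teams.forEach(g => {
  source.children.forEach(path => g.appendChild(path.cloneNode(true)));
});
console.log(teams.map(g => g.children.length), source.children.length); // [ 2, 2, 2 ] 2

// 3. No cloning: the first g takes the paths and empties the fragment,
//    so later g elements get nothing.
const shared = makeIcon();
const noClone = [new MockNode("g"), new MockNode("g")];
noClone.forEach(g => {
  // Copy the list first, because moving nodes mutates it.
  shared.children.slice().forEach(path => g.appendChild(path));
});
console.log(noClone.map(g => g.children.length)); // [ 2, 0 ]
```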
To get around the .insert() limitation, we’ll need to either append icons to g elements that have been placed in the proper order, or use the parentNode and appendChild functions to move the paths around the DOM like we described earlier in this chapter. The other drawback is that because these paths were added using cloneNode and not selection#append syntax, they have no data bound to them. We looked at rebinding data back in chapter 1. If we select the g elements and then use .select() to select a single path element, this will rebind data. But we have numerous path elements under each g element, and selectAll doesn’t rebind data. As a result, we have to take a more involved approach to bind the data from the parent g elements to the child path elements that have been loaded in this manner. First, we select all the g elements and use .each() to select all the path elements under each g. Then, we separately bind the data from the g to each path using .datum(). What’s .datum()? Well, datum is the singular of data, so a piece of data is a datum. The datum function is what you use when you’re binding just one piece of data to an element. For a single element, it’s the equivalent of wrapping your variable in an array and binding it to .data(). On a multi-element selection, .datum() gives every element the same datum. After we perform this action, we can dust off our old scale from earlier and apply it to our new path elements. We can run this code in the console to see the effects, which should look like figure 3.25.
d3.selectAll("g.overallG").each(function(d) {
  d3.select(this).selectAll("path").datum(d);
});
// category10() takes no arguments; the regions are set with .domain()
var tenColorScale = d3.scale.category10()
  .domain(["UEFA", "CONMEBOL", "CAF", "AFC"]);
d3.selectAll("path").style("fill", function(p) {
  return tenColorScale(p.region);
}).style("stroke", "black").style("stroke-width", "2px");
Now you have data-driven icons. Use them wisely.
3.4 Summary
Throughout this chapter, we dealt with methods and functionality that typically are glossed over in D3 tutorials, such as the color functions and loading external content like external SVG and HTML.
We also saw common D3 functionality, like animated transitions tied to mouse events.
Figure 3.25 The paths now have the data from their parent g element bound to them and respond accordingly when a discrete color scale based on region is applied.
Specifically, we covered
■ Planning project file structure and placing your D3 code in the context of traditional web development
■ External libraries you want to be aware of for D3 applications
■ Using transitions and animation to highlight change and interaction
■ Creating event listeners for mouse events on buttons and graphical elements
■ Using color effectively for categories and numerical data, and being aware of how color is treated in interpolations
■ Accessing the DOM element itself from a selection
■ Loading external resources, specifically images, HTML fragments, and pregenerated SVG
D3 is a powerful library that can handle many of the needs of an interactive site, but you need to know when it would be more efficient to rely on core HTML5 functionality or other libraries. Moving forward, we’ll transition from the core functions of D3 and get into the higher-level features of the library that allow you to build fully functional charts and chart components. We’ll start in the next chapter by looking at generating SVG lines and areas from data as well as preformatted axis components for your charts. We’ll also go into more detail about creating complex multipart graphical objects from your data and use those techniques to produce complex examples of information visualization.
Part 2 The pillars of information visualization
The next five chapters provide an exhaustive look into the layouts, components, behaviors, and controls that D3 provides to create the varieties of data visualization you’ve seen all over the web.
In chapter 4 you’ll learn how to create line and area charts, deploying D3 axes to make them readable, as well as how to build complex multipart boxplots that encode several different data variables at the same time. Chapter 5 walks through seven different D3 layouts, from the simple pie chart to the exotic Sankey diagram, and shows you how to implement each layout in a few different ways. Chapter 6 focuses entirely on representing network structures. It shows you how to visualize them using arc diagrams, adjacency matrices, and force-directed layouts, and it introduces several new techniques like SVG markers. Chapter 7 also focuses on a single domain, this time geospatial data, and demonstrates how to leverage D3’s incredible geospatial functionality to build different kinds of maps. Chapter 8 shifts to creating more traditional DOM elements using D3 data-binding, resulting in a spreadsheet and a simple image gallery. Whether you’re interested in all of these areas or diving deeply into just one, part 2 provides you with the tools to represent any kind of data using advanced data visualization not available in standard charting libraries and applications.
4 Chart components
D3 provides an enormous library of examples of charts, and GitHub is also packed with implementations. It’s easy to format your data to match the existing data used in an implementation and, voilà, you have a chart. Likewise, D3 includes layouts that allow you to create complex data visualizations from a properly formatted dataset. Before you get started with default layouts, which allow you to create basic charts like pie charts as well as more exotic charts, you should first understand the basics of creating the elements that typically make up a chart. In the process, you’ll produce charts like those seen in figure 4.1. This chapter focuses on widely used pieces of charts created with D3, such as a labeled axis or a line.
It also touches on the formatting, data modeling, and analytical methods most closely tied to creating charts. Obviously, this isn’t your first exposure to charts, because you created a scatterplot and bar chart in chapter 2. This chapter introduces you to components and generators.
This chapter covers
■ Creating and formatting axis components
■ Using line and area generators for charts
■ Creating complex shapes consisting of multiple types of SVG elements
A D3 component, like an axis, is a function for drawing all the graphical elements necessary for an axis. A generator, like d3.svg.line(), lets you draw a straight or curved line across many points. The chapter begins by showing you how to add axes to scatterplots as well as create line charts, but before the end you’ll create an exotic yet simple chart: the streamgraph. By understanding how D3 generators and components work, you’ll be able to do more than re-create the charts that other people have made and posted online (many of which they’re just re-creating from somewhere else).
A chart (and notice here that I don’t use the term graph because that’s a synonym for network) refers to any flat layout of data in a graphical manner. The datapoints, which can be individual values or objects in arrays, may contain categorical, quantitative, topological, or unstructured data. In this chapter we’ll use several datasets to create the charts shown in figure 4.1. Although it may seem more useful to use a single dataset for the various charts, as the old saying goes, “Horses for courses,” which is to say that different charts are more suitable to different kinds of datasets, as you’ll see in this chapter.
4.1 General charting principles
All charts consist of several graphical elements that are drawn or derived from the dataset being represented.
These graphical elements may be graphical primitives, like circles or rectangles, or more-complex, multipart graphical objects like the boxplots we’ll look at later in the chapter. Or they may be supplemental pieces like axes and labels. Although you use the same general processes you explored in previous chapters to create any of these elements in D3, it’s important to differentiate between the methods available in D3 to create graphics for charts.
You’ve learned how to directly create simple and complex elements with data-binding. You’ve also learned how to measure your data and transform it for display. Along with these two types of functions, D3 functionality can be placed into three broader categories: generators, components, and layouts, which are shown in figure 4.2 along with a general overview of how they’re used.
Figure 4.1 The charts we’ll create in this chapter using D3 generators and components. From left to right: a line chart, a boxplot, and a streamgraph.
4.1.1 Generators
D3 generators consist of functions that take data and return the necessary SVG drawing code to create a graphical object based on that data. For instance, if you have an array of points and you want to draw a line from one point to another, or turn it into a polygon or an area, a few D3 functions can help you with this process. These generators simplify the process of creating a complex SVG by abstracting away the work of writing a path element’s d attribute. In this chapter, we’ll look at d3.svg.line and d3.svg.area, and in the next chapter you’ll see d3.svg.arc, which is used to create the pie pieces of pie charts. Another generator that you’ll see in chapter 5 is d3.svg.diagonal, used for drawing curved connecting lines in dendrograms.
4.1.2 Components
In contrast with generators, which produce the d attribute string necessary for a path element, components create an entire set of graphical objects necessary for a particular chart component.
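The contrast is easiest to see from the generator side. A generator draws nothing itself. It only returns a string for a path element’s d attribute, which you then set with .attr("d", ...). Below is a minimal sketch of a straight-segment line generator in plain JavaScript. lineGenerator() is a hypothetical helper, not D3’s implementation, and it skips the interpolation options D3 offers.

```javascript
// Sketch of what a line generator does: turn an array of points into the
// string for a path element's d attribute. lineGenerator() is hypothetical;
// d3.svg.line() adds interpolation modes, .defined(), and more.
function lineGenerator(x, y) {
  return points =>
    points
      .map((p, i) => (i === 0 ? "M" : "L") + x(p, i) + "," + y(p, i))
      .join("");
}

const line = lineGenerator(p => p.x, p => p.y);
const pathData = line([{ x: 0, y: 0 }, { x: 10, y: 20 }, { x: 20, y: 10 }]);
console.log(pathData); // M0,0L10,20L20,10
```

The x and y accessors play the same role as the functions you pass to a D3 generator’s .x() and .y() methods. They let one generator work with any data shape.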
The most commonly used D3 component (which you’ll see in this chapter) is d3.svg.axis. Based on the scale and settings you provide, it creates the path, g, line, and text elements needed for an axis. Another component is d3.svg.brush (which you’ll see later), which creates all the graphical elements necessary for a brush selector.
4.1.3 Layouts
In contrast to generators and components, D3 layouts can be rather straightforward, like the pie chart layout, or complex, like a force-directed network layout.
Figure 4.2 D3 function types, with examples, what they take, and what they produce. Functions include scale(). Generators such as area(), line(), diagonal(), and arc() produce SVG drawing code for the d attribute of path elements. Components produce elements and event listeners. The fourth type is layouts.
element where we want these graphical elements to be drawn.
var yAxis = d3.svg.axis().scale(yScale).orient("right");
d3.select("svg").append("g").attr("id", "yAxisG").call(yAxis);
var xAxis = d3.svg.axis().scale(xScale).orient("bottom");
d3.select("svg").append("g").attr("id", "xAxisG").call(xAxis);
Notice that the .call() method of a selection invokes a function with the selection that’s active in the method chain. It’s the equivalent of writing
xAxis(d3.select("svg").append("g").attr("id", "xAxisG"));
Figure 4.5 shows a result that’s more legible, with the xy positions of the circles denoted by labels in a pair of axes. The labels are derived from the scales that we used to create each axis and provide the context necessary to interpret this chart. The axis lines are thick enough to overlap one of our scatterplot points because the domain of the axis being drawn is a path. Recall from chapter 3 that paths are by default filled in black. We can adjust the display by setting the fill style of those two axis domain paths to "none".
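The .call() equivalence described above holds because a component such as an axis is just a function that takes a selection. Here is a toy model in plain JavaScript. The helper names are hypothetical, and real D3 selections do much more. It shows that calling a component through .call() and calling it directly do the same work, while .call() also hands the selection back for further chaining.

```javascript
// Toy model of selection.call(fn): invoke fn with the selection as its
// first argument, then return the selection so the chain can continue.
// makeSelection() and drawFakeAxis() are hypothetical, not D3 APIs.
function makeSelection(id) {
  const selection = {
    id,
    children: [],
    call(fn, ...args) {
      fn(selection, ...args);
      return selection;
    },
  };
  return selection;
}

// A component is just a function that draws into the selection it receives.
function drawFakeAxis(g) {
  ["tick 0", "tick 1", "tick 2"].forEach(t => g.children.push(t));
}

const viaCall = makeSelection("xAxisG").call(drawFakeAxis);
const direct = makeSelection("xAxisG");
drawFakeAxis(direct);
console.log(viaCall.children.length, direct.children.length); // 3 3
```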
Doing so reveals that the axis ticks aren’t being drawn, because those <line> elements don’t have a default stroke style applied. Figure 4.6 shows why we don’t see any of our ticks and why our axis domains appear as thick black regions. To improve our axes, we need to style them properly.

Figure 4.5 The same scatterplot from figure 4.4, but with a pair of labeled axes. The x-axis is drawn in such a way as to obscure one of the points.

Figure 4.6 An axis created with d3.svg.axis contains three kinds of elements:
(1) a <path> with a size equal to the extent of the axis;
(2) a <g> that contains a <line> and a <text> for each major tick;
(3) a <line> for each minor tick. These appear only when you use the deprecated tickSubdivide function in D3 version 3.2 and earlier.
Not shown, and invisible, is the <g> element from which the axis is called and in which these elements are created. In our example, region 1 is filled with black and none of the lines have strokes, because that’s how SVG draws <path> and <line> elements by default.

4.2.2 Styling axes

The axis function creates standard SVG elements. When first created, they have no more and no less formatting than any other SVG elements. This may seem counterintuitive, but SVG is meant to be paired with CSS. It’s better that elements don’t have any “helpful” styles assigned to them; otherwise you’d have a hard time overriding those styles with your CSS. For now, we can use d3.selectAll() and .style() to set the domain path to fill: none and the lines to stroke: black. This shows us what we’re missing, as seen in figure 4.7.

Figure 4.7 If we change the fill value of the <path> to "none" and set the stroke values of the <path> and <line> elements to "black", we see the ticks and the stroke of the <path>. It also reveals our hidden datapoint.
    // We use selectAll because there are two of these paths, one for each axis we called.
    d3.selectAll("path.domain").style("fill", "none").style("stroke", "black");
    // We'll want to be more specific in the future ("line.tick"), because it's likely
    // that whatever we're working on will have more lines than those used in our axes.
    d3.selectAll("line").style("stroke", "black");

If we set the y-axis’s .orient() option to "left", or the x-axis’s .orient() option to "top", the axes seem not to be drawn at all. In fact they’re drawn outside the canvas, like our earlier rectangles. To move our axes around, we need to adjust the transform attribute of their parent <g> elements. We can do this when we draw them or later. That’s why it’s important to assign an ID to each <g> element when we append it to the canvas. For example, we can easily move the x-axis to the bottom of this drawing:

    d3.select("#xAxisG").attr("transform", "translate(0,500)");

Here’s our updated code. It uses the .tickSize() function to extend the ticks into lines that span the canvas. It also sets the number of ticks manually with the .ticks() function:

    var scatterData = [{friends: 5, salary: 22000},
      {friends: 3, salary: 18000}, {friends: 10, salary: 88000},
      {friends: 0, salary: 180000}, {friends: 27, salary: 56000},
      {friends: 8, salary: 74000}];

    // Creates a pair of scales to map the values in our dataset to the canvas
    var xScale = d3.scale.linear().domain([0,180000]).range([0,500]);
    var yScale = d3.scale.linear().domain([0,27]).range([0,500]);

    // Uses method chaining to create an axis and explicitly set its
    // orientation, tick size, and number of ticks
    var xAxis = d3.svg.axis().scale(xScale)
      .orient("bottom").tickSize(500).ticks(4);
    // Appends a <g> element to the canvas, and calls the axis from that <g>
    // to create the necessary graphics for the axis
    d3.select("svg").append("g").attr("id", "xAxisG").call(xAxis);

    var yAxis = d3.svg.axis().scale(yScale)
      .orient("right").ticks(16).tickSize(500);
    d3.select("svg").append("g").attr("id", "yAxisG").call(yAxis);

    d3.select("svg").selectAll("circle")
      .data(scatterData).enter()
      .append("circle").attr("r", 5)
      .attr("cx", function(d) {return xScale(d.salary);})
      .attr("cy", function(d) {return yScale(d.friends);});

The effect of all these functions is uninspiring, as shown in figure 4.8. Let’s examine the elements the axis code created, which appear in figure 4.8 as a giant black square. Our <g> element with the ID "xAxisG" contains one <g> element for each tick, and each of those holds a <line> and a <text> label.

Notice that these tick <g> elements have been created with classes. That lets us style their child elements (our line and our label) with CSS, or select them with D3. We need to do this if we want our axes to display properly, with lines corresponding to the labeled points. The reason is that the axis code draws more than lines and labels. It also draws the domain <path>, which covers the entire region contained by the axis elements. This domain element needs to be set to "fill: none", or we’ll end up with a big black square. You’ll also see examples where the tick lines are drawn with negative lengths to create a slightly different visual style.

To make our axis readable, we could keep applying inline styles with d3.select. Instead we should use CSS, because it’s easier to maintain and doesn’t require us to write styles on the fly in JavaScript. The following listing shows a short CSS style sheet that corresponds to the elements created by the axis function.

Listing 4.1 ch4stylesheet.css

This applies to all our lines, which includes the major lines that we’d otherwise need to reference with "g.major > line".

Figure 4.8 Setting axis ticks to the size of your canvas also sets the domain <path> to the size of your canvas. Because paths are, by default, filled with black, the result is illegible.
With this style sheet in place, we get something a bit more legible, as shown in figure 4.9. Look at the elements the axis() function created in figure 4.9, then see in figure 4.10 how the CSS classes are associated with them. As you create more-complex information visualizations, you’ll get used to creating your own elements with classes referenced by your style sheet. You’ll also learn where D3 components create elements in the DOM and how those elements are classed, so that you can style them properly.

Figure 4.9 With fill set to "none" and CSS settings that also apply to the tick elements, we can draw a rather attractive grid based on our two axes.

Figure 4.10 The DOM shows how each tick <line> is appended, along with a <text> element for the label, to one of a set of <g> elements corresponding to the number of ticks.

4.3 Complex graphical objects

Using circles or rectangles for your data won’t work with some datasets, for example, if an important aspect of your data has to do with distribution, like user demographics or statistical data. Information visualization often loses the distribution of the data entirely. At best, it notes the distribution with a reference to standard deviation or some other first-year statistics term, acknowledging that the average doesn’t tell the whole story. For data that has a distribution (such as a fluctuating stock price), one particularly useful option is a boxplot in place of a traditional scatterplot. The boxplot uses a complex graphic that encodes distribution in its shape. The box in a boxplot typically looks like the one shown in figure 4.11. Our boxplot uses preprocessed quartiles. You could easily compute your own values from your own dataset with d3.quantile(), which takes a sorted array of numbers and a probability such as 0.25.

Take a moment to examine the amount of data that’s encoded in the graphic in figure 4.11. The median value is represented as a gray line.
The rectangle spans the range between the first and third quartiles, which holds the middle half of whatever you’re measuring. The two lines above and below the rectangle indicate the maximum and minimum values. If you map only the average or median value at a datapoint, everything except the information in the gray line is lost.

Figure 4.11 A box from a boxplot encodes five pieces of information in a single shape:
(1) the maximum value;
(2) the high value of some distribution, such as the third quartile;
(3) the median or mean value;
(4) the corresponding low value of the distribution, such as the first quartile;
(5) the minimum value.

To build a reasonable boxplot, we’ll need a set of data with interesting variation in those areas. Let’s assume we want to plot the registered visitors to our website by day of the week. That way we can compare our stats week to week, present them to our boss, or use them for some other purpose. We have data on the visitors’ ages (based on their registration details), and we’ve derived the quartiles from it. Maybe we used Excel, Python, or d3.quantile(), or maybe the quartiles came with a dataset we downloaded. As you work with data, you’ll be exposed to common statistical summaries like this, and you’ll have to represent them as part of your charts, so don’t be too intimidated by them.

We’ll store the information in CSV format. The following listing shows our dataset: the number of registered users who visit the site each day, and the quartiles of their ages.
Listing 4.2 boxplots.csv

    day,min,max,median,q1,q3,number
    1,14,65,33,20,35,22
    2,25,73,25,25,30,170
    3,15,40,25,17,28,185
    4,18,55,33,28,42,135
    5,14,66,35,22,45,150
    6,22,70,34,28,42,170
    7,14,65,33,30,50,28

When we map the median age as a scatterplot, as in figure 4.12, our user base seems to vary little throughout the week. To produce this chart, we draw a scatterplot point for each day at the median age of that day’s visitors. We’ll also invert the y-axis so that it makes a bit more sense.

Listing 4.3 Scatterplot of median age

    d3.csv("boxplot.csv", scatterplot);

    function scatterplot(data) {
      xScale = d3.scale.linear().domain([1,8]).range([20,470]);
      // Scale is inverted, so higher values are drawn higher up
      // and lower values toward the bottom
      yScale = d3.scale.linear().domain([0,100]).range([480,20]);

      yAxis = d3.svg.axis()
        .scale(yScale)
        .orient("right")
        .ticks(8)
        .tickSize(-470);
      // Offsets the <g> containing the axis
      d3.select("svg").append("g")
        .attr("transform", "translate(470,0)")
        .attr("id", "yAxisG")
        .call(yAxis);

      xAxis = d3.svg.axis()
        .scale(xScale)
        .orient("bottom")
        .tickSize(-470)
        // Specifies the exact tick values to correspond with
        // the numbered days of the week
        .tickValues([1,2,3,4,5,6,7]);
      d3.select("svg").append("g")
        .attr("transform", "translate(0,480)")
        .attr("id", "xAxisG")
        .call(xAxis);

      d3.select("svg").selectAll("circle.median")
        .data(data)
        .enter()
        .append("circle")
        .attr("class", "median")
        .attr("r", 5)
        .attr("cx", function(d) {return xScale(d.day)})
        .attr("cy", function(d) {return yScale(d.median)})
        .style("fill", "darkgray");
    }

To get a better view of this data, we’ll need to create a boxplot. Building a boxplot is similar to building a scatterplot, but instead of appending a circle for each point of data, you append a <g> element. It’s a good rule to always use <g> elements for your charts, because they let you attach labels or other important information to your graphical representations.
But that means you’ll need to use the transform attribute, which is how <g> elements are positioned on the canvas. Elements appended to a <g> base their coordinates on the coordinates of their parent. So when you apply x and y attributes to child elements, you need to set them relative to the parent <g>.

In earlier chapters, we selected all the <g> elements and appended child elements one at a time. This time we’ll create the new elements with the .each() function of a selection, which runs the same code on each element in a selection. Like any D3 selection function, .each() gives you access to the bound data, the array position, and the DOM element. In chapter 1, we achieved the same functionality by using selectAll to select the <g> elements and appending child elements to them directly. That’s a clean method. There are only three reasons to use .each() to add child elements instead:
- you prefer the syntax;
- you plan on doing complex operations involving each data element;
- you want to add conditional tests that change whether, or which, child elements you append.
The following listing shows .each() in action. It uses the scales we created in listing 4.3 and draws rectangles on top of the circles we’ve already drawn.

Figure 4.12 The median age of visitors (y-axis) by day of the week (x-axis), represented as a scatterplot. It shows a slight dip in age on the second and third days.

Listing 4.4 Initial boxplot drawing code

    d3.select("svg").selectAll("g.box")
      .data(data).enter()
      .append("g")
      .attr("class", "box")
      .attr("transform", function(d) {
        return "translate(" + xScale(d.day) + "," + yScale(d.median) + ")";
      })
      // The d and i variables are declared in the .each() anonymous function,
      // so each time we access them, we get the data bound to the original <g>.
      .each(function(d,i) {
        // Because we're inside the .each(), we can select(this)
        // to append new child elements.
        d3.select(this)
          .append("rect")
          .attr("width", 20)
          .attr("height", yScale(d.q1) - yScale(d.q3));
      });

Figure 4.13 shows the new rectangles indicating the distribution of visitor ages. They’re offset to the right, and they also show the wrong values. Day 7, for instance, should range from 30 to 50, but instead is shown as ranging from roughly 13 to 33. SVG draws a rectangle down and to the right from its origin. Here that origin is the median point where the parent <g> was placed. We have to update our code a bit to make it accurately reflect the distribution of visitor ages:

Figure 4.13 The <rect> elements represent the scaled range between the first and third quartiles of visitor age. Each one sits inside a <g> element placed on the chart at the median age, on top of the gray circle drawn there earlier. As per SVG convention, the rectangles are drawn from the <g> down and to the right.

    …
    .each(function(d,i) {
      d3.select(this)
        .append("rect")
        .attr("width", 20)
        // Sets a negative offset of half the width to center the rectangle horizontally
        .attr("x", -10)
        // The height equals the difference between the scaled q1 and q3 values, so we
        // offset the rectangle by the difference between the median (where the parent
        // <g> sits) and the high end of the distribution, q3.
        .attr("y", yScale(d.q3) - yScale(d.median))
        .attr("height", yScale(d.q1) - yScale(d.q3))
        .style("fill", "white")
        .style("stroke", "black");
    });

To add the remaining elements of the boxplot (described in detail in figure 4.15), we’ll use the same technique we used for the chart in figure 4.14. This time we include several append functions in the .each() function. Each one selects the parent <g> element created during the data-binding process and appends one of the shapes that make up a boxplot.

Figure 4.14 The <rect> elements are now properly placed, so their top and bottom correspond to the first- and third-quartile visitor ages for each day. The rectangles cover the circles completely, with one exception. In the second rectangle, the first-quartile value equals the median age, so half of the gray circle peeks out from underneath it.

Figure 4.15 How a boxplot can be drawn in D3. Pay particular attention to the relative positioning needed to draw the child elements of a <g>. The invisible parent of all your graphical elements is a group. As each <g> is appended, you select it and append more elements whose size and shape are derived from the data.

Each <g> is placed at the median value. That point becomes the 0 position for all of its child elements, so each child must be drawn relative to it:
- The max line and the top of the rectangle sit above this center, at negative y-values such as yScale(d.max) - yScale(d.median).
- The min line sits below it, at yScale(d.min) - yScale(d.median).
- The horizontal bars run from x = -10 to x = 10.
- The median line stays at a y-value of 0.

The range line is drawn first, behind all the other elements, from max to min, so its y1 and y2 values also have the scaled median subtracted. The only child element that isn’t a line is the <rect>. It represents the densest region of the distribution and tells your users the age range of the middle half of your visitors. To draw it, we offset the <rect> by the scaled third quartile minus the scaled median, and we set its height to the scaled first quartile minus the scaled third quartile.

Listing 4.5 The .each() function of the boxplot drawing five child elements

    …
    .each(function(d,i) {
      // Draws the line from the min to the max value; drawn first, so it
      // sits behind all the other elements
      d3.select(this)
        .append("line")
        .attr("class", "range")
        .attr("x1", 0)
        .attr("x2", 0)
        .attr("y1", yScale(d.max) - yScale(d.median))
        .attr("y2", yScale(d.min) - yScale(d.median))
        .style("stroke", "black")
        .style("stroke-width", "4px");
      // The top bar of the min-max line
      d3.select(this)
        .append("line")
        .attr("class", "max")
        .attr("x1", -10)
        .attr("x2", 10)
        .attr("y1", yScale(d.max) - yScale(d.median))
        .attr("y2", yScale(d.max) - yScale(d.median))
        .style("stroke", "black")
        .style("stroke-width", "4px");
      // The bottom bar of the min-max line
      d3.select(this)
        .append("line")
        .attr("class", "min")
        .attr("x1", -10)
        .attr("x2", 10)
        .attr("y1", yScale(d.min) - yScale(d.median))
        .attr("y2", yScale(d.min) - yScale(d.median))
        .style("stroke", "black")
        .style("stroke-width", "4px");
      // The offset places the rectangle horizontally centered on the median
      // position, spanning q3 to q1
      d3.select(this)
        .append("rect")
        .attr("class", "range")
        .attr("width", 20)
        .attr("x", -10)
        .attr("y", yScale(d.q3) - yScale(d.median))
        .attr("height", yScale(d.q1) - yScale(d.q3))
        .style("fill", "white")
        .style("stroke", "black")
        .style("stroke-width", "2px");
      // The median line doesn't need to be moved, because the parent <g>
      // is centered on the median value
      d3.select(this)
        .append("line")
        .attr("x1", -10)
        .attr("x2", 10)
        .attr("y1", 0)
        .attr("y2", 0)
        .style("stroke", "darkgray")
        .style("stroke-width", "4px");
    });

We should also add an x-axis to remind us which day each box is associated with, and listing 4.6 does this. It uses the explicit .tickValues() function you saw earlier. It also uses a negative tickSize(), along with a matching offset of the <g> that we use to call the axis function.

Listing 4.6 Adding an axis using tickValues

    var xAxis = d3.svg.axis().scale(xScale).orient("bottom")
      // A negative tickSize draws the lines above the axis, but we need to
      // make sure to offset the axis by the same value.
      .tickSize(-470)
      // Setting specific tickValues forces the axis to show only the corresponding
      // values, which is useful when we want to override the automatic ticks.
      .tickValues([1,2,3,4,5,6,7]);
    d3.select("svg").append("g")
      // Offsets the axis to correspond with our negative tickSize
      .attr("transform", "translate(0,470)")
      .attr("id", "xAxisG").call(xAxis);
    // We can hide the domain path, because it has extra ticks on the ends
    // that distract our readers.
    d3.select("#xAxisG > path.domain").style("display", "none");

The end result of all this is a chart that no longer represents each datapoint with a single circle. Instead, each one becomes a multipart graphical element designed to emphasize distribution. The boxplot in figure 4.16 encodes not only each day’s median visitor age but also the minimum, the maximum, and the age range of the middle half of visitors. This expresses the demographics of visitorship in detail, clearly and cleanly. It doesn’t include the number of visitors. We could encode that with color, make it available on a click of each boxplot, or make the width of each boxplot correspond to the number of visitors.

We looked at boxplots because they let you explore the creation of multipart objects using lines and rectangles. But what’s the value of a visualization like this that shows distribution? It encodes a graphical summary of the data. For example, for Wednesday (day 3 in our data) it tells you: “Half of the visitors were between the ages of 17 and 28. The oldest was 40. The youngest was 15. The median age was 25.” It also allows you to quickly perform visual queries. For instance, you can check whether the median age of one day falls within the middle range of visitor ages of another day. We’ll stop exploring boxplots now and take a look at a different kind of complex graphical object: an interpolated line.

4.4 Line charts and interpolations

You create line charts by drawing connections between points.
A line that connects points tells a story about the data, as do the shaded regions inside or outside the area bounded by that line. A line chart is technically a static data visualization, but it’s also a representation of change, typically over time. We’ll start with a new dataset in listing 4.7 that better represents change over time. Let’s imagine we have a Twitter account and we’ve been tracking the number of tweets, favorites, and retweets to see when our social media gets the greatest response. We’ll ultimately deal with this kind of data as JSON, but we’ll start with a comma-delimited file, because it’s the most efficient format for this kind of data.

Figure 4.16 Our final boxplot chart. Each day now shows not only the median age of visitors but also the range of visiting ages, allowing for a more extensive examination of the demographics of site visitorship.

Listing 4.7 tweetdata.csv

    day,tweets,retweets,favorites
    1,1,2,5
    2,6,11,3
    3,3,0,1
    4,5,2,6
    5,10,29,16
    6,4,22,10
    7,3,14,1
    8,5,7,7
    9,1,35,22
    10,4,16,15

First we pull this CSV in using d3.csv(), as we did in chapter 2, and then we create circles for each datapoint. We do this once for each of the three activity types. The day attribute determines x position, and the activity count determines y position. We create the usual x and y scales to keep the shapes within the confines of our canvas, and a couple of axes to frame our results. Notice that we differentiate the three datatypes by coloring them differently.

Listing 4.8 Callback function to draw a scatterplot from tweetdata

    d3.csv("tweetdata.csv", lineChart);

    function lineChart(data) {
      // Our scales, as usual, have margins built in.
      xScale = d3.scale.linear().domain([1,10.5]).range([20,480]);
      yScale = d3.scale.linear().domain([0,35]).range([480,20]);

      xAxis = d3.svg.axis()
        .scale(xScale)
        .orient("bottom")
        .tickSize(480)
        // Fixes the ticks of the x-axis to correspond to the days
        .tickValues([1,2,3,4,5,6,7,8,9,10]);
      d3.select("svg").append("g").attr("id", "xAxisG").call(xAxis);

      yAxis = d3.svg.axis()
        .scale(yScale)
        .orient("right")
        .ticks(10)
        .tickSize(480);
      d3.select("svg").append("g").attr("id", "yAxisG").call(yAxis);

      // Each of these three blocks uses the same dataset, but bases the y position
      // on the tweets, retweets, and favorites values, respectively.
      d3.select("svg").selectAll("circle.tweets")
        .data(data)
        .enter()
        .append("circle")
        .attr("class", "tweets")
        .attr("r", 5)
        .attr("cx", function(d) {return xScale(d.day)})
        .attr("cy", function(d) {return yScale(d.tweets)})
        .style("fill", "black");

      d3.select("svg").selectAll("circle.retweets")
        .data(data)
        .enter()
        .append("circle")
        .attr("class", "retweets")
        .attr("r", 5)
        .attr("cx", function(d) {return xScale(d.day)})
        .attr("cy", function(d) {return yScale(d.retweets)})
        .style("fill", "lightgray");

      d3.select("svg").selectAll("circle.favorites")
        .data(data)
        .enter()
        .append("circle")
        .attr("class", "favorites")
        .attr("r", 5)
        .attr("cx", function(d) {return xScale(d.day)})
        .attr("cy", function(d) {return yScale(d.favorites)})
        .style("fill", "gray");
    }

Figure 4.17 shows the result of this code, which also picks up the CSS rules we defined earlier. The chart isn’t easy to interpret.

4.4.1 Drawing a line from points

To compare the number of tweets, retweets, and favorites, we can draw a line through all the points in each category. We can start by drawing a line for tweets using d3.svg.line().
This line generator expects an array of points as data, and we need to tell it which values to use as the x and y coordinates of each point. By default, the generator expects each point to be a two-element array, with the x value first and the y value second. That doesn’t work for us, because our x value comes from the day of the activity and our y value comes from the amount of activity. The line generator’s .x() accessor function needs to return the scaled day value, and its .y() accessor function needs to return the scaled value of the appropriate activity. The line function itself takes the entire dataset that we loaded from tweetdata. It returns the SVG drawing code for a line between the points in that dataset. To generate three lines, we use the dataset three times, with a slightly different generator for each. Writing the generator and defining how it accesses the data isn’t enough. We also need to append a <path> to our canvas and set its d attribute to the string that the generator returns when called on our data.

Figure 4.17 A scatterplot showing the datapoints for 10 days of activity on Twitter, with the number of tweets in black, the number of retweets in light gray, and the number of favorites in gray

Listing 4.9 New line generator code inside the callback function

    var tweetLine = d3.svg.line()
      // Defines an accessor for data like ours; in this case we take
      // the day attribute and pass it to xScale first
      .x(function(d) { return xScale(d.day); })
      // This accessor does the same for the number of tweets.
      .y(function(d) { return yScale(d.tweets); });

    // The appended path is drawn according to the generator,
    // with the loaded tweetdata passed to it.
    d3.select("svg")
      .append("path")
      .attr("d", tweetLine(data))
      .attr("fill", "none")
      .attr("stroke", "darkred")
      .attr("stroke-width", 2);
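The d attribute that listing 4.9 assigns is just a string. The following is a minimal sketch of what the generator computes, written as a plain-JavaScript stand-in that needs no D3. linearScale and makeLine are hypothetical helpers written for illustration, not D3’s implementation. They only mimic the default linear case, where consecutive points are joined by straight segments. The scales are assumed values chosen to keep the arithmetic readable, not the ones from listing 4.8.

```javascript
// Hypothetical stand-in for d3.scale.linear(): maps [d0, d1] onto [r0, r1].
// Multiplying before dividing keeps these small integer examples exact.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (v) => r0 + ((v - d0) * (r1 - r0)) / (d1 - d0);
}

// Hypothetical stand-in for d3.svg.line() with linear segments.
// Default accessors read [x, y] pairs; .x() and .y() replace them and
// return the generator so calls can be chained, as in listing 4.9.
function makeLine() {
  let xAccessor = (d) => d[0];
  let yAccessor = (d) => d[1];
  function line(data) {
    // "M" moves to the first point; every later point is joined with "L".
    return data
      .map((d, i) => (i === 0 ? "M" : "L") + xAccessor(d) + "," + yAccessor(d))
      .join("");
  }
  line.x = (fn) => { xAccessor = fn; return line; };
  line.y = (fn) => { yAccessor = fn; return line; };
  return line;
}

// The first three rows of tweetdata.csv (day and tweets only), as numbers.
const data = [
  { day: 1, tweets: 1 },
  { day: 2, tweets: 6 },
  { day: 3, tweets: 3 },
];

// Assumed scales: days 0-10 across 500px, counts 0-10 inverted over 500px.
const xScale = linearScale([0, 10], [0, 500]);
const yScale = linearScale([0, 10], [500, 0]);

const tweetLine = makeLine()
  .x((d) => xScale(d.day))
  .y((d) => yScale(d.tweets));

console.log(tweetLine(data));                // M50,450L100,200L150,350
console.log(makeLine()([[0, 0], [10, 20]])); // M0,0L10,20
```

Only the accessors change between datasets. The string-building step stays the same, which is what lets one generator work with objects like our tweetdata rows instead of [x, y] pairs.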
Figure 4.18 The line generator takes the entire dataset and draws a line in which the x,y position of every point on the canvas comes from the generator’s accessors. Here, each point on the line represents a day and its number of tweets, positioned by the x and y scales we created to display the data on the canvas.

We draw the line above the circles we already drew, and the line generator produces the plot shown in figure 4.18.

4.4.2 Drawing many lines with multiple generators

To see the variation over time for each datatype in our set, we can build a line generator for each one and draw each line in its own <path>. Listing 4.10 shows how to build those generators with our dataset, and figure 4.19 shows the results of that code.

Listing 4.10 Line generators for each tweetdata

    // A more efficient way to do this would be to define one line generator, and then
    // modify the .y() accessor on the fly as we call it for each line. But it's easier
    // to see the functionality this way. Notice that only the y accessor differs
    // between the line generators.
    var tweetLine = d3.svg.line()
      .x(function(d) { return xScale(d.day); })
      .y(function(d) { return yScale(d.tweets); });
    var retweetLine = d3.svg.line()
      .x(function(d) { return xScale(d.day); })
      .y(function(d) { return yScale(d.retweets); });
    var favLine = d3.svg.line()
      .x(function(d) { return xScale(d.day); })
      .y(function(d) { return yScale(d.favorites); });

    // Each line generator needs to be called by a corresponding new <path> element.
    d3.select("svg")
      .append("path")
      .attr("d", tweetLine(data))
      .attr("fill", "none")
      .attr("stroke", "darkred")
      .attr("stroke-width", 2);
    d3.select("svg")
      .append("path")
      .attr("d", retweetLine(data))
      .attr("fill", "none")
      .attr("stroke", "gray")
      .attr("stroke-width", 3);
    d3.select("svg")
      .append("path")
      .attr("d", favLine(data))
      .attr("fill", "none")
      .attr("stroke", "black")
      .attr("stroke-width", 2);
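As noted alongside listing 4.10, a more efficient approach is a single generator whose .y() accessor is changed before each call. Here is a minimal sketch of that idea in plain JavaScript. It assumes a simplified, hypothetical makeLine stand-in for d3.svg.line() that draws straight segments only. It also uses unscaled values from the first two days of tweetdata.csv, so the output is easy to read.

```javascript
// Simplified, hypothetical stand-in for d3.svg.line(): straight segments,
// chainable .x()/.y() accessors.
function makeLine() {
  let xOf = (d) => d[0];
  let yOf = (d) => d[1];
  const line = (data) =>
    data.map((d, i) => (i === 0 ? "M" : "L") + xOf(d) + "," + yOf(d)).join("");
  line.x = (fn) => { xOf = fn; return line; };
  line.y = (fn) => { yOf = fn; return line; };
  return line;
}

// First two days of tweetdata.csv, left unscaled for readability.
const data = [
  { day: 1, tweets: 1, retweets: 2, favorites: 5 },
  { day: 2, tweets: 6, retweets: 11, favorites: 3 },
];

// One generator: the x accessor never changes, so it is set once.
const activityLine = makeLine().x((d) => d.day);

// Swap the y accessor before each call. Each path string is produced
// immediately, so later swaps do not affect earlier results.
const paths = {};
for (const key of ["tweets", "retweets", "favorites"]) {
  activityLine.y((d) => d[key]);
  paths[key] = activityLine(data);
}

console.log(paths.tweets);    // M1,1L2,6
console.log(paths.retweets);  // M1,2L2,11
console.log(paths.favorites); // M1,5L2,3
```

Each call returns its string right away, so in D3 you would append the corresponding <path> inside the same loop iteration, before the accessor is swapped again.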
4.4.3 Exploring line interpolators

D3 provides a number of interpolation methods for drawing these lines, so that they can represent the data more accurately. Sometimes, as with tweetdata, your discrete points represent the data exactly rather than sampling it. In that case the default “linear” method shown in figure 4.19 is appropriate. In other cases, a different interpolation method, like the ones shown in figure 4.20, may be appropriate. Here’s the same data, but with the d3.svg.line() generators using different interpolation methods:

    // We can add this code right after we create our line generators and before
    // we call them to change the interpolate method, or we can set .interpolate()
    // as we're defining the generator.
    tweetLine.interpolate("basis");
    retweetLine.interpolate("step");
    favLine.interpolate("cardinal");

What’s the best interpolation?

Interpolation modifies the representation of data. Experiment with this drawing code to see how each interpolation setting shows the data differently. Data can be visualized in different ways that are all correct from a programming perspective. It’s up to you to make sure the information you visualize reflects the actual phenomena. Data visualization deals with the visual representation of statistical principles, which means it’s subject to all the dangers of the misuse of statistics. The interpolation of lines is particularly vulnerable to misuse, because it changes a clunky-looking line into a smooth, “natural” one.

Figure 4.19 The dataset is first used to draw a set of circles, which creates the scatterplot from the beginning of this section. The dataset is then used three more times to draw each line.

4.5 Complex accessor functions

All of the previous chart types we built were based on points.
The scatterplot is points on a grid, the boxplot consists of complex graphical objects in place of points, and line charts use points as the basis for drawing a line. In this and earlier chapters, we’ve dealt with rather staid examples of information visualization that we might easily create in any traditional spreadsheet. But you didn’t get into this business to make Excel charts. You want to wow your audience with beautiful data, win awards for your aesthetic je ne sais quoi, and evoke deep emotional responses with your representation of change over time. You want to make streamgraphs like the one in figure 4.21.

Figure 4.20 Light gray: “basis” interpolation; dark gray: “step” interpolation; black: “cardinal” interpolation

Figure 4.21 Behold the glory of the streamgraph. Look on my works, ye mighty, and despair! (figure from The New York Times, February 23, 2008; http://mng.bz/rV7M)

The streamgraph is a sublime piece of information visualization that, like the boxplot, represents variation and change. It may seem difficult to create, but only until you start to put the pieces together. Ultimately, a streamgraph is what’s known as a stacked chart. The layers accrete upon each other. Each layer’s area shifts up and down according to the space taken by the layers closer to the center. The chart appears organic because this accretion mimics the way many organisms grow. It also seems to imply the kinds of emergent properties that govern the growth and decay of organisms. We’ll interpret its appearance later, but first let’s figure out how to build it.

We’re looking at a streamgraph because it’s not as exotic as it seems. A streamgraph is a stacked graph, which means it’s fundamentally similar to your earlier line charts. By learning how to make it, you can better understand another kind of generator, d3.svg.area(). The first thing you need is data that’s amenable to this kind of visualization.
Let’s follow the New York Times, from which we get the streamgraph in figure 4.21, and work with the gross earnings for six movies over the course of ten days. Each datapoint is therefore the amount of money a movie made on a particular day.

Listing 4.11 movies.csv

    day,movie1,movie2,movie3,movie4,movie5,movie6
    1,20,8,3,0,0,0
    2,18,5,1,13,0,0
    3,14,3,1,10,0,0
    4,7,3,0,5,27,15
    5,4,3,0,2,20,14
    6,3,1,0,0,10,13
    7,2,0,0,0,8,12
    8,0,0,0,0,6,11
    9,0,0,0,0,3,9
    10,0,0,0,0,1,8

To build a streamgraph, you need to get more sophisticated with the way you access data and feed it to generators when drawing lines. In our earlier example, we created three different line generators for our dataset, but that’s terribly inefficient. We also used simple functions to draw the lines, and we’ll need more than that to draw something like a streamgraph. Even if you think you won’t want to draw streamgraphs (and there are reasons why you may not, which we’ll get into at the end of this section), the important thing to focus on when you look at listing 4.12 is how you use accessors with D3’s line and, later, area generators.

Listing 4.12 The callback function to draw movies.csv as a line chart

    // Runs inside the d3.csv("movies.csv", ...) callback; data is the array of parsed rows
    var xScale = d3.scale.linear().domain([ 1, 10 ]).range([ 20, 470 ]); // days 1-10
    var yScale = d3.scale.linear().domain([ 0, 100 ]).range([ 480, 20 ]);

    // Iterates through our data attributes: x is the name of each column ("day",
    // "movie1", "movie2", and so on), which lets us dynamically create and call generators
    for (var x in data[0]) {
      if (x != "day") {
        var movieArea = d3.svg.line()
          .x(function(d) { return xScale(d.day); })
          .y(function(d) { return yScale(d[x]); })
          .interpolate("cardinal");

        d3.select("svg")
          .append("path")
          .attr("id", x + "Area")   // an id is an attribute, not a style
          .attr("d", movieArea(data))
          .attr("fill", "none")
          .attr("stroke", "black")
          .attr("stroke-width", 3)
          .style("opacity", .75);
      }
    }

The line-drawing code produces a cluttered line chart, as shown in figure 4.22.
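The dip below zero in figure 4.22 is a property of cardinal splines in general, not of the movie data. A smooth curve needs a slope at every point, and a cardinal spline takes that slope from the neighbors on either side; where a series falls and then flattens out at zero, the downward slope carried into the flat stretch makes the curve undershoot. The plain-JavaScript sketch below samples a textbook cardinal spline (cubic Hermite segments with tangents scaled by (1 - tension) / 2) through movie1’s values from listing 4.11. It is not D3’s implementation; the function name cardinalSamples and the tension of 0.5 are assumptions chosen for illustration.

```javascript
// Sketch only (not D3's source): sample a uniform cardinal spline through
// evenly spaced values. tension = 0 gives a Catmull-Rom spline; tension = 1
// gives flat tangents and no overshoot.
function cardinalSamples(values, tension, stepsPerSegment) {
  var n = values.length;
  var scale = (1 - tension) / 2;

  // Tangent at each point, taken from its neighbors (clamped at the ends).
  var tangents = values.map(function (v, i) {
    var prev = values[Math.max(i - 1, 0)];
    var next = values[Math.min(i + 1, n - 1)];
    return scale * (next - prev);
  });

  var samples = [];
  for (var i = 0; i < n - 1; i++) {
    for (var s = 0; s < stepsPerSegment; s++) {
      var t = s / stepsPerSegment;
      // Cubic Hermite basis functions
      var h00 = 2 * t * t * t - 3 * t * t + 1;
      var h10 = t * t * t - 2 * t * t + t;
      var h01 = -2 * t * t * t + 3 * t * t;
      var h11 = t * t * t - t * t;
      samples.push(h00 * values[i] + h10 * tangents[i] +
                   h01 * values[i + 1] + h11 * tangents[i + 1]);
    }
  }
  samples.push(values[n - 1]); // the curve ends exactly on the last point
  return samples;
}

// movie1's grosses for days 1-10, from listing 4.11
var movie1 = [20, 18, 14, 7, 4, 3, 2, 0, 0, 0];
var curve = cardinalSamples(movie1, 0.5, 10);
var lowest = Math.min.apply(null, curve);
console.log(lowest < 0); // true: the curve dips below zero between days 8 and 9
```

Raising the tension toward 1 flattens the tangents and removes the undershoot, at the cost of a stiffer curve.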
Listing 4.12 instantiates a line generator for each movie. Every line uses the day column for its x value, and the y accessor is set dynamically to grab the data from the appropriate movie column.

Figure 4.22 Each movie column is drawn as a separate line. Notice how the “cardinal” interpolation creates a graphical artifact, where it seems like some movies made negative money.

As you learned in chapter 1, lines and filled areas are almost exactly the same thing in SVG. You can differentiate them by a Z at the end of the drawing code, which indicates that the shape is closed, or by the presence or absence of a "fill" style. D3 provides the d3.svg.line and d3.svg.area generators to draw lines or areas. Both generators produce the d attribute for <path> elements, but d3.svg.area provides helper functions to bound the lower end of your path to produce areas in charts. This means we need to define a .y0() accessor that corresponds to our y accessor and determines the shape of the bottom of our area. Let’s see how d3.svg.area() works.

Listing 4.13 Area accessors

    for (var x in data[0]) {
      if (x != "day") {
        var movieArea = d3.svg.area()
          .x(function(d) { return xScale(d.day); })
          .y(function(d) { return yScale(d[x]); })
          // .y0() defines where the bottom of the path is. Here the bottom is
          // the inverse of the top, which mirrors the shape.
          .y0(function(d) { return yScale(-d[x]); })
          .interpolate("cardinal");

        d3.select("svg")
          .append("path")
          .attr("id", x + "Area")
          .attr("d", movieArea(data))
          .attr("fill", "darkgray")
          .attr("stroke", "lightgray")
          .attr("stroke-width", 2)
          .style("opacity", .5);
      }
    }

Figure 4.23 By using an area generator and defining the bottom of the area as the inverse of the top, we can mirror our lines to create an area chart. Here they’re drawn with semitransparent fills, so that we can see how they overlap.
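The difference between a line and an area is easiest to see in the path data each produces. The sketch below is plain JavaScript, not D3’s code: a hypothetical areaPath() helper with straight-segment (“linear”) interpolation that traces the top edge from left to right using y, comes back along the bottom edge from right to left using y0, and closes the shape with Z. To keep the numbers readable it uses identity functions instead of scales, so it ignores SVG’s downward-pointing y axis; the mirroring is done by negating the value, as listing 4.13 does.

```javascript
// Sketch only (not D3's source): area path data with straight segments.
// The top edge (y) is traced left to right, the bottom edge (y0) right to
// left, and Z closes the shape.
function areaPath(data, xOf, yOf, y0Of) {
  var top = data.map(function (d) { return [xOf(d), yOf(d)]; });
  var bottom = data.map(function (d) { return [xOf(d), y0Of(d)]; }).reverse();
  return top.concat(bottom)
    .map(function (p, i) { return (i === 0 ? "M" : "L") + p[0] + "," + p[1]; })
    .join("") + "Z";
}

// movie1 for days 1-3 of movies.csv, with identity functions instead of scales
var rows = [{ day: 1, movie1: 20 }, { day: 2, movie1: 18 }, { day: 3, movie1: 14 }];
var mirrored = areaPath(
  rows,
  function (d) { return d.day; },
  function (d) { return d.movie1; },  // top edge
  function (d) { return -d.movie1; }  // bottom edge mirrors the top, as in listing 4.13
);
console.log(mirrored); // M1,20L2,18L3,14L3,-14L2,-18L1,-20Z
```

Replacing the mirrored y0 accessor with one that returns a running total is all it takes to turn each shape into a stacked band.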
By defining the y0 function of d3.svg.area, we’ve mirrored the path and filled it, as shown in figure 4.23, which is a step in the right direction. Notice, though, that we’re now presenting inaccurate data, because the area of each path is twice the area of the data it represents.

Should you always draw filled paths with d3.svg.area?

No. Counterintuitively, you should use d3.svg.line to draw filled areas. To do so, though, you need to append Z to the created d attribute, which indicates that the path is closed.

You write the constructor for the line-drawing code the same way regardless of whether you want a line or a shape, filled or unfilled:

    movieArea = d3.svg.line()
      .x(function(d) { return xScale(d.day) })
      .y(function(d) { return yScale(d[x]) })
      .interpolate("cardinal");

When you call the constructor, you append a <path> element. You specify whether the line is “closed” by concatenating a Z to the string your line constructor creates for the d attribute of the <path>. When you add a Z to the end of a <path> element’s d attribute, it draws a line connecting the two end points:

    // Open path
    d3.select("svg")
      .append("path")
      .attr("d", movieArea(data))
      .attr("fill", "none")
      .attr("stroke", "black")
      .attr("stroke-width", 3);

    // Closed path: only the d attribute changes
    d3.select("svg")
      .append("path")
      .attr("d", movieArea(data) + "Z")
      .attr("fill", "none")
      .attr("stroke", "black")
      .attr("stroke-width", 3);

You may think that only a closed path could be filled, but the fill of a path is the same whether or not you close the line by appending Z, because SVG fills an open path as though it were closed. Only the outline differs:

    // Closed path with a fill (the open path is the same as above)
    d3.select("svg")
      .append("path")
      .attr("d", movieArea(data) + "Z")
      .attr("fill", "gray")
      .attr("stroke", "black")
      .attr("stroke-width", 3);
You use d3.svg.line when you want to draw most shapes and lines, whether filled or unfilled, or closed or open. You should use d3.svg.area() when you want to draw a shape where the bottom of the shape can be calculated based on the top of the shape as you’re drawing it. It’s suitable for drawing bands of data, such as those found in a stacked area chart or streamgraph.

We want our areas to draw one on top of the other, so we need .y0() to point to a more complex stacking function that makes the bottom of an area equal to the top of the previously drawn area. D3 comes with a stack layout, d3.layout.stack(), which we’ll look at later, but for the purpose of our example, we’ll write our own.

Listing 4.14 Callback function for drawing stacked areas

    // A color ramp that corresponds to the six different movies
    var fillScale = d3.scale.linear()
      .domain([0, 5])
      .range(["lightgray", "black"]);

    // Each movie corresponds to one iteration through the for loop, so we
    // increment n to use in the color ramp. (We could also create an ordinal
    // scale assigning a color to each movie.)
    var n = 0;

    for (var x in data[0]) {
      // No area for the day value of each object, because that's what
      // provides our x coordinate
      if (x != "day") {
        // One area generator per movie: the day value gives the x coordinate,
        // and the running totals up to this movie give the y coordinates
        var movieArea = d3.svg.area()
          .x(function(d) { return xScale(d.day); })
          .y(function(d) { return yScale(simpleStacking(d, x)); })
          .y0(function(d) { return yScale(simpleStacking(d, x) - d[x]); })
          .interpolate("basis");

        // Draws a path with the current generator (one for each attribute not
        // named "day"), gives it a unique id based on the attribute, and fills
        // it with a color from the ramp
        d3.select("svg")
          .append("path")
          .attr("id", x + "Area")
          .attr("d", movieArea(data))
          .attr("fill", fillScale(n))
          .attr("stroke", "none")
          .attr("stroke-width", 2)
          .style("opacity", .5);

        n++; // increments n to color the next area
      }
    }

    // Takes the bound data for one day and the name of a movie, and adds up the
    // values of each movie until it reaches that movie. It returns the total for
    // this day of every movie up to and including the one we've sent.
    function simpleStacking(incomingData, incomingAttribute) {
      var newHeight = 0;
      for (var key in incomingData) { // local variable, so the caller's x isn't overwritten
        if (key != "day") {
          newHeight += parseInt(incomingData[key], 10);
          if (key == incomingAttribute) {
            break;
          }
        }
      }
      return newHeight;
    }

The stacked area chart in figure 4.24 is already complex. To make it a proper streamgraph, the stacks need to alternate. This requires a more complicated stacking function.

Listing 4.15 A stacking function that alternates vertical position of area drawn

    // …
    var movieArea = d3.svg.area()
      .x(function(d) { return xScale(d.day); })
      .y(function(d) { return yScale(alternatingStacking(d, x, "top")); })
      .y0(function(d) { return yScale(alternatingStacking(d, x, "bottom")); })
      .interpolate("basis");
    // …

    // We can create whatever complex accessor function we want for our generators.
    // This one needs the data, and it also needs to know whether we're drawing the
    // top or the bottom of the area, which alternates as we move through the
    // dataset. It also reads the movie counter n from the drawing loop.
    function alternatingStacking(incomingData, incomingAttribute, topBottom) {
      var newHeight = 0;
      var skip = true;
      for (var key in incomingData) {
        if (key != "day") { // always skips day, because that's just our x position
          // movie1 (our center) is always counted; after it, movies are
          // alternately skipped and counted to get the alternating pattern
          if (key == "movie1" || skip == false) {
            newHeight += parseInt(incomingData[key], 10);
            if (key == incomingAttribute) {
              break; // stops when we reach this movie, which gives us the baseline
            }
            if (skip == false) {
              skip = true;
            } else {
              skip = (n % 2 == 0) ? false : true;
            }
          } else {
            skip = false;
          }
        }
      }
      // The height is negative for areas on the bottom side of the streamgraph,
      // and positive for those on the top side
      if (topBottom == "bottom") {
        newHeight = -newHeight;
      }
      if (n > 1 && n % 2 == 1 && topBottom == "bottom") {
        newHeight = 0;
      }
      if (n > 1 && n % 2 == 0 && topBottom == "top") {
        newHeight = 0;
      }
      return newHeight;
    }

The streamgraph in figure 4.25 has some obvious issues, but we’re not going to correct them. For one thing, we’re over-representing the gross of the first movie by drawing it at twice its height. If we wanted to, we could easily make the stacking function account for this by halving the values of that first area. Another issue is that the areas being drawn are different from the areas being displayed, which isn’t a problem as long as the visualization is read from only one perspective.
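The halving fix mentioned above can be sketched outside D3. The hypothetical centeredAlternatingBands() below takes one day’s values in column order and returns a [bottom, top] pair for each movie: the first movie is split evenly around zero, and the rest alternate, with odd positions stacking upward and even positions stacking downward. That alternation rule is an assumption for illustration; it is not the exact rule that alternatingStacking() in listing 4.15 implements.

```javascript
// Sketch only: [bottom, top] bands for one day's values (in column order),
// with the first movie centered on zero at its true height and the rest
// alternating: odd positions stack upward, even positions stack downward.
function centeredAlternatingBands(values) {
  var half = values[0] / 2;
  var upper = half;   // current top edge of the upward stack
  var lower = -half;  // current bottom edge of the downward stack
  var bands = [[-half, half]];
  for (var i = 1; i < values.length; i++) {
    if (i % 2 === 1) {
      bands.push([upper, upper + values[i]]);
      upper += values[i];
    } else {
      bands.push([lower - values[i], lower]);
      lower -= values[i];
    }
  }
  return bands;
}

// Day 1 of movies.csv: movie1 ... movie6
var day1Bands = centeredAlternatingBands([20, 8, 3, 0, 0, 0]);
console.log(JSON.stringify(day1Bands));
// [[-10,10],[10,18],[-13,-10],[18,18],[-13,-13],[18,18]]
```

Each pair would then be passed through yScale for y0 and y. The span from the lowest bottom (-13) to the highest top (18) is 31, exactly the day’s total gross (20 + 8 + 3), so the center movie is no longer drawn at double height.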
Figure 4.24 Our stacked area code represents a movie by drawing an area, where the bottom of that area equals the total amount of money made by any movies drawn earlier for that day. (For movie1 through movie4 on days 1 and 4, the figure labels each band’s y, the running total including that movie, and y0, the running total before it; for example, movie2 on day 1 runs from y0 = 20 to y = 28.)

Figure 4.25 A streamgraph that shows the accreted values for movies by day. The problems of using different interpolation methods are clear. The basis method here shows some inaccuracies, and the difficulty of labeling the scale is also apparent.

But the purpose of this section is to focus on building complex accessor functions to create, from scratch, the kinds of data visualization you’ve seen and likely thought of as exotic. Let’s assume this data is correct and take a moment to analyze the effectiveness of this admittedly attractive method of visualizing data. Is this really a better way to show movie grosses than a simpler stacked graph or line chart? That depends on the scale of the questions being addressed by the chart. If you’re trying to discover overall patterns of variation in movie grosses, as well as spot interactions between them (for instance, seeing whether a movie that grosses particularly well over time interferes with the opening of another movie), then it may be useful. It’s also useful if you’re trying to impress an audience with a complex-looking chart. Otherwise, you’ll be better off with something simpler. But even if you build only less visually impressive charts, you’ll still use the same techniques we’ve gone over in this section.
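The running totals behind figure 4.24 are easy to reproduce. The sketch below, with a hypothetical stackRow() helper, applies the same idea as simpleStacking() in listing 4.14 to one row of movies.csv: each movie’s band starts at the running total of the movies before it (y0) and ends at the running total including it (y). The values are kept as strings to mimic rows parsed from CSV, which is why they go through parseInt().

```javascript
// Sketch only: running-total bands for one CSV row, the idea behind simpleStacking().
function stackRow(row, columns) {
  var runningTotal = 0;
  var bands = {};
  columns.forEach(function (name) {
    var value = parseInt(row[name], 10);
    bands[name] = { y0: runningTotal, y: runningTotal + value };
    runningTotal += value;
  });
  return bands;
}

var movieColumns = ["movie1", "movie2", "movie3", "movie4", "movie5", "movie6"];
// Rows for days 1 and 4 of movies.csv, with string values as a CSV parser returns them
var day1 = { day: "1", movie1: "20", movie2: "8", movie3: "3", movie4: "0", movie5: "0", movie6: "0" };
var day4 = { day: "4", movie1: "7", movie2: "3", movie3: "0", movie4: "5", movie5: "27", movie6: "15" };

var stacked1 = stackRow(day1, movieColumns);
var stacked4 = stackRow(day4, movieColumns);
console.log(stacked1.movie2); // { y0: 20, y: 28 }
console.log(stacked4.movie4); // { y0: 10, y: 15 }
```

Checking a few bands against hand-computed totals like these is a quick sanity test before the values are passed through yScale.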
4.6 Summary

In this chapter you’ve learned the basics of creating charts:

■ Integrating generators and components with the selection and binding process
■ Learning about D3 components and the axis component to create chart elements like an x-axis and a y-axis
■ Interpolating graphical elements, such as lines or areas, from point data using D3 generators
■ Creating complex SVG objects that use the <g> element’s ability to create child shapes, which can be drawn based on the bound dataset, using .each()
■ Exploring the representation of multidimensional data using boxplots
■ Combining and extending these methods to implement a sophisticated charting method, the streamgraph, while learning how such charts may outstrip their audience’s ability to successfully interpret such data

These skills and methods will help you better understand the D3 layouts, which we’ll explore in more detail in the following chapters. The incredible breadth of data visualization techniques possible with D3 is based on the fundamental similarity between different methods of displaying data at the visual level, the functional level, and the data level. By understanding how the processes work and how they can be combined to create more interactive and rich representations, you’ll be better equipped to choose and deploy the right one for your data.

5 Layouts

This chapter covers
■ Histogram and pie chart layouts
■ Simple tweening
■ Tree, circle pack, and stack layouts
■ Sankey diagrams and word clouds

D3 contains a variety of functions, referred to as layouts, that help you format your data so that it can be presented using a popular charting method. In this chapter we’ll look at several different layouts so that you can understand general layout functionality, learn how to deal with D3’s layout structure, and deploy one of these layouts (some of which are shown in figure 5.1) with your data.

In each case, as you’ll see in the following examples, when a dataset is associated with a layout, each object in the dataset gets attributes that allow for drawing the data. Layouts don’t draw the data, nor are they called like components or referred to in the drawing code like generators. Rather, they’re a preprocessing step that formats your data so that it’s ready to be displayed in the form you’ve chosen. You can update a layout, and then, if you rebind the altered data to your graphical objects, you can use the D3 enter/update/exit syntax you encountered in chapter 2 to update your chart. Paired with animated transitions, this can provide you with the framework for an interactive, dynamic chart.

This chapter gives an overview of layout structure by implementing popular layouts such as the histogram, pie chart, tree, and circle packing. Other layouts, such as the chord layout and more exotic ones, follow the same principles and should be easy to understand after looking at these. We’ll get started with a kind of chart you’ve already worked with, the bar chart or histogram, which has its own layout that helps abstract the process of building this kind of chart.

Figure 5.1 Multiple layouts are demonstrated in this chapter, including the circle pack (section 5.3), tree (section 5.4), stack (section 5.5), and Sankey (section 5.6.1), as well as tweening to properly animate shapes like the arcs in pie charts (section 5.2.3).

5.1 Histograms

Before we get into charts that you’ll need layouts for, let’s take a look at a chart that we easily made without a layout. In chapter 2 we made a bar chart based on our Twitter data by using d3.nest(). But D3 has a layout, d3.layout.histogram(), that bins values automatically and provides us with the necessary settings to draw a bar chart based on a scale that we’ve defined. Many people who get started with D3 think it’s a charting library and that they’ll find a function like d3.layout.histogram that creates a finished bar chart on the page when it’s run. But D3 layouts don’t result in charts; they result in the settings necessary for charts. You have to put in a bit of extra work for charts, but you have enormous flexibility (as you’ll see in this and later chapters) that allows you to make diagrams and charts that you can’t find in other libraries.

Listing 5.1 shows the code to create a histogram layout and associate it with a particular scale. I’ve also included an example of how you can use interactivity to adjust the original layout and rebind the data to your shapes. Clicking a bar changes the histogram from showing how many tweets received each number of favorites to how many received each number of retweets.

Listing 5.1 Histogram code

    d3.json("tweets.json", function(error, data) {
      histogram(data.tweets);
    });

    function histogram(tweetsData) {
      var xScale = d3.scale.linear().domain([ 0, 5 ]).range([ 0, 500 ]);
      var yScale = d3.scale.linear().domain([ 0, 10 ]).range([ 400, 0 ]);
      var xAxis = d3.svg.axis().scale(xScale).ticks(5).orient("bottom");

      var histoChart = d3.layout.histogram(); // creates a new layout function

      histoChart
        .bins([ 0, 1, 2, 3, 4, 5 ])   // determines the values the histogram bins for
        .value(function(d) {          // the value the layout bins, read from each datapoint
          return d.favorites.length;
        });

      var histoData = histoChart(tweetsData); // formats the data

      // The formatted data is used to draw the bars
      d3.select("svg").selectAll("rect")
        .data(histoData).enter()
        .append("rect")
        .attr("x", function(d) { return xScale(d.x); })
        .attr("y", function(d) { return yScale(d.y); })
        .attr("width", xScale(histoData[0].dx) - 2)
        .attr("height", function(d) { return 400 - yScale(d.y); })
        .on("click", retweets);

      d3.select("svg").append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0,400)")
        .call(xAxis);

      // Centers the axis labels under the bars
      d3.select("g.axis").selectAll("text").attr("dx", 50);

      function retweets() {
        // Changes the value being measured
        histoChart.value(function(d) { return d.retweets.length; });
        histoData = histoChart(tweetsData);

        // Binds and redraws the new data
        d3.selectAll("rect").data(histoData)
          .transition().duration(500)
          .attr("x", function(d) { return xScale(d.x); })
          .attr("y", function(d) { return yScale(d.y); })
          .attr("height", function(d) { return 400 - yScale(d.y); });
      }
    }

You’re not expected to follow the process of using the histogram to create the results in figure 5.2; you’ll get into that as you look at more layouts throughout this chapter. Notice a few general principles. First, a layout formats the data for display, as I pointed out in the beginning of chapter 4. Second, you still need the same scales and components that you needed when you created a bar chart from raw data without the help of a layout. Third, the histogram is useful because it automatically bins data, whether the values are whole numbers, as here, or fall within a range of values on a scale. Finally, if you want to dynamically change a chart using a different dimension of your data, you don’t need to remove the original. You just need to reformat your data using the layout and rebind it to the original elements, preferably with a transition. You’ll see this in more detail in the next example, which uses another type of chart: pie charts.

5.2 Pie charts

One of the most straightforward layouts available in D3 is the pie layout, which is used to make pie charts like those shown in figure 5.3. Like all layouts, a pie layout can be created, assigned to a variable, and used as both an object and a function. In this section you’ll learn how to create a pie chart and transform it into a ring chart. You’ll also learn how to use tweening to properly transition it when you change its data source.
After you create it, you can pass it an array of values (which I’ll refer to as a dataset), and it will compute the necessary starting and ending angles for each of those values to draw a pie chart. When we pass an array of numbers as our dataset to a pie layout in the console, as in the following code, it doesn’t produce any kind of graphics but rather results in the response shown in figure 5.4:

    var pieChart = d3.layout.pie();
    var yourPie = pieChart([1,1,2]);

Figure 5.2 The histogram in its initial state (left) and after we change the measure from favorites to retweets (right) by clicking on one of the bars.

Our pieChart function created a new array of three objects. Their startAngle and endAngle values, in radians, divide the circle into one piece from 0 to π, the next from π to 1.5π, and the last from 1.5π to 2π. The piece from 0 to π belongs to the largest value, 2, because by default the pie layout sorts values from largest to smallest. But this isn’t a drawing, or SVG code like the line and area generators produced.

Figure 5.3 The traditional pie chart (bottom right) represents proportion as an angled slice of a circle. With slight modification, it can be turned into a donut or ring chart (top) or an exploded pie chart (bottom left).

Figure 5.4 A pie layout applied to an array of [1,1,2] shows objects created with a start angle, end angle, and value attribute corresponding to the dataset, as well as the original data, which in this case is a number. The figure’s two panels read:
Original dataset: A layout takes one (and sometimes more) datasets. In this case, the dataset is an array of numbers, [1,1,2]. The layout transforms that dataset for the purpose of drawing it.
Transformed dataset: The layout returns a dataset that has a reference to the original data but also includes new attributes that are meant to be passed to graphical elements or generators. In this case, the pie layout creates an array of objects with the startAngle and endAngle values the arc generator needs to create the pieces of a pie chart.
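The angle arithmetic a pie layout performs is simple enough to sketch in plain JavaScript. The hypothetical pieAngles() below is not D3’s source: it converts each value into its share of 2π radians, lays the pieces out largest first when sortDescending is true (the pie layout’s default, as section 5.2.3 explains) or in array order when it is false, and returns the results in the original array order, so index 2 always corresponds to the value 2.

```javascript
// Sketch only (not D3's source): compute pie angles in radians.
function pieAngles(values, sortDescending) {
  var total = values.reduce(function (sum, v) { return sum + v; }, 0);
  var order = values.map(function (v, i) { return i; });
  if (sortDescending) {
    // Largest first; Array.prototype.sort is stable (ES2019+), so ties keep array order.
    order.sort(function (a, b) { return values[b] - values[a]; });
  }
  var result = new Array(values.length);
  var angle = 0;
  order.forEach(function (i) {
    var sweep = (values[i] / total) * 2 * Math.PI;
    result[i] = { data: values[i], value: values[i], startAngle: angle, endAngle: angle + sweep };
    angle += sweep;
  });
  return result; // in the original array order, whatever the drawing order
}

var sortedPie = pieAngles([1, 1, 2], true);
var unsortedPie = pieAngles([1, 1, 2], false);
console.log(sortedPie[2].startAngle, sortedPie[2].endAngle);     // 0 3.141592653589793
console.log(unsortedPie[2].startAngle, unsortedPie[2].endAngle); // 3.141592653589793 6.283185307179586
```

Passing false corresponds to calling pieChart.sort(null), which keeps each piece in a stable position when the values change.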
5.2.1 Drawing the pie layout

These are settings that need to be passed to a generator to make each of the pieces of our pie chart. This particular generator is d3.svg.arc, and it’s instantiated like the generators we worked with in chapter 4. It has a few settings, but the only one we need for this first example is the outerRadius() function, which allows us to set a dynamic or fixed radius for our arcs:

    var newArc = d3.svg.arc();
    newArc.outerRadius(100); // gives our arcs, and the resulting pie chart, a radius of 100 px
    console.log(newArc(yourPie[0]));
    // Logs the d attribute necessary to draw this arc as a <path> element:
    // "M6.123031769111886e-15,100A100,100 0 0,1 -100,1.2246063538223773e-14L0,0Z"

Now that you know how the arc constructor works and that it works with our data, all we need to do is bind the data created by our pie layout and pass it to <path> elements to draw our pie chart. The arcs are centered on the 0,0 point in the same way as a circle. If we want to draw the pie at the center of our canvas, we need to create a new <g> element to hold the <path> elements we’ll draw and then move the <g> to the center of the canvas:

    d3.select("svg")
      .append("g")                              // appends a new <g> and moves it to the middle
      .attr("transform", "translate(250,250)")  // of the canvas so the results are easier to see
      .selectAll("path")
      .data(yourPie)      // binds the array created by the pie layout, not our original array or the layout itself
      .enter()
      .append("path")
      .attr("d", newArc)  // newArc reads each object's startAngle and endAngle and produces the matching SVG drawing code
      .style("fill", "blue")
      .style("opacity", .5)
      .style("stroke", "black")
      .style("stroke-width", "2px");

Figure 5.5 shows our pie chart.

Figure 5.5 A pie chart showing three pie pieces that subdivide the circle between the values in the array [1,1,2].

The pie chart layout, like most layouts, grows more complicated when you want to work with JSON object arrays rather than number arrays. Let’s bring back our tweets.json from chapter 2. We can nest and measure it to transform it from an array of tweets into an array of Twitter users with their number of tweets computed:

    // Assumes incData holds the array of tweets loaded from tweets.json
    var nestedTweets = d3.nest()
      .key(function (el) { return el.user; })
      .entries(incData);

    nestedTweets.forEach(function (el) {
      el.numTweets = el.values.length;
      // Total favorites: the sum of the favorites array lengths of all this user's tweets
      el.numFavorites = d3.sum(el.values, function (d) {
        return d.favorites.length;
      });
      // Total retweets: the same, for the retweets array lengths
      el.numRetweets = d3.sum(el.values, function (d) {
        return d.retweets.length;
      });
    });

5.2.2 Creating a ring chart

If we try to run pieChart(nestedTweets) as we did with the earlier array illustrated in figure 5.4, it will fail, because the layout doesn’t know that the numbers we should use to size our pie pieces come from the .numTweets attribute. Most layouts, pie included, let you specify where the values are in your array by defining an accessor function that gets to those values. In the case of nestedTweets, we define pieChart.value() to point at the numTweets attribute of the dataset it’s being used on. While we’re at it, let’s set a value for our arc generator’s innerRadius() so that we create a donut chart instead of a pie chart. With those changes in place, we can use the same code as before to draw the chart in figure 5.6:

    pieChart.value(function(d) { return d.numTweets; });
    newArc.innerRadius(20);
    yourPie = pieChart(nestedTweets);

Figure 5.6 A donut chart showing the number of tweets from our four users represented in the nestedTweets dataset

5.2.3 Transitioning

You’ll notice that for each value in nestedTweets, we totaled the number of tweets and also used d3.sum() to total the number of retweets and favorites (if any). Because we have this data, we can adjust our pie chart to show pie pieces based not on the number of tweets but on those other values. One of the core uses of a layout in D3 is to update the graphical chart. All we need to do is make changes to the data or layout and then rebind the data to the existing graphical elements. By using a transition, we can see the pie chart change from one form to the other. The first block of the following code transforms the pie chart to represent the number of favorites instead of the number of tweets. The second block causes the pie chart to represent the number of retweets. The final forms of the pie chart after running that code are shown in figure 5.7.

    pieChart.value(function(d) { return d.numFavorites; });
    d3.selectAll("path").data(pieChart(nestedTweets))
      .transition().duration(1000).attr("d", newArc);

    pieChart.value(function(d) { return d.numRetweets; });
    d3.selectAll("path").data(pieChart(nestedTweets))
      .transition().duration(1000).attr("d", newArc);

Although the results are what we want, the transition can leave a lot to be desired. Figure 5.8 shows snapshots of the pie chart transitioning from representing the number of tweets to representing the number of favorites.
Figure 5.7 The pie charts representing, on the left, the total number of favorites and, on the right, the total number of retweets

Figure 5.8 Snapshots of the transition of the pie chart representing the number of tweets to the number of favorites. This transition highlights the need to assign key values for data binding and to use tweens for some types of graphical transition, such as that used for arcs.

As you’ll see by running the code and comparing these snapshots, the pie chart doesn’t smoothly transition from one state to another but instead distorts quite significantly. By default, data is bound by array position, so each <path> keeps the same user. But when the pie layout measures data, it also sorts the values from largest to smallest to decide where each piece goes around the circle, which creates a more readable chart. When you call the layout again with a different value, it re-sorts, so a user’s piece can move to a different part of the circle, and when you transition between those shapes graphically, you see the effect shown in figure 5.8. To prevent this from happening, we need to disable this sort:

    pieChart.sort(null);

The result is a smooth graphical transition between numTweets and numRetweets, because each piece keeps its place in the order around the circle, so the transition in the drawn shapes is straightforward. But if you look closely, you’ll notice that the circle deforms a bit, because the default transition() behavior doesn’t deal with arcs well. It isn’t transitioning the angles of our arcs; instead, it’s treating each arc as a geometric shape and transitioning from one shape to another. This becomes obvious when you look at the transition from either of those versions of our pie chart to one that shows numFavorites, because some of the objects in our dataset have 0 values for that attribute, and one of them changes size dramatically. To clean this all up and make our pie chart transition properly, we need to change the code.
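The idea behind a custom tween is to interpolate the angles that describe an arc rather than the path strings drawn from them. A plain attribute transition interpolates the numbers inside the two d strings instead, which is why the shape deforms in between. The sketch below is plain JavaScript with a hypothetical interpolateArc() helper; d3.interpolate, which arcTween in listing 5.2 relies on, does the equivalent work for each property the two objects share. Every intermediate object is a valid arc, so passing it to the arc generator at each step keeps the shape circular.

```javascript
// Sketch only (hypothetical helper, not D3's source): interpolate an arc's angles.
function interpolateArc(from, to) {
  return function (t) {
    return {
      startAngle: from.startAngle + (to.startAngle - from.startAngle) * t,
      endAngle: from.endAngle + (to.endAngle - from.endAngle) * t
    };
  };
}

// A piece that grows from a quarter of the circle to half of it while moving.
var fromPiece = { startAngle: 0, endAngle: Math.PI / 2 };
var toPiece = { startAngle: Math.PI / 2, endAngle: 1.5 * Math.PI };
var tween = interpolateArc(fromPiece, toPiece);

[0, 0.5, 1].forEach(function (t) {
  var a = tween(t);
  console.log(t, a.startAngle.toFixed(3), a.endAngle.toFixed(3));
});
// 0 0.000 1.571
// 0.5 0.785 3.142
// 1 1.571 4.712
```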
Some of this you’ve already dealt with, like using key values for your created elements and using them in conjunction with exit and update behavior. But to make our pie pieces transition in a smooth graphical manner, we need to extend our transitions to include a custom tween that defines how an arc can grow or shrink graphically into a different arc.

Listing 5.2 Updated binding and transitioning for pie layout

    // Updates the function that defines the value we're drawing arcs for
    pieChart.value(function(d) { return d.numRetweets; });

    // Binds only the objects that have values, instead of the entire array.
    // The user id becomes our key value; the same key needs to be used in the
    // initial enter() behavior.
    d3.selectAll("path")
      .data(pieChart(nestedTweets.filter(function(d) {
        return d.numRetweets > 0;
      })), function (d) { return d.data.key; })
      .exit()
      .remove(); // removes the elements that have no corresponding data

    d3.selectAll("path")
      .data(pieChart(nestedTweets.filter(function(d) {
        return d.numRetweets > 0;
      })), function (d) { return d.data.key; })
      .transition()
      .duration(1000)
      .attrTween("d", arcTween); // calls a tween on the d attribute

    // Uses the arc generator to tween the arc by calculating the shape of the arc
    // explicitly. Assumes each path's _current property already holds its previous
    // arc object (for example, set with .each(function(d) { this._current = d; })
    // when the path was first drawn).
    function arcTween(a) {
      var i = d3.interpolate(this._current, a);
      this._current = i(0);
      return function(t) {
        return newArc(i(t));
      };
    }

The result of the code in listing 5.2 is a pie chart that cleanly transitions the individual arcs or removes them when no data corresponds to the pie pieces. You’ll see more of attrTween and styleTween, as well as a deeper investigation of easing and other transition properties, in later chapters.

We could label each pie piece, color it according to a measurement or category, or add interactivity. But rather than spend a chapter creating the greatest pie chart application you’ve ever seen, we’ll move on to another kind of layout that’s often used: the circle pack.

5.3 Pack layouts

Hierarchical data is amenable to an entire family of layouts. One of the most popular is circle packing, shown in figure 5.9. Each object is placed graphically inside the hierarchical parent of that object, so you can see the hierarchical relationship.

Figure 5.9 Pack layouts are useful for representing nested data. They can be flattened (top), or they can visually represent hierarchy (bottom). (Examples from Bostock, https://github.com/mbostock/d3/wiki/Pack-Layout.)

As with all layouts, the pack layout expects a default representation of data that may not align with the data you’re working with. Specifically, pack expects a hierarchical JSON object in which the child elements of each node are stored in a children attribute that points to an array. In examples of layout implementations on the web, the data is typically formatted to match the expected data format. In our case, we would format our tweets like this:

    {id: "All Tweets", children: [
      {id: "Al’s Tweets", children: [{id: "tweet1"}, {id: "tweet2"}]},
      {id: "Roy’s Tweets", children: [{id: "tweet1"}, {id: "tweet2"}]}
      ...

But it’s better to get accustomed to adjusting the accessor functions of the layout to match our data. This doesn’t mean we don’t have to do any data formatting. We still need to create a root node for circle packing to work (what’s referred to as “All Tweets” in the previous code). But we’ll adjust the accessor function .children() to match the structure of the data as it’s represented in nestedTweets, which stores the child elements in the values attribute. In the following listing, we also override the .value() setting that determines the size of circles and set it to a fixed value, as shown in figure 5.10.
Listing 5.3 Circle packing of nested tweets data

    var nestedTweets = d3.nest()
      .key(function (el) { return el.user; })
      .entries(incData);

    // Puts the array that d3.nest creates inside a "root" object that acts as the top-level parent
    var packableTweets = {id: "All Tweets", values: nestedTweets};

    // Creates a color scale to color each depth of the circle pack differently
    var depthScale = d3.scale.category10().domain([0, 1, 2]);

    var packChart = d3.layout.pack();
    packChart.size([500,500]) // Sets the size of the circle-packing chart to the size of our canvas
      // Sets the pack accessor function for child elements to look for "values",
      // which matches the data created by d3.nest
      .children(function(d) { return d.values; })
      // Creates a function that returns 1 when determining the size of leaf nodes
      .value(function(d) { return 1; });

    d3.select("svg")
      .selectAll("circle")
      .data(packChart(packableTweets)) // Binds the results of packChart transforming packableTweets
      .enter()
      .append("circle")
      // Radius and xy coordinates are all computed by the pack layout
      .attr("r", function(d) {return d.r;})
      .attr("cx", function(d) {return d.x;})
      .attr("cy", function(d) {return d.y;})
      // The layout gives each node a depth attribute that we can use to color them distinctly by depth
      .style("fill", function(d) {return depthScale(d.depth);})
      .style("stroke", "black")
      .style("stroke-width", "2px");

Figure 5.10 Each tweet is represented by a green circle (A) nested inside an orange circle (B) that represents the user who made the tweet. The users are all nested inside a blue circle (C) that represents our “root” node.

Notice that when the pack layout has a single child (as in the case of Sam, who made only one tweet), the size of the child node is the same as the size of the parent. This can make it look as if Sam is at the same hierarchical level as the other Twitter users who made more tweets. To correct this, we can modify the radius of the circle to account for its depth in the hierarchy, which can act as a margin of sorts:

    .attr("r", function(d) {return d.r - (d.depth * 10)})

Figure 5.11 An example of a fixed margin based on hierarchical depth. We can create this by reducing the circle size of each node based on its computed “depth” value.

If you want to implement margins like those shown in figure 5.11 in the real world, you should use something more sophisticated than just the depth times 10. That scales poorly with a hierarchical dataset with many levels or with a crowded circle-packing layout. If there were one or two more levels in this hierarchy, our fixed margin would result in negative radius values for the circles, so we should use a d3.scale.linear() or other method to set the margin. You can also use the pack layout’s built-in .padding() function to adjust the spacing between circles at the same hierarchical level.

I glossed over the .value() setting on the pack layout earlier. If you have some numerical measurement for your leaf nodes, then you can use that measurement to set their size using .value() and therefore influence the size of their parent nodes. In our case, we can base the size of our leaf nodes (tweets) on the number of favorites and retweets each has received (the same value we used in chapter 4 as our “impact factor”). The results in figure 5.12 reflect this new setting.
    // Adds 1 so that tweets with no retweets or favorites still have a value greater than zero and are displayed
    .value(function(d) {return d.retweets.length + d.favorites.length + 1})

Figure 5.12 A circle-packing layout with the size of the leaf nodes set to the impact factor of those nodes

Layouts, like generators and components, are amenable to method chaining. You’ll see examples where the settings and data are all strung together in long chains. As with the pie chart, you could assign interactivity to the nodes or adjust the colors, but this chapter focuses on the general structure of layouts. Notice that circle packing is quite similar to another hierarchical layout known as treemaps. Treemaps pack space more effectively because they’re built out of rectangles, but they can be harder to read. The next layout is another hierarchical layout, known as a dendrogram, that more explicitly draws the hierarchical connections in your data.

5.4 Trees

Another way to show hierarchical data is to lay it out like a family tree, with the parent nodes connected to the child nodes in a dendrogram (figure 5.13).

Figure 5.13 Tree layouts are another useful method for expressing hierarchical relationships and are often laid out vertically (top), horizontally (middle), or radially (bottom). (Examples from Bostock.)

The prefix dendro means “tree,” and in D3 the layout is d3.layout.tree. It follows much the same setup as the pack layout, except that to draw the lines connecting the nodes, we need a new generator, d3.svg.diagonal, which draws a curved line from one point to another.
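It can help to see the kind of path a diagonal generator produces. Here is a plain-JavaScript sketch that assumes what we understand D3 v3’s d3.svg.diagonal does by default. It draws a cubic Bézier from source to target, and both control points sit at the vertical midpoint, so the curve leaves and enters each node vertically. An optional projection maps each point to [x, y]. The diagonalPath() function is hypothetical and only meant to show the geometry.

```javascript
// Hypothetical sketch of a "diagonal" link path: a cubic Bezier curve whose two
// control points share the midpoint y value between source and target.
// The projection maps each {x, y} point to an [x, y] pair before drawing.
function diagonalPath(source, target, projection = d => [d.x, d.y]) {
  const midY = (source.y + target.y) / 2;
  const points = [
    source,
    { x: source.x, y: midY }, // first control point: straight below the source
    { x: target.x, y: midY }, // second control point: straight above the target
    target
  ].map(projection);
  const fmt = p => p[0] + "," + p[1];
  return "M" + fmt(points[0]) +
         "C" + fmt(points[1]) + " " + fmt(points[2]) + " " + fmt(points[3]);
}

// Vertical tree: the default projection
const vertical = diagonalPath({ x: 0, y: 0 }, { x: 100, y: 100 });
// Horizontal tree: swap the coordinates, like .projection(function(d) { return [d.y, d.x]; })
const horizontal = diagonalPath({ x: 0, y: 0 }, { x: 100, y: 100 }, d => [d.y, d.x]);

console.log(vertical);   // M0,0C0,50 100,50 100,100
console.log(horizontal); // M0,0C50,0 50,100 100,100
```

Swapping the coordinates inside the projection, and not in the data, is why the same layout output can be drawn vertically or horizontally.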
Listing 5.4 Callback function to draw a dendrogram

    // Uses packableTweets and depthScale from the previous example
    var treeChart = d3.layout.tree();
    treeChart.size([500,500])
      .children(function(d) {return d.values});

    var linkGenerator = d3.svg.diagonal(); // Creates a diagonal generator with the default settings

    d3.select("svg")
      .append("g") // Creates a parent to put all these elements in
      .attr("id", "treeG")
      .selectAll("g")
      .data(treeChart(packableTweets))
      .enter()
      .append("g") // This time we'll create <g> elements so we can label them.
      .attr("class", "node")
      // Like the pack layout, the tree layout computes the XY coordinates of each node.
      .attr("transform", function(d) {
        return "translate(" + d.x + "," + d.y + ")";
      });

    // A little circle representing each node, colored with the same scale we used for the circle pack
    d3.selectAll("g.node")
      .append("circle")
      .attr("r", 10)
      .style("fill", function(d) {return depthScale(d.depth)})
      .style("stroke", "white")
      .style("stroke-width", "2px");

    // A text label for each node: the id, key, or content attribute, whichever the node has
    d3.selectAll("g.node")
      .append("text")
      .text(function(d) {return d.id || d.key || d.content});

    // The .links function of the layout creates an array of links between each node
    // that we can use to draw these links. Just like all the other generators,
    // linkGenerator turns each bound link into the path's d attribute.
    d3.select("#treeG").selectAll("path")
      .data(treeChart.links(treeChart(packableTweets)))
      .enter().insert("path","g")
      .attr("d", linkGenerator)
      .style("fill", "none")
      .style("stroke", "black")
      .style("stroke-width", "2px");

Figure 5.14 A dendrogram laid out vertically using data from tweets.json. The level 0 “root” node (which we created to contain the users) is in blue, the level 1 nodes (which represent users) are in orange, and the level 2 “leaf” nodes (which represent tweets) are in green.

Our dendrogram in figure 5.14 is a bit hard to read. To turn it on its side, we need to adjust the positioning of the elements by flipping the x and y coordinates, which orients the nodes horizontally. We also need to adjust the .projection() of the diagonal generator, which orients the lines horizontally:

    linkGenerator.projection(function (d) {return [d.y, d.x]})
    ...
    .append("g")
    ...
    .attr("transform", function(d) {return "translate(" + d.y + "," + d.x + ")"});

The result, shown in figure 5.15, is more legible because the text isn’t overlapping at the bottom of the canvas. But critical parts of the chart are still drawn off the canvas. We only see half of the root node and the leaf nodes (the blue and green circles), and we can’t read any of the labels of the leaf nodes, which represent our tweets.

Figure 5.15 The same dendrogram as figure 5.14 but laid out horizontally.

We could try to create margins along the height and width of the layout as we did earlier. Or we could provide information about each node in an information box that opens when we click it, as with the soccer data. But a better option is to give the user the ability to drag the canvas up and down and left and right to see more of the visualization.

To do this, we use the D3 zoom behavior, d3.behavior.zoom, which creates a set of event listeners. A behavior is like a component, but instead of creating graphical objects, it creates events (in this case for drag, mousewheel, and double-click) and ties those events to the element that calls the behavior. With each of these events, a zoom object changes its .translate() and/or .scale() values to correspond to the traditional dragging and zooming interaction. You’ll use these changed values to adjust the position of graphical elements in response to user interaction.

Like a component, the zoom behavior needs to be called by the element to which you want these events attached. Typically, you call the zoom from the base <svg> element, because then it fires whenever you click anything in your graphical area.
When creating the zoom component, you need to define what functions are called on zoomstart, zoom, and zoomend. As you might imagine, these correspond to the beginning of a zoom event, the event itself, and the end of the event. Because zoom fires continuously as a user drags the mouse, you may want to run resource-intensive functions only at the beginning or end of the zoom event. You’ll see more complicated zoom strategies, as well as the use of scale, in chapter 7 when we look at geospatial mapping, which uses zooming extensively.

As with other components, to start a zoom component you create a new instance and set any attributes you may need. In our case, we only want the default zoom component, with the zoom event triggering a new function, zoomed(). This function changes the position of the element that holds our chart and allows the user to drag it around:

    var treeZoom = d3.behavior.zoom(); // Creates a new zoom component
    treeZoom.on("zoom", zoomed);       // Keys the "zoom" event to the zoomed() function
    d3.select("svg").call(treeZoom);   // Calls our zoom component with the SVG canvas

    function zoomed() {
      // The zoom component's translate value changes to reflect the zoom behavior
      var zoomTranslate = treeZoom.translate();
      // Setting the <g> to the same translate setting as the zoom component
      // updates the position of the <g> and all its child elements.
      d3.select("#treeG").attr("transform",
        "translate(" + zoomTranslate[0] + "," + zoomTranslate[1] + ")");
    }

Now we can drag and pan our entire chart left and right and up and down. In figure 5.16, we can finally read the text of the tweets by dragging the chart to the left. The ability to zoom and pan gives you powerful interactivity to enhance your charts. It may seem odd that you learned how to use something called zoom and haven’t dealt with zooming in and out at all. But panning tends to be more universally useful with charts like these, while changing scale becomes a necessity when dealing with maps.
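Conceptually, the panning side of the zoom behavior accumulates drag distances into a translate pair, and zoomed() copies that pair into a transform attribute. The sketch below models only that idea. createPanState() and its methods are hypothetical. The sketch ignores scale, mousewheel, and double-click handling, all of which d3.behavior.zoom also manages.

```javascript
// Hypothetical model of the panning half of a zoom behavior: each drag adds its
// pointer movement to a running translate pair. Scale is deliberately left out.
function createPanState(initial = [0, 0]) {
  let offset = initial.slice(); // copy so the caller's array is never mutated
  return {
    drag(dx, dy) {
      offset = [offset[0] + dx, offset[1] + dy];
    },
    translate() {
      return offset.slice();
    },
    // The same kind of string that zoomed() builds for the #treeG transform
    transform() {
      return "translate(" + offset[0] + "," + offset[1] + ")";
    }
  };
}

const pan = createPanState();
pan.drag(-120, 15); // drag left and slightly down
pan.drag(-30, -5);  // keep dragging left, nudge back up
console.log(pan.transform()); // translate(-150,10)
```

Keeping the accumulated offset in one place and redrawing from it is why a single transform on the parent <g> is enough to move every node, link, and label together.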
Figure 5.16 The dendrogram, when dragged to the left, shows the labels for the tweets.

We have other choices besides drawing our tree from top to bottom and left to right. If we tie the position of each node to an angle, and use a diagonal generator subclass created for radial layouts, we can draw our tree diagrams in a radial pattern:

    var linkGenerator = d3.svg.diagonal.radial()
      .projection(function(d) { return [d.y, d.x / 180 * Math.PI]; });

To make this work well, we need to reduce the size of our chart. The radial drawing of a tree layout in D3 uses the size to determine the maximum radius, and it is drawn out from the 0,0 point of its container, like a <g> element:

    treeChart.size([200,200])

With these changes in place, we need only change the positioning of the nodes to take rotation into account:

    .attr("transform", function(d) {
      return "rotate(" + (d.x - 90) + ")translate(" + d.y + ")";
    })

Figure 5.17 shows the results of these changes.

Figure 5.17 The same dendrogram laid out in a radial manner. Notice that the <g> elements are rotated, so their child elements are rotated in the same manner.

The dendrogram is a generic way of displaying information. It can be repurposed for menus or for information you may not think of as traditionally hierarchical. One example (figure 5.18) is from the work of Jason Davies, who used the dendrogram functionality in D3 to create word trees.

Figure 5.18 Example of using a dendrogram in a word tree by Jason Davies (http://www.jasondavies.com/wordtree/).

Hierarchical layouts are common and well understood by readers. They give you the option to emphasize the nested container nature of a hierarchy, as we did with the circle pack layout, or the links between parent and child elements, as with the dendrogram.

5.5 Stack layout

You saw the effects of the stack layout in the last chapter when we created a streamgraph, an example of which is shown in figure 5.19.
We began with a simple stacking function and then made it more complex. As I pointed out then, D3 actually implements a stack layout, which formats your data so that it can be easily passed to d3.svg.area to draw a stacked graph or streamgraph.

Figure 5.19 The streamgraph used in a New York Times piece on movie grosses (figure from The New York Times, February 23, 2008; http://mng.bz/rV7M)

To implement this, we’ll use the area generator in tandem with the stack layout in listing 5.5. This general pattern should be familiar to you by now:

1 Process the data to match the requirements of the layout.
2 Set the accessor functions of the layout to align it with the dataset.
3 Use the layout to format the data for display.
4 Send the modified data either directly to SVG elements or paired with a generator like d3.svg.diagonal, d3.svg.arc, or d3.svg.area.

The first step is to take our original streamdata.csv data and transform it into an array of movie objects. Each object has an array of values at points that correspond to the thickness of the section of the streamgraph that the movie represents.

Listing 5.5 Stack layout example

    d3.csv("movies.csv", function(error, data) { dataViz(data); });

    function dataViz(incData) {
      expData = incData;
      stackData = [];

      var xScale = d3.scale.linear().domain([0, 10]).range([0, 500]);
      var yScale = d3.scale.linear().domain([0, 100]).range([500, 0]);
      var movieColors = d3.scale.category10()
        .domain(["movie1","movie2","movie3","movie4","movie5","movie6"]);

      var stackArea = d3.svg.area()
        .interpolate("basis")
        .x(function(d) { return xScale(d.x); })
        .y0(function(d) { return yScale(d.y0); })
        .y1(function(d) { return yScale(d.y0 + d.y); });

      for (var x in incData[0]) {
        // We want to skip the day column, because, in this case, the day becomes our x value.
        if (x != "day") {
          // For each movie, we create an object with an empty array named "values".
          var newMovieObject = {name: x, values: []};
          // Fill the "values" array with objects that list the x coordinate as the day
          // and the y coordinate as the amount of money made by a movie on that day.
          for (var y = 0; y < incData.length; y++) {
            newMovieObject.values.push({
              x: parseInt(incData[y]["day"], 10),
              y: parseInt(incData[y][x], 10)
            });
          }
          stackData.push(newMovieObject);
        }
      }

      // Function chains on the newly created stack() layout function
      stackLayout = d3.layout.stack()
        .offset("silhouette")
        .order("inside-out")
        .values(function(d) { return d.values; });

      d3.select("svg").selectAll("path")
        .data(stackLayout(stackData))
        .enter().append("path")
        .style("fill", function(d) { return movieColors(d.name); })
        .attr("d", function(d) { return stackArea(d.values); });
    }

After the initial dataset is reformatted, the data in the object array is structured so that the stack layout can deal with it:

    [
    {"name":"movie1","values":[{"x":1,"y":20},{"x":2,"y":18},{"x":3,"y":14},{"x":4,"y":7},{"x":5,"y":4},{"x":6,"y":3},{"x":7,"y":2},{"x":8,"y":0},{"x":9,"y":0},{"x":10,"y":0}]},
    {"name":"movie2","values":[{"x":1,"y":8},{"x":2,"y":5},{"x":3,"y":3},{"x":4,"y":3},{"x":5,"y":3},{"x":6,"y":1},{"x":7,"y":0},{"x":8,"y":0},{"x":9,"y":0},{"x":10,"y":0}]}
    ...

The x value is the day, and the y value is the amount of money made by the movie that day, which corresponds to thickness. As with other layouts, if we didn’t format our data this way, we’d need to adjust the .x() and .y() accessors to match our data names for those values. One of the benefits of formatting our data to match the expected data model of the layout is that the layout function is very simple:

    stackLayout = d3.layout.stack()
      .values(function(d) { return d.values; });

After our stackLayout function processes our dataset, we can get the results by running stackLayout(stackData). The layout sets a y0 value on each point, and together with the y value this gives the bottom (y0) and top (y0 + y) of the area at that x position.
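The y0 bookkeeping described above takes only a few lines of plain JavaScript. This stack() function is hypothetical and simplified. It assumes every series shares the same x positions and keeps the series in their given order, so there is no "inside-out" reordering. It supports a zero baseline and a centered baseline. The centered baseline is modeled on our understanding of the silhouette offset: each column is shifted up by half the difference between the tallest column total and its own total.

```javascript
// Simplified, hypothetical stack layout: sets y0 (baseline) on every point of
// every series. Assumes all series share the same x positions in the same order.
function stack(series, offset = "zero") {
  const m = series[0].values.length;

  // Total height of the stacked column at each x position
  const totals = [];
  for (let j = 0; j < m; j++) {
    totals.push(series.reduce((sum, s) => sum + s.values[j].y, 0));
  }
  const maxTotal = Math.max(...totals);

  // "zero" stacks from 0; "silhouette" centers each column on maxTotal / 2
  const baseline = totals.map(t => (offset === "silhouette" ? (maxTotal - t) / 2 : 0));

  return series.map((s, i) => ({
    name: s.name,
    values: s.values.map((p, j) => {
      let y0 = baseline[j];
      for (let k = 0; k < i; k++) y0 += series[k].values[j].y; // add heights below
      return { x: p.x, y: p.y, y0: y0 };
    })
  }));
}

// First three days of movie1 and movie2 from the reformatted data shown above
const movies = [
  { name: "movie1", values: [{ x: 1, y: 20 }, { x: 2, y: 18 }, { x: 3, y: 14 }] },
  { name: "movie2", values: [{ x: 1, y: 8 }, { x: 2, y: 5 }, { x: 3, y: 3 }] }
];

const zero = stack(movies);
const centered = stack(movies, "silhouette");
console.log(zero[1].values.map(p => p.y0));     // [ 20, 18, 14 ]
console.log(centered[0].values.map(p => p.y0)); // [ 0, 2.5, 5.5 ]
```

An area generator then only needs y0 for the bottom edge and y0 + y for the top edge, which is exactly what the .y0() and .y1() accessors in listing 5.5 read.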
If we use the stack layout to create a streamgraph, then it requires a corresponding area generator:

    // Usually at some point you need to pass the data to a scale function to fit it to the screen.
    var stackArea = d3.svg.area()
      .x(function(d) { return xScale(d.x); })
      .y0(function(d) { return yScale(d.y0); })
      .y1(function(d) { return yScale(d.y0 + d.y); });

After we have our data, layout, and area generator in order, we can call them all as part of the selection and binding process. This gives a set of SVG <path> elements the necessary shapes to make our chart:

    d3.select("svg").selectAll("path")
      .data(stackLayout(stackData)) // The data being bound is stackData processed by stackLayout().
      .enter()
      .append("path")
      // A color scale that associates a unique color with each object in the array
      .style("fill", function(d) {return movieColors(d.name);})
      // The area generator takes the values from our data processed by the layout to get the SVG drawing code.
      .attr("d", function(d) { return stackArea(d.values); });

Figure 5.20 The stack layout default settings, when tied to an area generator, produce a stacked area chart like this one.

The result, as shown in figure 5.20, isn’t a streamgraph but rather a stacked area chart, which isn’t that different from a streamgraph, as you’ll soon find out. The stack layout has an .offset() function that determines the relative positions of the areas that make up the chart. Although we can write our own offset functions to create exotic charts, this function recognizes a few keywords that achieve the typical effects we’re looking for. We’ll use the silhouette keyword, which centers the drawing of the stacked areas around the middle. Another function useful for creating streamgraphs is the .order() function of a stack layout, which determines the order in which areas are drawn, so that you can alternate them like in a streamgraph. We’ll use inside-out because that produces the best streamgraph effect.
The last change is to the area generator, which we’ll update to use the basis interpolator because that gave the best look in our earlier streamgraph example:

    stackLayout.offset("silhouette").order("inside-out");
    stackArea.interpolate("basis");

This results in a cleaner streamgraph than our example from chapter 4, and is shown in figure 5.21.

Figure 5.21 The streamgraph effect from a stack layout with basis interpolation for the areas and using the silhouette and inside-out settings for the stack layout. This is similar to our hand-built example from chapter 4 and shows the same graphical artifacts from the basis interpolation.

The last time we made a streamgraph, we explored the question of whether it was a useful chart. It is useful for various reasons, not least because the area in the chart corresponds graphically to the aggregate profit of each movie. But sometimes a simple stacked bar graph is better.

Layouts can be used for various types of charts, and the stack layout is no different. If we restore the .offset() and .order() back to the default settings, we can use the stack layout to create a set of rectangles that makes a traditional stacked bar chart:

    stackLayout = d3.layout.stack()
      .values(function(d) { return d.values; });

    // Converts values to pixel heights. Its domain and range must match yScale
    // (domain [0, 100], range [500, 0]) so the stacked rectangles meet without gaps or overlaps.
    var heightScale = d3.scale.linear()
      .domain([0, 100])
      .range([0, 500]);

    d3.select("svg").selectAll("g.bar")
      .data(stackLayout(stackData))
      .enter()
      .append("g")
      .attr("class", "bar")
      .each(function(d) {
        d3.select(this).selectAll("rect")
          .data(d.values)
          .enter()
          .append("rect")
          .attr("x", function(p) { return xScale(p.x) - 15; })
          .attr("y", function(p) { return yScale(p.y + p.y0); })
          .attr("height", function(p) { return heightScale(p.y); })
          .attr("width", 30)
          .style("fill", movieColors(d.name));
      });

In many ways, the stacked bar chart in figure 5.22 is much more readable than the streamgraph. It presents the same information, but the y-axis tells us exactly how much money a movie made.
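The rectangle geometry in the stacked bar chart follows directly from y0 and y. A segment’s top is the scaled value of y0 + y, and its height is the pixel distance between the scaled y0 and that top. The dependency-free sketch below shows that arithmetic. linearScale() is a hypothetical stand-in for d3.scale.linear(). Deriving the height from the same scale as the y position is one way to make sure adjacent segments meet exactly.

```javascript
// Hypothetical stand-in for d3.scale.linear(): maps a domain onto a range.
// Multiplying before dividing keeps the integer inputs below exact.
function linearScale(domain, range) {
  return v => range[0] + (v - domain[0]) * (range[1] - range[0]) / (domain[1] - domain[0]);
}

// Same mapping as yScale in listing 5.5: 0..100 onto 500..0 (SVG y grows downward)
const yScale = linearScale([0, 100], [500, 0]);

// A stacked segment's rectangle: its top is the scaled top of the segment, and its
// height is the pixel gap between the scaled bottom (y0) and that top.
function barRect(p, scale) {
  const top = scale(p.y0 + p.y);
  const bottom = scale(p.y0);
  return { y: top, height: bottom - top };
}

// movie1 and movie2 on day 1 (y values 20 and 8) after zero-offset stacking
const lower = barRect({ y: 20, y0: 0 }, yScale);
const upper = barRect({ y: 8, y0: 20 }, yScale);
console.log(lower); // { y: 400, height: 100 }
console.log(upper); // { y: 360, height: 40 }
```

The upper rectangle ends at 360 + 40 = 400, which is exactly where the lower one begins. A height scale with a different domain or range would break that seam.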
There’s a reason why bar charts, line charts, and pie charts are the standard chart types found in your spreadsheet. Streamgraphs, stacked bar charts, and stacked area charts are fundamentally the same thing, and they rely on the stack layout to format your dataset for drawing. Because you can deploy them equally easily, your decision whether to use one or the other can be based on user testing rather than on your ability to create awesome dataviz.

Figure 5.22 A stacked bar chart using the stack layout to determine the position of the rectangles that make up each day’s stacked bar

The layouts we’ve looked at so far, as well as the associated methods and generators, have broad applicability. Now we’ll look at a pair of layouts that don’t come with D3 and that are designed for more specific kinds of data: the Sankey diagram and the word cloud. Even though these layouts aren’t as generic as the layouts included in the core D3 library, they have some prominent examples and can come in handy.

5.6 Plugins to add new layouts

The examples we’ve touched on in this chapter are a few of the layouts that come with the core D3 library. You’ll see a few more in later chapters, and we’ll focus specifically on the force layout in chapter 6. But layouts outside of core D3 may also be useful to you. These layouts tend to use specifically formatted datasets or different terminology for layout functions.

5.6.1 Sankey diagram

The Sankey diagram provides you with the ability to map flow from one category to another. It’s the kind of diagram used in Google Analytics (figure 5.23) to show event flow or user flow from one part of your website to another. Sankey diagrams consist of two types of objects: nodes and edges. In this case, the nodes are the web pages or events, and the edges are the traffic between them. This differs from the hierarchical data you worked with before, because nodes can have many overlapping connections.
The D3 version of the Sankey layout is a plugin written by Mike Bostock a couple of years ago, and you can find it at https://github.com/d3/d3-plugins along with other interesting D3 plugins. The Sankey layout has a couple of examples and sparse documentation, which is one of the drawbacks of noncore layouts. Another minor drawback is that they don’t always follow the patterns of the core layouts in D3. To understand the Sankey layout, you need to examine the format of the data, the examples, and the code itself.

Figure 5.23 Google Analytics uses Sankey diagrams to chart event and user flow for website visitors.

D3 PLUGINS The core d3.js library that you download comes with quite a few layouts and useful functions, but you can find even more at https://github.com/d3/d3-plugins. Besides the two noncore layouts discussed in this chapter, we’ll look at the geo plugins in chapter 7 when we deal with maps. Also available are a fisheye distortion lens, a canned boxplot layout, a layout for horizon charts, and more exotic plugins for Chernoff faces and implementing the superformula.

The data is a JSON array of nodes and a second JSON array of links. Get used to this format, because it’s the format of most of the network data we’ll use in chapter 6. For our example, we’ll look at the traffic flow in a website that sells milk and milk-based products. We want to see how visitors move through the site from the homepage to the store page to the various product pages. In the parlance of the data format we need to work with, the nodes are the web pages, the links are the visitors who go from one page to another (if any), and the value of each link is the total number of visitors who move from that page to the next.
Listing 5.6 sitestats.json

    {
    "nodes":[
      {"name":"index"},
      {"name":"about"},
      {"name":"contact"},
      {"name":"store"},
      {"name":"cheese"},
      {"name":"yoghurt"},
      {"name":"milk"}
    ],
    "links":[
      {"source":0,"target":1,"value":25},
      {"source":0,"target":2,"value":10},
      {"source":0,"target":3,"value":40},
      {"source":1,"target":2,"value":10},
      {"source":3,"target":4,"value":25},
      {"source":3,"target":5,"value":10},
      {"source":3,"target":6,"value":5},
      {"source":4,"target":6,"value":5},
      {"source":4,"target":5,"value":15}
    ]
    }

Each entry in the nodes array represents a web page. Each entry in the links array represents the number of times someone navigated from the "source" page to the "target" page.

The nodes array is clear: each object represents a web page. The links array is a bit more opaque, until you realize the numbers represent the array position of nodes in the nodes array. So when links[0] reads "source": 0, it means that the source is nodes[0], which is the index page of the site. It connects to nodes[1], the about page, and indicates that 25 people navigated from the home page to the about page. That defines our flow: the flow of traffic through a site.

The Sankey layout is initialized like any layout:

    var sankey = d3.sankey()
      .nodeWidth(20)
      .nodePadding(200)
      .size([460, 460])
      .nodes(data.nodes)
      .links(data.links)
      .layout(200);

Until now, you’ve only seen .size(). It controls the graphical extent that the layout uses. The rest you’d need to figure out by looking at the example, experimenting with different values, or reading the sankey.js code itself. Most of it will quickly make sense, especially if you’re familiar with the .nodes() and .links() convention used in D3 network visualizations. The .layout() setting is pretty hard to understand without diving into the code, but I’ll explain that next.

After we define our Sankey layout as in listing 5.7, we need to draw the chart by selecting and binding the necessary SVG elements.
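Before the Sankey layout can position anything, it has to turn those array positions into references and work out how tall each node should be. Here’s a plain-JavaScript sketch of that preprocessing. It is based on our reading of the d3-plugins sankey.js and is not a copy of it. resolveFlows() is a hypothetical name. It attaches each link to its source and target nodes and sets each node’s value to the larger of its total inflow and total outflow, using the data from listing 5.6.

```javascript
// Data from listing 5.6: link source/target values are positions in the nodes array.
const siteStats = {
  nodes: [
    { name: "index" }, { name: "about" }, { name: "contact" }, { name: "store" },
    { name: "cheese" }, { name: "yoghurt" }, { name: "milk" }
  ],
  links: [
    { source: 0, target: 1, value: 25 }, { source: 0, target: 2, value: 10 },
    { source: 0, target: 3, value: 40 }, { source: 1, target: 2, value: 10 },
    { source: 3, target: 4, value: 25 }, { source: 3, target: 5, value: 10 },
    { source: 3, target: 6, value: 5 },  { source: 4, target: 6, value: 5 },
    { source: 4, target: 5, value: 15 }
  ]
};

// Hypothetical preprocessing step: resolves indexes to node objects and sizes each
// node by the larger of its total inflow and total outflow.
function resolveFlows(graph) {
  // Copy nodes so the input stays untouched, adding link lists to each
  const nodes = graph.nodes.map(n => Object.assign({}, n, { sourceLinks: [], targetLinks: [] }));
  const links = graph.links.map(l => {
    const link = { source: nodes[l.source], target: nodes[l.target], value: l.value };
    link.source.sourceLinks.push(link); // outgoing traffic from the source page
    link.target.targetLinks.push(link); // incoming traffic to the target page
    return link;
  });
  const total = list => list.reduce((sum, link) => sum + link.value, 0);
  for (const node of nodes) {
    node.value = Math.max(total(node.sourceLinks), total(node.targetLinks));
  }
  return { nodes, links };
}

const flows = resolveFlows(siteStats);
console.log(flows.nodes.map(n => n.name + "=" + n.value).join(" "));
// index=75 about=25 contact=20 store=40 cheese=25 yoghurt=25 milk=10
```

Taking the maximum of inflow and outflow means a page like cheese, which receives 25 visitors but passes only 20 on, is still drawn tall enough for all of its incoming traffic.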
For the Sankey diagram, the SVG elements we need are typically <rect> elements for the nodes and <path> elements for the flows. We’ll also add <text> elements to label the nodes.

In the layout initialization, .nodeWidth(20) sets where to start and stop drawing the flows between nodes. The .nodePadding(200) setting is the vertical distance between nodes, and a lower value creates longer bars to represent our web pages. The .layout(200) setting is the number of times to run the layout to optimize the placement of flows.

Listing 5.7 Sankey drawing code

    // Intensity ramp from black to red, indicating weak to strong flows
    var intensityRamp = d3.scale.linear()
      .domain([0, d3.max(data.links, function(d) { return d.value; })])
      .range(["black", "red"]);

    d3.select("svg").append("g")
      .attr("transform", "translate(20,20)") // Offsets the parent of the entire chart
      .attr("id", "sankeyG");

    d3.select("#sankeyG").selectAll(".link")
      .data(data.links)
      .enter().append("path")
      .attr("class", "link")
      .attr("d", sankey.link()) // Sankey layout's .link() function is a path generator
      // Note that the layout expects us to use a thick stroke and not a filled area.
      .style("stroke-width", function(d) { return d.dy; })
      .style("stroke-opacity", .5)
      .style("fill", "none")
      // Sets the stroke color using our intensity ramp
      .style("stroke", function(d) { return intensityRamp(d.value); })
      .sort(function(a, b) { return b.dy - a.dy; })
      // Emphasizes the link when we mouse over it by making it less transparent
      .on("mouseover", function() {
        d3.select(this).style("stroke-opacity", .8);
      })
      .on("mouseout", function() {
        d3.selectAll("path.link").style("stroke-opacity", .5);
      });

    d3.select("#sankeyG").selectAll(".node")
      .data(data.nodes)
      .enter().append("g")
      .attr("class", "node")
      // The layout calculates node position as x and y coordinates on our data
      .attr("transform", function(d) {
        return "translate(" + d.x + "," + d.y + ")";
      });

    d3.selectAll(".node").append("rect")
      .attr("height", function(d) { return d.dy; })
      .attr("width", 20)
      .style("fill", "pink")
      .style("stroke", "gray");

    d3.selectAll(".node").append("text")
      .attr("x", 0)
      .attr("y", function(d) { return d.dy / 2; })
      .attr("text-anchor", "middle")
      .text(function(d) { return d.name; });

The implementation of this layout has some interactivity, as shown in figure 5.24. Diagrams like these, with wavy paths overlapping other wavy paths, need interaction to make them legible to your site visitors. In this case, it differentiates one flow from another.

Figure 5.24 A Sankey diagram where the number of visitors is represented in the color of the path. The flow between index and contact has an increased opacity as the result of a mouseover event.

With a Sankey diagram like this at your disposal, you can track the flow of goods, visitors, or anything else through your organization, website, or other system. Although you could expand on this example in any number of ways, I think one of the most useful is also one of the simplest. Remember, layouts aren’t tied to particular shape elements. In some cases, like with the flows in the Sankey diagram, you’ll have a hard time adapting the layout data to any element other than a <path>, but the nodes don’t need to be <rect> elements.
If we adjust our code, we can easily make nodes that are circles:

    sankey.nodeWidth(1);
    d3.selectAll(".node").append("circle")
      .attr("r", function(d) { return d.dy / 2; })
      .attr("cy", function(d) { return d.dy / 2; })
      .style("fill", "pink")
      .style("stroke", "gray");

Don’t shy away from experimenting with tweaks to traditional charting methods. Using circles instead of rectangles, as in figure 5.25, may seem frivolous. But it may be a better fit visually, or it may distinguish your Sankey from all the boring sharp-edged Sankeys out there.

Figure 5.25 A squid-like Sankey diagram

In the same vein, don’t be afraid of leveraging D3’s capacity for information visualization to teach yourself how a layout works. You’ll remember that d3.sankey has a .layout() function, and you might discover the operation of that function by reading the code. But there’s another way to see how this function works: by using transitions and creating a function that updates the .layout() property dynamically, you can see what this function does to the chart graphically.

VISUALIZING ALGORITHMS Although you may think of data visualization as all the graphics in this book, it’s also a graphical representation of the methods you used to process the data. In some cases, like the Sankey diagram here or the force-directed network visualization you’ll see in the next chapter, the algorithm used to sort and arrange the graphical elements is front and center. After you have a layout that displays properly, you can play with the settings and update the elements, as you’ve done with the Sankey diagram, to better understand how the algorithm works visually.

First we need to add an onclick function to make the chart interactive, as shown in listing 5.8. We’ll attach this function to the <svg> element itself, but you could just as easily add a button like we did in chapter 3.
The moreLayouts() function does two things. It updates the sankey.layout() property by incrementing a variable and setting the property to the new value of that variable. It also selects the graphical elements that make up your chart (the <path> and <g> elements) and redraws them with the updated settings. By using transition() and duration(), you’ll see the chart dynamically adjust.

Listing 5.8 Visual layout function for the Sankey diagram

    var numLayouts = 1;
    d3.select("svg").on("click", moreLayouts);
    sankey.layout(numLayouts); // Initializes the sankey with only a single layout pass

    function moreLayouts() {
      // We choose 20 passes because it shows some change without requiring us to click too much.
      numLayouts += 20;
      sankey.layout(numLayouts);
      // Because the layout updates the dataset, we just have to call
      // the drawing functions again and they automatically update.
      d3.selectAll(".link")
        .transition()
        .duration(500)
        .attr("d", sankey.link());
      d3.selectAll(".node")
        .transition()
        .duration(500)
        .attr("transform", function(d) {
          return "translate(" + d.x + "," + d.y + ")";
        });
    }

Figure 5.26 The Sankey layout algorithm attempts to optimize the positioning of nodes to reduce overlap. The chart reflects the position of nodes after (from left to right) 1 pass, 20 passes, 40 passes, and 200 passes.

The end result is a visual experience of the effect of the .layout() function. This function specifies the number of passes that d3.sankey makes to determine the best position of the lines representing flow. You can see some snapshots of this in figure 5.26, which shows the lines sorting themselves out and getting out of each other’s way. This kind of position optimization is a common technique in information visualization, and it drives the force-directed network layout that you’ll see in chapter 6. In the case of our Sankey example, even one pass of the layout provides good positioning. That’s because this is a simple dataset, and it stabilizes quickly.
As you can see when you click your chart, and in figure 5.26, the layout doesn’t change much with progressively higher numbers of passes in the layout() setting. This example should make it clear that when you update the settings of the layout, you can also update its visual display. You can use animations and transitions by selecting the elements and setting their drawing code or position to reflect the changed data. You’ll see much more of this in later chapters.

5.6.2 Word clouds

One of the most popular information visualization charts is also one of the most maligned: the word cloud. Also known as a tag cloud, the word cloud uses text and text size to represent the importance or frequency of words. Figure 5.27 shows a thumbnail gallery of 15 word clouds derived from text in a species biodiversity database. Word clouds often rotate the words to set them at right angles, or jumble them at random angles, to improve the appearance of the graphics. Word clouds, like streamgraphs, receive criticism for being hard to read or presenting too little information. But both are surprisingly popular with audiences.

Figure 5.27 A word or tag cloud uses the size of a word to indicate its importance or frequency in a text, creating a visual summary of text. These word clouds were created by the popular online word cloud generator Wordle (www.wordle.net).

I created these word clouds from my data using Wordle, a popular Java applet that provides an easy UI and a few aesthetic customization choices. Wordle has flooded the internet with word clouds because it lets anyone create visually arresting but problematic graphics by dropping text onto a page. This caused much consternation among data visualization experts. They think word clouds are evil because they embed no analysis in the visualization and only highlight superficial data, such as the quantity of words in a blog post. But word clouds aren’t evil.
First of all, they’re popular with audiences. But more than that, words are remarkably effective graphical objects. Suppose you can identify a numerical attribute that indicates the significance of a word. Scaling the size of that word in a word cloud then relays its significance to your reader.

So let’s start by assuming we have the right kind of data for a word cloud. Fortunately, we do: the top twenty words used in this chapter, along with the number of times each word appears.

Listing 5.9 worddata.csv

text,frequency
layout,63
function,61
data,47
return,36
attr,29
chart,28
array,24
style,24
layouts,22
values,22
need,21
nodes,21
pie,21
use,21
figure,20
circle,19
we'll,19
zoom,19
append,17
elements,17

To create a word cloud with D3, you have to use another layout that isn’t in the core library. It was created by Jason Davies, who also created the sentence trees using the tree layout shown in figure 5.17. The layout implements an algorithm described by Jonathan Feinberg (http://static.mrfeinberg.com/bv_ch03.pdf). The layout, d3.layout.cloud(), is available on GitHub at https://github.com/jasondavies/d3-cloud. It requires that you define which attribute determines word size and the dimensions of the area you want the word cloud laid out in.

Unlike most other layouts, cloud() fires a custom "end" event when it’s done calculating the most efficient use of space for the word cloud. The layout then passes to this event the processed dataset, with the position, rotation, and size of each word. This means we can run the cloud layout without ever referring to it again. We don’t even need to assign it to a variable, as the following listing shows. If we plan to reuse the cloud layout and adjust its settings, we assign it to a variable like with any other layout.
Listing 5.10 Creating a word cloud with d3.layout.cloud

// Uses a scale rather than raw values for the font
var wordScale = d3.scale.linear().domain([0,75]).range([10,160]);

d3.layout.cloud()
  .size([500, 500])
  // Assigns data to the cloud layout using .words()
  .words(data)
  // Sets the size of each word using our scale
  .fontSize(function(d) { return wordScale(d.frequency); })
  // The cloud layout needs to be initialized; when it's done it fires "end"
  // and runs whatever function "end" is associated with.
  .on("end", draw)
  .start();

// We've assigned draw() to "end", which automatically passes the
// processed dataset as the words variable.
function draw(words) {
  var wordG = d3.select("svg").append("g")
    .attr("id", "wordCloudG")
    .attr("transform", "translate(250,250)");
  wordG.selectAll("text")
    .data(words)
    .enter()
    .append("text")
    .style("font-size", function(d) { return d.size + "px"; })
    .style("opacity", .75)
    .attr("text-anchor", "middle")
    // Translation and rotation are calculated by the cloud layout.
    .attr("transform", function(d) {
      return "translate(" + [d.x, d.y] + ")rotate(" + d.rotate + ")";
    })
    .text(function(d) { return d.text; });
}

This code creates an SVG <text> element for each word, rotated and placed according to the layout’s results. None of our words are rotated, so we get the staid word cloud shown in figure 5.28. Defining rotation is simple: we only need to set a rotation value in the cloud layout’s .rotate() function:

// This scale takes a random number between 0 and 1 and returns
// an angle between -20 degrees and 20 degrees.
var randomRotate = d3.scale.linear().domain([0,1]).range([-20,20]);

d3.layout.cloud()
  .size([500, 500])
  .words(data)
  // Sets the rotation for each word
  .rotate(function() { return randomRotate(Math.random()); })
  .fontSize(function(d) { return wordScale(d.frequency); })
  .on("end", draw)
  .start();

At this point, we have a traditional word cloud (figure 5.29), and we can tweak the settings and colors to create anything you’ve seen on Wordle.
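To see exactly what the layout receives, it can help to trace a few rows of worddata.csv through the font-size and rotation scales in plain JavaScript. The following is a minimal sketch, not d3.layout.cloud or D3 itself. It assumes the CSV contains no quoted fields. parseWordData() and linearScale() are hypothetical helpers that imitate what d3.csv and d3.scale.linear do for this file. Because d3.csv leaves values as strings, the sketch converts frequency to a number explicitly.

```javascript
// Hypothetical helper: parse a simple CSV (no quoted fields) into objects
// keyed by the header row, converting frequency to a number.
function parseWordData(csvText) {
  const [header, ...rows] = csvText.trim().split("\n");
  const keys = header.split(",");
  return rows.map(function (row) {
    const values = row.split(",");
    const d = {};
    keys.forEach(function (key, i) { d[key] = values[i]; });
    d.frequency = Number(d.frequency);
    return d;
  });
}

// Hypothetical helper imitating a linear scale: maps [d0, d1] onto [r0, r1].
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return function (value) {
    return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  };
}

// Same domains and ranges as wordScale and randomRotate in listing 5.10
const wordScale = linearScale([0, 75], [10, 160]);
const randomRotate = linearScale([0, 1], [-20, 20]);

const words = parseWordData("text,frequency\nlayout,63\nzoom,19\nappend,17");
const sizes = words.map(function (d) {
  return { text: d.text, size: wordScale(d.frequency) };
});
// Sizes are approximately 136, 48, and 44 pixels.
console.log(sizes);
// Some angle between -20 and 20 degrees
console.log(randomRotate(Math.random()));
```

A domain that starts at 0 means even the rarest words keep the 10-pixel minimum from the range. The upper bound of 75 leaves some headroom above the largest frequency (63).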
But now let’s look at why word clouds get such a bad reputation. We took an interesting dataset, the most common words in this chapter, and sized the words by frequency. Beyond that, we did little more than place them on screen and jostle them a bit. We have different channels for expressing data visually, and in this case the best channels we have, besides size, are color and rotation. With that in mind, let’s imagine that we have a keyword list for this book, and that each of these words is in a glossary at the back of the book. We’ll place those keywords in an array and use them to highlight the words in our word cloud that appear in the glossary. The code in the following listing also rotates shorter words 90 degrees and leaves longer words unrotated so that they’ll be easier to read.

Figure 5.28 A word cloud with words that are arranged horizontally

Figure 5.29 A word cloud using the same worddata.csv but with words slightly perturbed by randomizing the rotation property of each word

Listing 5.11 Word cloud layout with keyword highlighting

// Our array of keywords
var keywords = ["layout", "zoom", "circle", "style", "append", "attr"];

d3.layout.cloud()
  .size([500, 500])
  .words(data)
  // The rotate function rotates by 90 degrees every word
  // with five or fewer characters.
  .rotate(function(d) { return d.text.length > 5 ? 0 : 90; })
  .fontSize(function(d) { return wordScale(d.frequency); })
  .on("end", draw)
  .start();

function draw(words) {
  var wordG = d3.select("svg").append("g")
    .attr("id", "wordCloudG")
    .attr("transform", "translate(250,250)");
  wordG.selectAll("text")
    .data(words)
    .enter()
    .append("text")
    .style("font-size", function(d) { return d.size + "px"; })
    // If the word appears in the keyword list, color it red;
    // otherwise, color it black.
    .style("fill", function(d) {
      return (keywords.indexOf(d.text) > -1 ? "red" : "black");
    })
    .style("opacity", .75)
    .attr("text-anchor", "middle")
    .attr("transform", function(d) {
      return "translate(" + [d.x, d.y] + ") rotate(" + d.rotate + ")";
    })
    .text(function(d) { return d.text; });
}

Figure 5.30 This word cloud highlights keywords and places longer words horizontally and shorter words vertically.

The word cloud in figure 5.30 is fundamentally the same. The difference is that instead of using color and rotation for aesthetics, we used them to encode information in the dataset. The layout’s documentation at https://www.jasondavies.com/wordcloud/about/ describes more controls over the format of your word cloud, including selecting fonts and padding.

Layouts like the word cloud aren’t suitable for as wide a variety of data as some other layouts. But because they’re so easy to deploy and customize, you can combine them with other charts to represent the multiple facets of your data. You’ll see this kind of synchronized chart in chapter 9.

5.7 Summary

In this chapter, we took an in-depth look at D3 layout structure and experimented with several datasets. In doing so, you learned how to use layouts to draw not just one particular chart, but also variations on that chart. You also experimented with interactivity and animation.
In particular, we covered

■ Layout structure and functions common to D3 core layouts
■ Arc and diagonal generators for drawing arcs and connecting links
■ How to make pie charts and donut charts using the pie layout
■ Using tweens to better animate the graphical transition for arc segments (pie pieces)
■ How to create circle-packing diagrams and format them effectively using the pack layout
■ How to create vertical, horizontal, and radial dendrograms using the tree layout
■ How to create stacked area charts, streamgraphs, and stacked bar charts using the stack layout
■ How to use noncore D3 layouts to build Sankey diagrams and word clouds

Now that you understand layouts in general, in the next chapter we’ll focus on how to represent networks. We’ll spend most of our time working with the force-directed layout. It has much in common with general layouts, but it’s distinguished from them because it’s designed to be interactive and animated. Because the chapter deals with network data, like the kind you used for the Sankey layout in this chapter, you’ll also learn a few tips and tricks for processing and measuring networks.

6 Network visualization

Network analysis and network visualization have become more common with the growth of online social networks like Twitter and Facebook, as well as social media and linked data in what was known as Web 2.0. Figure 6.1 shows some of the network visualizations you’ll see in this chapter. They’re particularly interesting because they focus on how things are related. They represent systems more accurately than the traditional flat data seen in more common data visualizations.

This chapter focuses on representing networks, so it’s important that you understand network terminology. In general, when dealing with networks you refer to the things being connected (like people) as nodes and the connections between them (such as being a friend on Facebook) as edges or links.
You may hear nodes referred to as vertices, because that’s where the edges join. Although it may seem useful to have a figure with nodes and edges labeled, one of the lessons from this chapter is that there is no one way to represent a network. Networks may also be referred to as graphs, because that’s what they’re called in mathematics. Finally, the importance of a node in a network is typically referred to as centrality. There’s more, but that should be enough to get you started.

This chapter covers
■ Creating adjacency matrices and arc diagrams
■ Using the force-directed layout
■ Representing directionality
■ Adding and removing network nodes and edges

Networks aren’t just a data format; they’re a perspective on data. When you work with network data, you typically try to discover and display patterns of the network, or of parts of the network, rather than of individual nodes. You may use a network visualization because it makes a cool graphical index, like a mind map or a network map of a website. In general, though, you’ll find that typical information visualization techniques are designed to showcase network structure rather than individual nodes.

6.1 Static network diagrams

Network data is different from hierarchical data. Networks allow many-to-many connections, like the Sankey layout from chapter 5. In hierarchical data, a node can have many children but only one parent, like the tree and pack layouts from chapter 5. A network doesn’t have to be a social network. This format can represent many different structures, such as transportation networks and linked open data.

Figure 6.1 Along with explaining the basics of network analysis (section 6.2.3), this chapter includes laying out networks using xy positioning (section 6.2.5), force-directed algorithms (section 6.2), adjacency matrices (section 6.1.2), and arc diagrams (section 6.1.3).
In this chapter we’ll look at four common forms for representing networks: as data, as adjacency matrices, as arc diagrams, and as force-directed network diagrams. In each case, the graphical representation will be quite different. For instance, in a force-directed layout we’ll represent the nodes as circles and the edges as lines. In an adjacency matrix, by contrast, the nodes will be positioned along the x- and y-axes and the edges will be filled squares. Networks don’t have a default representation, but the examples you’ll see in this chapter are the most common.

6.1.1 Network data

Although you can store networks in several data formats, the most straightforward is known as the edge list. An edge list is typically represented as a CSV like that shown in listing 6.1. It has a source column and a target column, each holding a string or number that indicates which nodes are connected. Each edge may also have other attributes, indicating the type of connection or its strength, the time period when the connection is valid, its color, or any other information you want to store about a connection. The important thing is that only the source and target columns are necessary.

In the case of directed networks, the source and target columns indicate the direction of connection between nodes. A directed network means that nodes may be connected in one direction but not in the other. For instance, you could follow a user on Twitter, but that doesn’t necessarily mean that the user follows you. Undirected networks still typically have the columns listed as “source” and “target,” but the connection is the same in both directions. Take the example of a network made up of connections indicating that people have shared classes: if I’m in a class with you, you’re likewise in a class with me. You’ll see directed and weighted networks represented throughout this chapter.
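As a rough illustration of the edge list format, here is a plain JavaScript sketch covering three steps. It parses a CSV edge list into edge objects, derives a node list from the unique source and target values, and shows how directed and undirected readings of the same edges differ. It assumes the CSV has no quoted fields. parseEdgeList(), deriveNodeIds(), and hasEdge() are hypothetical names, not part of D3. Defaulting a missing weight to 1 is also an assumption made for this sketch.

```javascript
// Hypothetical helper: parse "source,target[,weight]" rows into edge objects.
function parseEdgeList(csvText) {
  const [header, ...rows] = csvText.trim().split("\n");
  const keys = header.split(",");
  return rows.map(function (row) {
    const values = row.split(",");
    const edge = {};
    keys.forEach(function (key, i) { edge[key] = values[i]; });
    // CSV values arrive as strings; assume weight 1 when the column is absent.
    edge.weight = edge.weight === undefined ? 1 : Number(edge.weight);
    return edge;
  });
}

// Hypothetical helper: every distinct node ID, in order of first appearance.
function deriveNodeIds(edges) {
  const seen = new Set();
  edges.forEach(function (edge) {
    seen.add(edge.source);
    seen.add(edge.target);
  });
  return Array.from(seen);
}

// Directed: only source -> target counts. Undirected: either order counts.
function hasEdge(edges, a, b, directed) {
  return edges.some(function (e) {
    return (e.source === a && e.target === b) ||
      (!directed && e.source === b && e.target === a);
  });
}

const edges = parseEdgeList("source,target,weight\nsam,pris,1\nroy,pris,5\nroy,sam,1\nlee,al,3");
console.log(deriveNodeIds(edges));             // ["sam", "pris", "roy", "lee", "al"]
console.log(hasEdge(edges, "sam", "roy", true));  // false: only roy -> sam exists
console.log(hasEdge(edges, "sam", "roy", false)); // true when read as undirected
```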
Listing 6.1 edgelist.csv

source,target,weight
sam,pris,1
roy,pris,5
roy,sam,1
tully,pris,5
tully,kim,3
tully,pat,1
tully,mo,3
kim,pat,2
kim,mo,1
mo,tully,7
mo,pat,1
mo,pris,1
pat,tully,1
pat,kim,2
pat,mo,5
lee,al,3

Our network also has a weight value for each connection, which indicates the strength of that connection. In our case, the edge list records how many times the source favorited the tweets of the target. Sam favorited one tweet made by Pris, Roy favorited five tweets made by Pris, and so on. This is a weighted network because the edges have a value. It’s a directed network because the edges have direction. Therefore, we have a weighted directed network, and we need to account for both weight and direction in our network visualizations.

Technically, you only need an edge list to create a network, because you can derive a list of nodes from the unique values in the edge list. Traditional network analysis software packages like Gephi do exactly this. You can derive a node list with JavaScript, but it’s more common to have a corresponding node list that provides more information about the nodes in your network, like the one in the following listing.

Listing 6.2 nodelist.csv

id,followers,following
sam,17,500
roy,83,80
pris,904,15
tully,7,5
kim,11,50
mo,80,85
pat,150,300
lee,38,7
al,12,12

Because these are Twitter users, we have more information about them from their Twitter stats: in this case, the number of followers and the number of people they follow. As with the edge list, it’s not necessary to have more than an ID. But having access to more data lets you modify your network visualization to reflect the node attributes.

How you represent a network depends on its size and nature. Some networks don’t represent discrete connections between similar things, but rather the flow of goods or information or traffic. For those, you could use a Sankey diagram like we did in chapter 5. Recall that the data format for the Sankey is exactly the same as what we have here: a table of nodes and a table of edges. The Sankey diagram is only suitable for specific kinds of network data. Other chart types, such as an adjacency matrix, are more generically useful for network data.

Before we get started with code to create network visualizations, let’s put together a CSS stylesheet. It lets us set color based on class and use inline styles as little as possible. Listing 6.3 gives the CSS necessary for all the examples in this chapter. Keep in mind that we’ll still need some inline styles when we want the numerical value of an attribute to relate to the data bound to that graphical element. One example is basing the stroke-width of a line on the strength of that line.

Listing 6.3 networks.css

.grid {
  stroke: black;
  stroke-width: 1px;
  fill: red;
}
.arc {
  stroke: black;
  fill: none;
}
.node {
  fill: lightgray;
  stroke: black;
  stroke-width: 1px;
}
circle.active {
  fill: red;
}
path.active {
  stroke: red;
}

6.1.2 Adjacency matrix

As you see more and more networks represented graphically, it may seem like there’s only one way to represent a network. That way uses a circle or square for each node and a line, whether straight or curvy, for each edge. It may surprise you that one of the most effective network visualizations has no connecting lines at all. Instead, the adjacency matrix uses a grid to represent connections between nodes.

The principle of an adjacency matrix is simple: you place the nodes along the x-axis and then place the same nodes along the y-axis. If two nodes are connected, then the corresponding grid square is filled; otherwise, it’s left blank. Because ours is a directed network, the nodes along the y-axis are the sources and the nodes along the x-axis are the targets, as you’ll see in a few pages. Because our network is also weighted, we’ll use fill opacity, which reads as saturation, to indicate weight. Lighter colors indicate a weaker connection, and darker colors indicate a stronger one.

The only problem with building an adjacency matrix in D3 is that D3 has no existing layout for it. That means you have to build it by hand, like we did with the bar chart, scatterplot, and boxplot. Mike Bostock has an impressive example at http://bost.ocks.org/mike/miserables/. But you can make something functional without too much code, which we’ll do with the function in listing 6.4. To do so, we need to process the two JSON arrays created from our CSVs and format the data so that it’s easy to work with. This is close to writing our own layout, something we’ll do in chapter 10, and it’s a good idea generally.

Listing 6.4 The adjacency matrix function

function adjacency() {
  // We need to load two datasets before we can get started, and queue
  // lets us wait until both asynchronous loads have finished.
  queue()
    .defer(d3.csv, "nodelist.csv")
    .defer(d3.csv, "edgelist.csv")
    .await(function(error, file1, file2) { createAdjacencyMatrix(file1, file2); });

  function createAdjacencyMatrix(nodes, edges) {
    // A hash allows us to test if a source-target pair has a link.
    var edgeHash = {};
    for (var x in edges) {
      var id = edges[x].source + "-" + edges[x].target;
      edgeHash[id] = edges[x];
    }
    // Left global on purpose so you can inspect it in the console
    matrix = [];
    // Creates all possible source-target connections
    for (var a in nodes) {
      for (var b in nodes) {
        // Sets the xy coordinates based on the source-target array positions
        var grid = {id: nodes[a].id + "-" + nodes[b].id, x: b, y: a, weight: 0};
        // If there's a corresponding edge in our edge list, give it that weight.
        if (edgeHash[grid.id]) {
          grid.weight = edgeHash[grid.id].weight;
        }
        matrix.push(grid);
      }
    }
    d3.select("svg")
      .append("g")
      .attr("transform", "translate(50,50)")
      .attr("id", "adjacencyG")
      .selectAll("rect")
      .data(matrix)
      .enter()
      .append("rect")
      .attr("class", "grid")
      .attr("width", 25)
      .attr("height", 25)
      .attr("x", function (d) { return d.x * 25; })
      .attr("y", function (d) { return d.y * 25; })
      .style("fill-opacity", function (d) { return d.weight * .2; });

    var scaleSize = nodes.length * 25;
    // Creates an ordinal scale from the node IDs
    var nameScale = d3.scale.ordinal()
      .domain(nodes.map(function (el) { return el.id; }))
      // Used for ordinal values
      .rangePoints([0, scaleSize], 1);
    // Both axes use the same scale.
    var xAxis = d3.svg.axis()
      .scale(nameScale).orient("top").tickSize(4);
    var yAxis = d3.svg.axis()
      .scale(nameScale).orient("left").tickSize(4);
    d3.select("#adjacencyG").append("g").call(yAxis);
    d3.select("#adjacencyG").append("g").call(xAxis)
      .selectAll("text")
      .style("text-anchor", "end")
      .attr("transform", "translate(-10,-10) rotate(90)");
  }
}

A few new things are going on here. For one, we’re using a new scale, d3.scale.ordinal. It takes an array of distinct values and lets us place them on an axis, as we do here with the names of our nodes. We also need a scale function you haven’t seen before, rangePoints. It divides the given range into evenly spaced points, one for each of our values, for display on an axis or elsewhere. In other words, it associates each unique value with a numerical position within the range. The optional second argument sets padding at either end of the range, expressed as a multiple of the spacing between points. With a padding of 1, as here, each point falls at the center of its 25-pixel grid square.

The other new piece of code uses queue.js. We need it because we’re loading two CSV files and don’t want to run our function until both have loaded. The matrix array of objects we’re building may seem obscure. But if you examine it in your console, you’ll see, as in figure 6.2, that it’s just a list of every possible connection and the strength of that connection, if it exists. Figure 6.3 shows the resulting adjacency matrix based on the node list and edge list.
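The data side of listing 6.4 doesn’t depend on D3, so it can be checked on its own. The sketch below restates the same idea in plain JavaScript, assuming node and edge arrays shaped like the parsed CSVs. buildAdjacencyMatrix() and pointPositions() are hypothetical names. Unlike the listing, the sketch iterates with array indexes, so x and y are numbers rather than for...in strings, and it converts weight to a number. pointPositions() reflects our reading of how D3 v3’s rangePoints(range, padding) spaces its points: the range is split into (n − 1 + padding) steps, and the first point is offset by half a step times the padding. Treat it as an assumption to check against the D3 source, not as D3’s code.

```javascript
// Hypothetical restatement of the matrix-building step in listing 6.4.
function buildAdjacencyMatrix(nodes, edges) {
  // Hash each directed edge by "source-target" for constant-time lookup.
  const edgeHash = {};
  edges.forEach(function (edge) {
    edgeHash[edge.source + "-" + edge.target] = edge;
  });
  const matrix = [];
  // Rows (y) are sources and columns (x) are targets.
  nodes.forEach(function (source, y) {
    nodes.forEach(function (target, x) {
      const id = source.id + "-" + target.id;
      const edge = edgeHash[id];
      matrix.push({ id: id, x: x, y: y, weight: edge ? Number(edge.weight) : 0 });
    });
  });
  return matrix;
}

// Evenly spaced ordinal positions, following our reading of rangePoints().
function pointPositions(count, start, stop, padding) {
  if (count < 2) return count === 1 ? [(start + stop) / 2] : [];
  const step = (stop - start) / (count - 1 + padding);
  const positions = [];
  for (let i = 0; i < count; i++) {
    positions.push(start + step * padding / 2 + step * i);
  }
  return positions;
}

const nodes = [{ id: "sam" }, { id: "roy" }, { id: "pris" }];
const edges = [
  { source: "sam", target: "pris", weight: "1" },
  { source: "roy", target: "pris", weight: "5" },
  { source: "roy", target: "sam", weight: "1" }
];
const matrix = buildAdjacencyMatrix(nodes, edges);
console.log(matrix.length); // 9: every possible source-target pair
console.log(matrix.filter(function (c) { return c.weight > 0; }).map(function (c) { return c.id; }));
// ["sam-pris", "roy-sam", "roy-pris"]
// With 9 nodes and 25px cells, padding 1 puts each tick at a cell center.
console.log(pointPositions(9, 0, 225, 1).slice(0, 3)); // [12.5, 37.5, 62.5]
```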
The final transform in listing 6.4 rotates the labels on the x-axis 90 degrees so that the node names run vertically above their columns.

Figure 6.2 The array of connections we’re building. Notice that every possible connection is stored in the array. Only those connections that exist in our dataset have a weight value other than 0. Notice, also, that our CSV import creates the weight value as a string.

Figure 6.3 A weighted, directed adjacency matrix where lighter red indicates weaker connections and darker red indicates stronger connections. The source is on the y-axis, and the target is on the x-axis. The matrix shows that Roy favorited tweets by Sam but Sam didn’t favorite any tweets by Roy.

In many adjacency matrices, you’ll notice that the square indicating the connection from a node to itself is always filled. In network parlance this is a self-loop, and it occurs when a node is connected to itself. In our case, it would mean that someone favorited their own tweet, and fortunately no one in our dataset is a big enough loser to do that.

If we want, we can add interactivity to make the matrix more readable. Grids can be hard to read without something to highlight the row and column of a square. Adding that highlighting to our matrix is simple. All we have to do is add a mouseover event listener that fires a gridOver function. That function highlights all rectangles that have the same x or y value:

d3.selectAll("rect.grid").on("mouseover", gridOver);
function gridOver(d, i) {
  d3.selectAll("rect").style("stroke-width", function (p) {
    return p.x == d.x || p.y == d.y ? "3px" : "1px";
  });
}

Figure 6.4 shows how moving your cursor over a grid square highlights that square’s row and column.

Figure 6.4 Adjacency highlighting column and row of the grid square. In this instance, the mouse is over the Tully-to-Kim edge. You can see that Tully favorited tweets by four people, one of whom was Kim, and that Kim only had tweets favorited by one other person, Pat.

6.1.3 Arc diagram

Another way to graphically represent networks is by using an arc diagram. An arc diagram arranges the nodes along a line and draws the links as arcs above and/or below that line. Again, there isn’t a layout available for arc diagrams, and there are even fewer examples. Still, the principle is rather simple after you see the code. We build another pseudo-layout like we did with the adjacency matrix, but this time we need to process the nodes as well as the links.

Listing 6.5 Arc diagram code

function arcDiagram() {
  queue()
    .defer(d3.csv, "nodelist.csv")
    .defer(d3.csv, "edgelist.csv")
    .await(function(error, file1, file2) { createArcDiagram(file1, file2); });

  function createArcDiagram(nodes, edges) {
    // Creates a hash that associates each node JSON object with its ID value
    var nodeHash = {};
    for (var x in nodes) {
      nodeHash[nodes[x].id] = nodes[x];
      // Sets each node with an x position based on its array position
      nodes[x].x = parseInt(x) * 40;
    }
    for (x in edges) {
      edges[x].weight = parseInt(edges[x].weight);
      // Replaces the string ID of the node with a pointer to the JSON object
      edges[x].source = nodeHash[edges[x].source];
      edges[x].target = nodeHash[edges[x].target];
    }
    var linkScale = d3.scale.linear()
      .domain(d3.extent(edges, function (d) { return d.weight; }))
      .range([5, 10]);
    var arcG = d3.select("svg").append("g").attr("id", "arcG")
      .attr("transform", "translate(50,250)");
    // Draws the links using the arc function
    arcG.selectAll("path")
      .data(edges)
      .enter()
      .append("path")
      .attr("class", "arc")
      .style("stroke-width", function(d) { return d.weight * 2; })
      .style("opacity", .25)
      .attr("d", arc);
    // Draws the nodes as circles at each node's x position
    arcG.selectAll("circle")
      .data(nodes)
      .enter()
      .append("circle")
      .attr("class", "node")
      .attr("r", 10)
      .attr("cx", function (d) { return d.x; });

    // Draws a basis-interpolated line from the source node through a computed
    // middle point to the target node. The middle point is above the line when
    // the source is left of the target, and below it otherwise.
    function arc(d, i) {
      var draw = d3.svg.line().interpolate("basis");
      var midX = (d.source.x + d.target.x) / 2;
      var midY = (d.source.x - d.target.x) * 2;
      return draw([[d.source.x, 0], [midX, midY], [d.target.x, 0]]);
    }
  }
}

Notice that before we build the edges, we create a hash keyed by node ID. It replaces each edge’s source and target strings with references to the corresponding node objects. Because each edge holds references to its source and target nodes, we can easily calculate the graphical attributes of the <line> or <path> element that represents the connection. The force layout, which we’ll look at later in the chapter, uses the same method. The result of the code is your first arc diagram, shown in figure 6.5.

Figure 6.5 An arc diagram, with connections between nodes represented as arcs above and below the nodes. Arcs above the nodes indicate the connection is from left to right, while arcs below the nodes indicate the source is on the right and the target is on the left.

With abstract charts like these, you’re getting to the point where interactivity is no longer optional. The links follow rules, and you’re not dealing with too many nodes or edges. Even so, it can be hard to make out what is connected to what and how. You can add useful interactivity in two ways: have the edges highlight their connected nodes on mouseover, and have the nodes highlight their connected edges on mouseover. The following listing adds two new functions to do this, with the results shown in figure 6.6.

Listing 6.6 Arc diagram interactivity

d3.selectAll("circle").on("mouseover", nodeOver);
d3.selectAll("path").on("mouseover", edgeOver);

function nodeOver(d, i) {
  // Makes a selection of all nodes to set the class of the node
  // being hovered over to "active". Clearing the inline fill left by
  // edgeOver lets the circle.active style show through.
  d3.selectAll("circle")
    .style("fill", null)
    .classed("active", function (p) {
      return p == d ? true : false;
    });
  // Any edge where the selected node shows up as source or target renders as red
  d3.selectAll("path").classed("active", function (p) {
    return p.source == d || p.target == d ? true : false;
  });
}

function edgeOver(d) {
  d3.selectAll("path").classed("active", function(p) {
    return p == d ? true : false;
  });
  d3.selectAll("circle").style("fill", function(p) {
    return p == d.source ? "blue" : p == d.target ? "green" : "lightgray";
  });
}

If you’re interested in exploring arc diagrams further and want to use them for larger datasets, you’ll also want to look into hive plots, which are arc diagrams arranged on spokes. We won’t deal with hive plots in this book, but there’s a plugin layout for hive plots at https://github.com/d3/d3-plugins/tree/master/hive.

Both the adjacency matrix and the arc diagram benefit from the control you have over sorting and placing the nodes, and from their linear arrangement. The next method for network visualization is our focus for the rest of the chapter. It uses entirely different principles to determine how and where to place nodes and edges.

6.2 Force-directed layout

The force layout gets its name from the method by which it determines the optimal graphical representation of a network. Like the word cloud and the Sankey diagram from chapter 5, the force() layout dynamically updates the positions of its elements to find the best fit. Unlike those layouts, it does so continuously in real time rather than as a preprocessing step before rendering. The principle behind a force layout is the interplay between three forces, shown in figure 6.7. These forces push nodes away from each other, attract connected nodes to each other, and keep nodes from flying out of sight.

In this section, you’ll learn how force-directed layouts work and how to make them. You’ll also pick up some general principles from network analysis that will help you better understand these layouts. Finally, you’ll learn how to add and remove nodes and edges, and how to adjust the settings of the layout on the fly.
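To make the three forces concrete, here is a deliberately naive sketch of a single simulation step in plain JavaScript. It is not D3’s implementation. D3 approximates repulsion for larger networks, as the caption of figure 6.7 notes, and it integrates positions differently. This version nudges positions directly, compares every pair of nodes (O(n²)), and uses made-up default values. naiveTick() and its option names are hypothetical. They were chosen to echo .charge(), .linkDistance(), .linkStrength(), and .gravity(), and alpha stands in for an overall step size.

```javascript
// Hypothetical, naive force step: attraction along links, gravity toward
// the center, and pairwise repulsion. Mutates each node's x and y in place.
function naiveTick(nodes, links, options) {
  const opts = Object.assign({
    charge: -30,       // negative: nodes repel each other
    linkDistance: 20,  // preferred length of each link
    linkStrength: 1,   // how strongly links pull toward that length
    gravity: 0.1,      // pull toward the center
    alpha: 0.1,        // overall step size for this tick
    center: [250, 250]
  }, options);

  // Attraction: move each link's endpoints toward the preferred distance.
  links.forEach(function (link) {
    const s = link.source, t = link.target;
    const dx = t.x - s.x, dy = t.y - s.y;
    const dist = Math.sqrt(dx * dx + dy * dy) || 1e-6;
    const k = opts.alpha * opts.linkStrength * (dist - opts.linkDistance) / dist;
    s.x += dx * k / 2; s.y += dy * k / 2;
    t.x -= dx * k / 2; t.y -= dy * k / 2;
  });

  // Gravity: pull every node a fraction of the way toward the center.
  nodes.forEach(function (n) {
    n.x += (opts.center[0] - n.x) * opts.alpha * opts.gravity;
    n.y += (opts.center[1] - n.y) * opts.alpha * opts.gravity;
  });

  // Repulsion: every pair pushes apart, more weakly as distance grows.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i], b = nodes[j];
      const dx = b.x - a.x, dy = b.y - a.y;
      const d2 = dx * dx + dy * dy || 1e-6;
      const f = opts.alpha * opts.charge / d2; // negative charge gives f < 0
      a.x += dx * f; a.y += dy * f;
      b.x -= dx * f; b.y -= dy * f;
    }
  }
}

const left = { x: 240, y: 250 }, right = { x: 260, y: 250 };
naiveTick([left, right], [], { gravity: 0 });
console.log(right.x - left.x); // slightly more than 20: repulsion pushed them apart
```

A real layout repeats a step like this many times. The "tick" handler shown in listing 6.7 then redraws the lines and nodes from the updated x and y values.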
This nested if checks to see if a node is the source, which is set to blue, or if it’s the target and set to green, or if it’s neither and set to gray. Figure 6.6 Mouseover behavior on edges (left), with the edge being moused over in pink, the source node in blue, and the target node in green. Mouseover behavior on nodes (right), with the node being moused over in red and the connected edges in pink. 186 CHAPTER 6 Network visualization 6.2.1 Creating a force-directed network diagram The force() layout you see initialized in listing 6.7 has some settings you’ve already seen before. The most obvious is size(), which uses an array containing the width and height of our layout region to calculate the necessary force settings. The nodes() and links() settings are the same as for the Sankey layout in chapter 5. They take, as you’d expect, arrays of data that correspond to the nodes and links. We’re cre- ating our own source and target references in our links array, just like we did with the arc diagram, and that’s the formatting that force() expects. It also accepts integer val- ues where the integer values correspond to the array position of a node in the nodes array, like the formatting of data for the Sankey diagram links array from chapter 5. As you can see in the following listing, the one setting that’s new is charge(), which determines how much each node pushes away other nodes. There’s also a new event listener, "tick", that needs to get associated with a tick function that updates the posi- tion of your nodes and edges. function forceDirected() { queue() .defer(d3.csv, "nodelist.csv") .defer(d3.csv, "edgelist.csv") Listing 6.7 Force layout function Repulsion All nodes push each other away. Sometimes this force is set to be based on an attribute of a node. Larger nodes can be given more space by setting their repulsion higher, or they can act as anchors by setting their repulsion lower. 
In D3, this is de ned using .charge().fi Canvas Gravity Nodes are pulled toward the layout center to keep the interplay of forces from pushing them out of sight. In D3, this is de ned using.gravity().fi Attraction Nodes that are connected to each other are pulled toward each other. Sometimes, this force is based on the strength of connection, so that more strongly connected nodes are closer. In D3, this is de ned usingfi .linkDistance() and .linkStrength(). Figure 6.7 The forces in a force-directed algorithm: repulsion, gravity, and attraction. Other factors, such as hierarchical packing and community detection, can also be factored into force-directed algorithms, but these features are the most common. Forces are approximated for larger networks to improve performance. 187Force-directed layout .await(function(error, file1, file2) { createForceLayout(file1, file2); }); function createForceLayout(nodes,edges) { var nodeHash = {}; for (x in nodes) { nodeHash[nodes[x].id] = nodes[x]; }; for (x in edges) { edges[x].weight = parseInt(edges[x].weight); edges[x].source = nodeHash[edges[x].source]; edges[x].target = nodeHash[edges[x].target]; }; var weightScale = d3.scale.linear() .domain(d3.extent(edges, function(d) {return d.weight;})) .range([.1,1]); var force = d3.layout.force().charge(-1000) .size([500,500]) .nodes(nodes) .links(edges) .on("tick", forceTick); d3.select("svg").selectAll("line.link") .data(edges, function (d) {return d.source.id + "-" + d.target.id;}) .enter() .append("line") .attr("class", "link") .style("stroke", "black") .style("opacity", .5) .style("stroke-width", function(d) {return d.weight}); var nodeEnter = d3.select("svg").selectAll("g.node") .data(nodes, function (d) {return d.id}) .enter() .append("g") .attr("class", "node"); nodeEnter.append("circle") .attr("r", 5) .style("fill", "lightgray") .style("stroke", "black") .style("stroke-width", "1px"); nodeEnter.append("text") .style("text-anchor", "middle") .attr("y", 15) .text(function(d) 
{return d.id;}); force.start(); function forceTick() { d3.selectAll("line.link") .attr("x1", function (d) {return d.source.x;}) .attr("x2", function (d) {return d.target.x;}) .attr("y1", function (d) {return d.source.y;}) .attr("y2", function (d) {return d.target.y;}); How much each node pushes away each other; if set to a positive value, nodes attract each other "tick" events are fired continuously, running the associated function. Key values for your nodes and edges will help when we update the network later. Initializing the network starts firing "tick" events and calculates the degree centrality of nodes. The tick function updates the edge- drawing code and node-drawing code based on the newly calculated node positions. 188 CHAPTER 6 Network visualization d3.selectAll("g.node") .attr("transform", function (d) { return "translate("+d.x+","+d.y+")"; }) }; }; }; The animated nature of the force layout is lost on the page, but you can see in figure 6.8 general network structure that’s less prominent in an adjacency matrix or arc dia- gram. It’s readily apparent that four nodes (Mo, Tully, Kim, and Pat) are all connected to each other (forming what in network terms is called a clique), and three nodes (Roy, Pris, and Sam) are more peripheral. Over on the right, two nodes (Lee and Al) are connected only to each other. The only reason those nodes are still onscreen is because the layout’s gravity pulls unconnected pieces toward the center. The thickness of the lines corresponds to the strength of connection. But although we have edge strength, we’ve lost the direction of the edges in this layout. You can tell that the network is directed only because the links are drawn as semitransparent, so you can see when two links of different weights overlap each other. We need to use some method to show if these links are to or from a node. One way to do this is to turn our lines into arrows using SVG markers. 
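The three forces in figure 6.7 can be made concrete with a toy, D3-free sketch of a single layout step in plain JavaScript. This is not D3's implementation. D3 v3 approximates charge with a quadtree and integrates velocity with friction, whereas this sketch compares every pair of nodes and moves positions directly. The option names mirror the force() settings discussed in this chapter (charge, gravity, linkDistance, linkStrength, chargeDistance). The function name toyTick, the force formulas, and all numbers are illustrative assumptions. The sketch also skips nodes whose fixed flag is set, a behavior covered later in this section.

```javascript
// Toy single step of a force-directed layout (illustrative only; not D3's code).
// nodes: [{x, y, fixed?}], links: [{source: nodeObject, target: nodeObject}]
function toyTick(nodes, links, opts) {
  var alpha = opts.alpha;                        // step size; shrinks as a layout cools
  var cx = opts.width / 2, cy = opts.height / 2; // layout center
  var maxDist2 = opts.chargeDistance * opts.chargeDistance;
  var i, j, a, b, dx, dy, dist2, k;

  // Repulsion (charge): a negative charge pushes every pair of nodes apart.
  for (i = 0; i < nodes.length; i++) {
    for (j = i + 1; j < nodes.length; j++) {
      a = nodes[i]; b = nodes[j];
      dx = b.x - a.x; dy = b.y - a.y;
      dist2 = dx * dx + dy * dy;
      if (dist2 === 0 || dist2 > maxDist2) continue; // coincident, or beyond chargeDistance
      k = alpha * opts.charge / dist2;               // k < 0 when charge < 0
      if (!a.fixed) { a.x += dx * k; a.y += dy * k; }
      if (!b.fixed) { b.x -= dx * k; b.y -= dy * k; }
    }
  }

  // Attraction: linked nodes move toward being linkDistance apart.
  links.forEach(function (l) {
    var s = l.source, t = l.target;
    var ldx = t.x - s.x, ldy = t.y - s.y;
    var d = Math.sqrt(ldx * ldx + ldy * ldy);
    if (d === 0) return;
    // Positive when the link is too long (pull together), negative when too short.
    var lk = alpha * opts.linkStrength * (d - opts.linkDistance) / d / 2;
    if (!s.fixed) { s.x += ldx * lk; s.y += ldy * lk; }
    if (!t.fixed) { t.x -= ldx * lk; t.y -= ldy * lk; }
  });

  // Canvas gravity: every unfixed node drifts toward the center.
  nodes.forEach(function (n) {
    if (n.fixed) return;
    n.x += (cx - n.x) * alpha * opts.gravity;
    n.y += (cy - n.y) * alpha * opts.gravity;
  });
}

// Usage: two unlinked nodes 20px apart end up farther apart after one step.
var demoNodes = [{x: 240, y: 250}, {x: 260, y: 250}];
toyTick(demoNodes, [], {
  alpha: 0.1, charge: -1000, gravity: 0.1, linkDistance: 20,
  linkStrength: 1, chargeDistance: Infinity, width: 500, height: 500
});
```

Calling toyTick repeatedly with a shrinking alpha roughly mimics the cooling behavior the force layout uses to settle and stop. Lowering chargeDistance skips distant pairs, trading structure for speed.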
Figure 6.8 A force-directed layout based on our dataset and organized graphically using default settings in the force layout

6.2.2 SVG markers

Sometimes you want to place a symbol, such as an arrowhead, on a line or path that you've drawn. In that case, you have to define a marker in your svg:defs and then associate that marker with the element on which you want it to draw. You can define your marker statically in HTML, or you can create it dynamically like any SVG element, as we'll do next. The marker we define can be any sort of SVG shape, but we'll use a path because it lets us draw an arrowhead. A marker can be drawn at the start, end, or middle of a line, and has settings to determine its direction relative to its parent element.

Listing 6.8 Marker definition and application

  var marker = d3.select("svg").append("defs")
    .append("marker")
    .attr("id", "Triangle")
    // refX/refY place the arrowhead's tip (12,6) at the end of the line.
    .attr("refX", 12)
    .attr("refY", 6)
    .attr("markerUnits", "userSpaceOnUse")
    .attr("markerWidth", 12)
    .attr("markerHeight", 18)
    .attr("orient", "auto")
    .append("path")
    .attr("d", "M 0 0 12 6 0 12 3 6");

  d3.selectAll("line").attr("marker-end", "url(#Triangle)");

With the markers defined in listing 6.8, you can now read the network more effectively, as shown in figure 6.9. You see how the nodes are connected to each other, and you can spot which nodes have reciprocal ties, where nodes are connected in both directions. Reciprocation is important to identify. There's a big difference between people who favorite Katy Perry's tweets and people whose tweets are favorited by Katy Perry. At the time of writing, she was the Twitter user with the most followers.

Direction of edges is important, but you can represent direction in other ways, such as using curved edges or edges that grow fatter on one end than the other. To do something like that, you'd need to use a <path> rather than a <line> for the edges, like we did with the Sankey layout or the arc diagram.
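Reciprocal ties are easy to spot by eye in a small network but not in a large one. Here is a minimal sketch of finding them in plain JavaScript. It assumes each link's source and target are either id strings or node objects with an id property, as they are after the nodeHash step in listing 6.7. The function name and the sample ids are hypothetical.

```javascript
// Return each pair of node ids that is linked in both directions, once.
function reciprocalPairs(links) {
  function idOf(n) { return typeof n === "object" ? n.id : n; }

  // Record every directed link as "source->target".
  var directed = new Set();
  links.forEach(function (l) {
    directed.add(idOf(l.source) + "->" + idOf(l.target));
  });

  var seen = new Set();
  var pairs = [];
  links.forEach(function (l) {
    var s = idOf(l.source), t = idOf(l.target);
    var a = s < t ? s : t, b = s < t ? t : s; // order each pair alphabetically
    var key = a + "|" + b;
    if (s !== t && directed.has(t + "->" + s) && !seen.has(key)) {
      seen.add(key);
      pairs.push([a, b]);
    }
  });
  return pairs;
}

// Usage with made-up ids: only mo and tully are linked in both directions.
var ties = reciprocalPairs([
  {source: "mo", target: "tully"},
  {source: "tully", target: "mo"},
  {source: "mo", target: "pat"}
]);
```

Using a Set keyed on "source->target" keeps the check linear in the number of links. Ids that themselves contain "->" or "|" would need a different key scheme.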
Two details of listing 6.8 are worth noting. First, markerUnits is set to userSpaceOnUse because the default setting for markers bases their size on the stroke-width of the parent element, which in our case would result in difficult-to-read markers. Second, a marker is assigned to a line by setting the marker-end, marker-start, or marker-mid attribute to point to the marker.

Figure 6.9 Edges now display markers (arrowheads) indicating the direction of connection. Notice that all the arrowheads are the same size.

If you've run this code on your own, your network probably looks a little different from what's shown in figure 6.9. That's because network visualizations created with force-directed layouts are the result of the interplay of forces. Even with a small network like this, that interplay can result in different positions for nodes. This can confuse users, who may think that these variations indicate different networks. One way around this is to generate a network using a force-directed layout and then fix it in place to create a network basemap. You can then apply any later graphical changes to that fixed network. The concept of a basemap comes from geography. In network visualization, it refers to the use of the same layout with differently sized and/or colored nodes and edges. It allows readers to identify regions of the network that are significantly different according to different measures. You can see this concept of a basemap in use in figure 6.10, which shows how one network can be measured in multiple ways.

The force-directed layout provides the added benefit of revealing larger structures. Depending on the size and complexity of your network, that may be enough. But you may need to represent other network measurements when working with network data.

6.2.3 Network measures

Networks have been studied for a long time: at least decades and, if you consider graph theory in mathematics, centuries. As a result, you may encounter a few terms and measures when working with networks.
This is only meant to be a brief overview. If you want to learn more about networks, I suggest reading the excellent introduction to networks and network analysis by S. Weingart, I. Milligan, and S. Graham at http://www.themacroscope.org/?page_id=337.

EDGE WEIGHT
You'll notice that our dataset contains a "weight" value for each link. This represents the strength of the connection between two nodes. In our case, we assume that the more one user favorites another's tweets, the stronger the connection between those two Twitter users. We drew thicker lines for a higher weight, but we can also adjust the way the force layout works based on that weight, as you'll see next.

Infoviz term: hairball
Network visualizations are impressive, but they can also be so complex that they're unreadable. For this reason, you'll encounter critiques of networks that are too dense to be readable. These network visualizations are often referred to as hairballs, because the extensive overlap of edges makes them resemble a mass of unruly hair.
If you think a force-directed layout is hard to read, you can pair it with another network visualization, such as an adjacency matrix, and highlight both as the user navigates either visualization. You'll see techniques for pairing visualizations like this in chapter 11.

CENTRALITY
Networks are representations of systems. One of the things you want to know about the nodes in a system is which ones are more important than the others, a property referred to as centrality. Central nodes are considered to have more power or influence in a network. There are many different measurements of centrality, a few of which are shown in figure 6.10, and different measures more accurately assess centrality in different network types. One measure of centrality is computed by D3's force() layout: degree centrality.
Figure 6.10 The same network measured using degree centrality (top left), closeness centrality (top right), eigenvector centrality (bottom left), and betweenness centrality (bottom right). More-central nodes are larger and bright red, whereas less-central nodes are smaller and gray. Notice that although some nodes are central according to all measures, their relative centrality varies, as does the overall centrality of other nodes.

DEGREE
Degree, also known as degree centrality, is the total number of links that are connected to a node. In our example data, Mo has a degree of 6, because he's the source or target of 6 links. Degree is a rough measure of the importance of a node in a network, because you assume that people or things with more connections have more power or influence in a network. Weighted degree refers to the total value of the connections to a node, which would give Mo a value of 18. Further, you can split degree into in-degree and out-degree, which distinguish between incoming and outgoing links. For Mo, these would be 4 and 2, respectively.

Every time you start the force() layout, D3 computes the total number of links per node and updates that node's weight attribute to reflect it. We'll use that to affect the way the force layout runs. For now, let's add a button that resizes the nodes based on their weight attribute:

  d3.select("#controls").append("button")
    .on("click", sizeByDegree)
    .html("Degree Size");

  function sizeByDegree() {
    force.stop();
    // Set each circle's radius to twice the node's degree (its weight attribute).
    d3.selectAll("circle")
      .attr("r", function (d) { return d.weight * 2; });
  }

Figure 6.11 shows the value of the degree centrality measure. In this small network you can see and easily count the connections and nodes. Even so, being able to spot the most and least connected nodes at a glance is extremely valuable.
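The degree measures above can be computed directly from an edge list. Here is a minimal plain-JavaScript sketch that is independent of D3. The function name and sample data are hypothetical. It assumes each link has a numeric or numeric-string weight, and source/target values that are ids or node objects with an id. The undirected degree it reports is the same count the force layout stores in each node's weight attribute.

```javascript
// Degree, weighted degree, in-degree, and out-degree for every node in a link list.
function degreeStats(links) {
  var stats = {};
  function idOf(n) { return typeof n === "object" ? n.id : n; }
  function entry(id) {
    if (!stats[id]) stats[id] = {degree: 0, weighted: 0, inDegree: 0, outDegree: 0};
    return stats[id];
  }
  links.forEach(function (l) {
    var w = Number(l.weight) || 0; // CSV weights arrive as strings
    var s = entry(idOf(l.source));
    var t = entry(idOf(l.target));
    s.degree += 1; s.outDegree += 1; s.weighted += w;
    t.degree += 1; t.inDegree += 1; t.weighted += w;
  });
  return stats;
}

// Usage with made-up data.
var stats = degreeStats([
  {source: "a", target: "b", weight: "3"},
  {source: "b", target: "a", weight: 2},
  {source: "a", target: "c", weight: 5}
]);
// stats.a has degree 3, weighted 10, inDegree 1, outDegree 2.
```

Nodes with no links never appear in the result, so a missing entry should be read as degree 0.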
Figure 6.11 Sizing nodes by weight indicates the number of total connections for each node by setting the radius of the circle equal to the weight times 2.

Notice that we're counting links in both directions. So even though Tully is connected to more people, he's the same size as Mo and Pat, who have as many connections but to fewer people.

CLUSTERING AND MODULARITY
One of the most important things to find out about a network is whether any communities exist in it and what they look like. One way is to check whether some nodes are more connected to each other than to the rest of the network, a property known as modularity. You can also look at whether a node's neighbors are also connected to each other, known as clustering. Cliques, mentioned earlier, belong to the same family of measurements: a clique is a group of nodes that are all connected to each other.

Notice that this interconnectedness and community structure is supposed to arise visually out of a force-directed layout. You see the four highly connected users in a cluster and the other users farther away. You may prefer to measure your networks to reveal these structures. David Mimno has an implementation of a community detection algorithm built with D3 at http://mimno.infosci.cornell.edu/community/. This algorithm runs in the browser and can be integrated with your network quite easily to color your network based on community membership.

6.2.4 Force layout settings

When we initialized our force layout, we started out with a charge setting of -1000. Charge and a few other settings give you more control over the way the force layout runs.

CHARGE
Charge sets the strength with which nodes push each other away. If you don't set charge, it has a default setting of -30. We set charge to -1000 because the default charge would have resulted in a tiny network onscreen (see figure 6.12).
Figure 6.12 The layout of our network with the default charge, which displays the nodes too closely together to be easily read

Along with setting fixed values for charge, you can use an accessor function to base the charge values on an attribute of the node. For instance, you could base the charge on the weight (the degree centrality) of the node. Nodes with many connections would then push other nodes away more, giving them more space on the chart.

Negative charge values represent repulsion in a force-directed layout, but you could set them to positive if you wanted your nodes to exert an attractive force. This would likely cause problems with a traditional network visualization but may come in handy for a more complicated visualization.

GRAVITY
With nodes pushing each other, the only thing to stop them from flying off the edge of your chart is what's known as canvas gravity, which pulls all nodes toward the center of the layout. When gravity isn't specifically set, it defaults to .1. Figure 6.13 shows the results of increasing or decreasing the gravity, with our original charge(-1000) setting. Gravity, unlike charge, doesn't accept an accessor function; it accepts only a fixed setting.

LINKDISTANCE
Attraction between nodes is determined by setting the linkDistance property, which is the optimal distance between connected nodes. One of the reasons we needed to set our charge so high was that linkDistance defaults to 20. If we set it to 50, we can reduce the charge to -100 and produce the results in figure 6.14. Setting your linkDistance parameter too high causes your network to fold back in on itself, which you can identify by the presence of prominent triangles in the network visualization. Figure 6.15 shows this folding with linkDistance set to 200.
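As the CHARGE discussion above notes, charge can come from an accessor rather than a fixed number. Here is a minimal sketch of such an accessor. It assumes the node's weight attribute holds its degree, as described under DEGREE. The function name and both constants are hypothetical. They were chosen so that a node with six links gets the -1000 charge used in listing 6.7.

```javascript
// Hypothetical charge accessor: each node repels a little, and every link
// adds more repulsion, so well-connected nodes get more space.
var BASE_CHARGE = -100;     // assumed charge for a node with no links
var CHARGE_PER_LINK = -150; // assumed extra repulsion per link

function chargeByWeight(d) {
  // d.weight is undefined before the layout has been started; treat it as 0.
  return BASE_CHARGE + CHARGE_PER_LINK * (d.weight || 0);
}

// In the D3 v3 code from listing 6.7, this would be passed as
//   force.charge(chargeByWeight);
// in place of force.charge(-1000).
```

Gravity accepts only a fixed value, so it has no equivalent accessor.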
You can set linkDistance to be a function and associate it with edge weight, so that edges with higher weight values get shorter distance settings and edges with lower weight values get longer ones. A better way to achieve that effect is to use linkStrength.

Figure 6.13 Increasing the gravity to .2 (left) pulls the two components closer to the center of the layout area. Decreasing the gravity to .05 (right) allows the small component to drift offscreen.

Figure 6.14 With linkDistance adjusted, our network becomes much more readable.

Figure 6.15 Distortion based on high linkDistance makes it look like Pris is connected to Pat, and otherwise clusters nodes together despite their being unrelated.

LINKSTRENGTH
A force layout is a physical simulation, meaning it uses physical metaphors to arrange the network into its optimal graphical shape. If your network has stronger and weaker links, like our example does, it makes sense for those edges to exert stronger and weaker effects on the nodes they connect. You can achieve this by using linkStrength. It accepts a fixed setting, but it can also take an accessor function that bases the strength of an edge on an attribute of that edge. Here, weightScale from listing 6.7 maps edge weights onto the range .1 to 1:

  force.linkStrength(function (d) { return weightScale(d.weight); });

Figure 6.16 dramatically demonstrates the results, which reflect the weak nature of some of the connections.

6.2.5 Updating the network

When you create a network, you'll often want to give your users the ability to add nodes to the network, remove them, or drag them around. You may also want to adjust the various settings dynamically rather than only when you first create the force layout.

STOPPING AND RESTARTING THE LAYOUT
The force layout is designed to "cool off" and eventually stop once the network is laid out well enough that the nodes no longer move to new positions. When the layout has stopped like this, you'll need to restart it if you want it to animate again.
Also, if you've made any changes to the force settings or want to add or remove parts of the network, you'll need to stop it and restart it.

FORCE.STOP()
You can turn off the force interaction by using force.stop(), which stops running the simulation. It's good to stop the network when there's an interaction with a component elsewhere on your web page or some change in the styling of the network.

FORCE.START()
To begin or restart the animation of the layout, use force.start(). You've already seen .start(), because we used it in our initial example to get the force layout going.

FORCE.RESUME()
If you haven't made any changes to the nodes or links in your network and you want the network to start moving again, you can use force.resume(). It resets the layout's cooling parameter (alpha), which causes the force layout to start moving again.

FORCE.TICK()
Finally, if you want to move the layout forward one step, you can use force.tick(). Force layouts can be resource-intensive, and you may want to use one for just a few seconds rather than let it run continuously.

Figure 6.16 By basing the strength of the attraction between nodes on the strength of the connections between nodes, you see a dramatic change in the structure of the network. The weaker connections between x and y allow that part of the network to drift away.

FORCE.DRAG()
With traditional network analysis programs, the user can drag nodes to new positions. This is implemented using the behavior force.drag(). A behavior is like a component in that it's called on a selection using .call(), but instead of creating SVG elements, it creates a set of event listeners. In the case of force.drag(), those event listeners correspond to dragging events, so you can click and drag your nodes around while the force layout runs.
You can enable dragging on all your nodes by selecting them and calling force.drag() on that selection:

  d3.selectAll("g.node").call(force.drag());

FIXED
When a force layout is associated with nodes, each node has a boolean attribute called fixed that determines whether the node is affected by the force during ticks. One effective interaction technique is to set a node as fixed when the user interacts with it. This allows users to drag nodes to a position on the canvas so they can visually sort the important nodes. To differentiate fixed nodes from unfixed nodes, we'll also have the function give fixed nodes a thicker "stroke-width". The effect of dragging some of our nodes is shown in figure 6.17.

  d3.selectAll("g.node").on("click", fixNode);

  function fixNode(d) {
    // A thicker outline marks the node as fixed.
    d3.select(this).select("circle").style("stroke-width", 4);
    d.fixed = true;
  }

Figure 6.17 The node representing Pat has been dragged to the bottom-left corner and fixed in position, while the node representing Pris has been dragged to the top-left corner and fixed in position. The remaining unfixed nodes have taken their positions based on the force-directed layout.

6.2.6 Removing and adding nodes and links

When dealing with networks, you may want to filter them or give the user the ability to add or remove nodes. To filter a network, you stop() it, remove any nodes and links that are no longer part of the network, rebind those arrays to the force layout, and then start() the layout. This can be done as a filter on the array that makes up your nodes. For instance, we may want to see only the network of people with more than 20 followers, because we want to see how the most influential people are connected. But that's not enough, because we would still have links in our layout that reference nodes that no longer exist. We'll need a more involved filter for our links array.
By using the .indexOf function of an array, though, we can easily create our filtered links by checking whether the source and target are both in our filtered nodes array. Because we used key values when we first bound our arrays to our selections in listing 6.7, we can use selection.exit() to easily update our network. You can see how to do this in the following listing and the effects in figure 6.18.

Listing 6.9 Filtering a network

  function filterNetwork() {
    force.stop();
    // Access the current array of nodes and array of links associated
    // with the force layout.
    var originalNodes = force.nodes();
    var originalLinks = force.links();
    var influentialNodes = originalNodes.filter(function (d) {
      return d.followers > 20;
    });
    var influentialLinks = originalLinks.filter(function (d) {
      return influentialNodes.indexOf(d.source) > -1 &&
        influentialNodes.indexOf(d.target) > -1;
    });

    d3.selectAll("g.node")
      .data(influentialNodes, function (d) { return d.id; })
      .exit()
      .transition()
      .duration(4000)
      .style("opacity", 0)
      .remove();

    d3.selectAll("line.link")
      .data(influentialLinks, function (d) {
        return d.source.id + "-" + d.target.id;
      })
      .exit()
      .transition()
      .duration(3000)
      .style("opacity", 0)
      .remove();

    force
      .nodes(influentialNodes)
      .links(influentialLinks);

    force.start();
  }

Figure 6.18 The network has been filtered to show only nodes with more than 20 followers, after clicking the Degree Size button. Notice that Lee, with no connections, has a degree of 0, so the associated circle has a radius of 0, rendering it invisible. This image catches two processes in midstream: the transition of nodes from full to 0 opacity and the removal of edges.

Because the force algorithm is restarted after the filtering, you can see how the shape of the network changes with the removal of so many nodes. That animation is important because it reveals structural changes in the network.
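The filtering logic in listing 6.9 doesn't depend on D3, so it can be sketched and tested with plain arrays. This variant uses a Set instead of indexOf, which avoids scanning the node array once per link endpoint. The function name filterGraph and the sample data are hypothetical. It assumes links reference node objects, as they do after the nodeHash step in listing 6.7.

```javascript
// Keep nodes that pass `keep`, plus only the links whose two ends both survive.
function filterGraph(nodes, links, keep) {
  var keptNodes = nodes.filter(keep);
  var kept = new Set(keptNodes); // node objects, compared by identity
  var keptLinks = links.filter(function (l) {
    return kept.has(l.source) && kept.has(l.target);
  });
  return {nodes: keptNodes, links: keptLinks};
}

// Usage with made-up nodes: keep people with more than 20 followers.
var ann = {id: "ann", followers: 50};
var bo = {id: "bo", followers: 5};
var cy = {id: "cy", followers: 30};
var result = filterGraph(
  [ann, bo, cy],
  [{source: ann, target: bo}, {source: ann, target: cy}, {source: cy, target: bo}],
  function (d) { return d.followers > 20; }
);
// result.nodes is [ann, cy]; result.links holds only the ann -> cy link.
```

In the D3 code, result.nodes and result.links would take the place of influentialNodes and influentialLinks when rebinding the selections and calling force.nodes() and force.links().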
In listing 6.9, the influentialLinks filter builds an array containing only the links that reference existing nodes. Setting a transition on .exit() applies the transition only to the elements being removed, and waits until the transition is finished to remove them.

Putting more nodes and edges into the network is easy, as long as you properly format your data. You stop the force layout, add the properly formatted nodes or edges to the respective arrays, and rebind the data as you've done in the past. Suppose we want to add an edge between Sam and Al, as shown in figure 6.19. We stop the force layout like we did earlier, create a new datapoint for that edge, and add it to the array we're using for the links. Then we rebind the data and append a new line element for that edge before we restart the force layout.

Listing 6.10 A function for adding edges

  function addEdge() {
    force.stop();
    var oldEdges = force.links();
    var nodes = force.nodes();
    // In our dataset, nodes[0] is Sam and nodes[8] is Al.
    var newEdge = {source: nodes[0], target: nodes[8], weight: 5};
    oldEdges.push(newEdge);
    force.links(oldEdges);
    d3.select("svg").selectAll("line.link")
      .data(oldEdges, function (d) {
        return d.source.id + "-" + d.target.id;
      })
      .enter()
      // Insert before the first node group so edges draw beneath nodes.
      .insert("line", "g.node")
      .attr("class", "link")
      .style("stroke", "red")
      .style("stroke-width", 5)
      .attr("marker-end", "url(#Triangle)");
    force.start();
  }

If we want to add new nodes, as shown in figure 6.20, we'll also want to add edges at the same time. We don't have to, but otherwise the new nodes will float around in space, unconnected to our current network. The code and process, which you can see in listing 6.11, should look familiar to you by now.
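Both addEdge and the function in listing 6.11 depend on the key function passed to .data(). Because existing edges keep their keys, .enter() contains only the new ones. Here is a rough plain-JavaScript sketch of that bookkeeping, to show why the key matters. keyedDiff and edgeKey are hypothetical helpers, not D3 API, and D3's real join also tracks updated elements and their order.

```javascript
// Given the keys of elements already on the page and the new data array,
// report which data need new elements (enter) and which keys lost their data (exit).
function keyedDiff(existingKeys, data, key) {
  var existing = new Set(existingKeys);
  var incoming = new Set(data.map(key));
  return {
    enter: data.filter(function (d) { return !existing.has(key(d)); }),
    exit: existingKeys.filter(function (k) { return !incoming.has(k); })
  };
}

// Same key scheme as the edge joins in this chapter.
function edgeKey(d) {
  return d.source.id + "-" + d.target.id;
}

// Usage with made-up edges: only the new sam-al edge lands in enter.
var sam = {id: "sam"}, mo = {id: "mo"}, al = {id: "al"};
var diff = keyedDiff(
  ["sam-mo"],
  [{source: sam, target: mo}, {source: sam, target: al}],
  edgeKey
);
```

Without a key, D3 joins data to elements by array index. That happens to work when you only append, but it removes the wrong elements when you filter items out of the middle of an array, as listing 6.9 does.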
Listing 6.11 Function for adding nodes and edges

  function addNodesAndEdges() {
    force.stop();
    var oldEdges = force.links();
    var oldNodes = force.nodes();
    var newNode1 = {id: "raj", followers: 100, following: 67};
    var newNode2 = {id: "wu", followers: 50, following: 33};
    var newEdge1 = {source: oldNodes[0], target: newNode1, weight: 5};
    var newEdge2 = {source: oldNodes[0], target: newNode2, weight: 5};
    oldEdges.push(newEdge1, newEdge2);
    oldNodes.push(newNode1, newNode2);
    force.links(oldEdges).nodes(oldNodes);

    d3.select("svg").selectAll("line.link")
      .data(oldEdges, function (d) {
        return d.source.id + "-" + d.target.id;
      })
      .enter()
      .insert("line", "g.node")
      .attr("class", "link")
      .style("stroke", "red")
      .style("stroke-width", 5)
      .attr("marker-end", "url(#Triangle)");

    var nodeEnter = d3.select("svg").selectAll("g.node")
      .data(oldNodes, function (d) { return d.id; })
      .enter()
      .append("g")
      .attr("class", "node")
      .call(force.drag());

    nodeEnter.append("circle")
      .attr("r", 5)
      .style("fill", "red")
      .style("stroke", "darkred")
      .style("stroke-width", "2px");

    nodeEnter.append("text")
      .style("text-anchor", "middle")
      .attr("y", 15)
      .text(function (d) { return d.id; });

    force.start();
  }

Figure 6.19 Network with a new edge added. Notice that because we re-initialized the force layout, it correctly recalculated the weight for Al.

Figure 6.20 Network with two new nodes added (Raj and Wu), both with links to Sam

6.2.7 Manually positioning nodes

The force-directed layout doesn't move your elements. Instead, it calculates the positions of your nodes from their x and y attributes relative to one another, and during each tick it updates those x and y attributes. The tick function selects the <line> and <g> elements and moves them to these updated x and y values. When you want to move your elements manually, you can do so like you normally would.
But first you need to stop the force layout, so that the tick function doesn't overwrite your elements' positions. Let's lay out our nodes like a scatterplot, plotting each node's number of followers against the number of accounts it is following. We'll also add axes to make it readable. You can see the code in the following listing and the results in figure 6.21.

Listing 6.12 Moving our nodes manually

  function manuallyPositionNodes() {
    var xExtent = d3.extent(force.nodes(), function (d) {
      return parseInt(d.followers, 10);
    });
    var yExtent = d3.extent(force.nodes(), function (d) {
      return parseInt(d.following, 10);
    });
    var xScale = d3.scale.linear().domain(xExtent).range([50, 450]);
    var yScale = d3.scale.linear().domain(yExtent).range([450, 50]);

    force.stop();

    d3.selectAll("g.node")
      .transition()
      .duration(1000)
      .attr("transform", function (d) {
        return "translate(" + xScale(d.followers) + "," + yScale(d.following) + ")";
      });

    d3.selectAll("line.link")
      .transition()
      .duration(1000)
      .attr("x1", function (d) { return xScale(d.source.followers); })
      .attr("y1", function (d) { return yScale(d.source.following); })
      .attr("x2", function (d) { return xScale(d.target.followers); })
      .attr("y2", function (d) { return yScale(d.target.following); });

    var xAxis = d3.svg.axis().scale(xScale).orient("bottom").tickSize(4);
    var yAxis = d3.svg.axis().scale(yScale).orient("right").tickSize(4);
    d3.select("svg").append("g")
      .attr("transform", "translate(0,460)").call(xAxis);
    d3.select("svg").append("g")
      .attr("transform", "translate(460,0)").call(yAxis);

    // Store the new positions so that a restarted layout begins from them.
    d3.selectAll("g.node").each(function (d) {
      d.x = xScale(d.followers);
      d.px = xScale(d.followers);
      d.y = yScale(d.following);
      d.py = yScale(d.following);
    });
  }

Figure 6.21 When the network is represented as a scatterplot, the links increase the visual clutter. It provides a useful contrast to the force-directed layout, but can be hard to read on its own.

Notice that you need to update the x and y attributes of each node, but you also need to
update the px and py attributes of each node. The px and py attributes are the x and y coordinates of the node before the last tick. If you don't update them, the force layout thinks that the nodes have high velocity and will violently move them from their new position. If you didn't update the x, y, px, and py attributes at all, the next time you started the force layout, the nodes would immediately return to the positions they had before you moved them. Updating all four means that when you restart the force layout with force.start(), the nodes and edges animate from their current position.

6.2.8 Optimization

The force layout is extremely resource-intensive. That's why it cools off and stops running by design. If you have a large network running with the force layout, you can tax a user's computer until it becomes practically unusable. The first tip for optimization, then, is to limit the number of nodes in your network, as well as the number of edges. A general rule is no more than 100 nodes, unless you know your audience is going to be using the browsers that perform best with SVG, like Safari and Chrome.

But if you have to present more nodes and want to reduce the performance cost, you can use force.chargeDistance() to set a maximum distance when computing the repulsive charge for each node. The lower this setting, the less structured the force layout will be, but the faster it will run. Because networks vary so much, you'll have to experiment with different values for chargeDistance to find the best one for your network.

6.3 Summary

In this chapter, you learned several methods for displaying network data and looked in depth at the force layout available for network data in D3. There's no one way to visually represent a network. Now you have multiple methods to work with, in static, dynamic, and interactive variations.
Specifically, we covered
■ Formatting a node and edge list in the manner D3 typically uses
■ Building a weighted, directed adjacency matrix and adding interaction to explore it
■ Building an interactive weighted, directed arc diagram
■ Applying simple techniques to find links to a node
■ Building and customizing force-directed layouts
■ The basics of network terminology and statistics, such as edge, node, degree, and centrality
■ Using accessors to create dynamic forces
■ Adding interactivity to update node size based on degree centrality

We focused on network information visualization because our world is awash in network data. In the next chapter, we'll look at another broadly applicable but specific domain: geographic information visualization. Just as you've seen several different ways to represent networks in this chapter, in chapter 7 you'll learn different ways of making maps, including tiled maps, globes, and traditional data-driven polygon maps.

7 Geospatial information visualization

One of the most common categories of data you'll encounter is geospatial data. This can come in the form of administrative regions like states or counties, points that represent cities or the location of a person when making a tweet, or satellite imagery of the surface of the earth.

In the past, if you wanted to make a web map, you needed a specialized library like Google Maps, Leaflet, or OpenLayers. But D3 provides enough core functionality to make any kind of map you've seen on the web (some examples of maps created in this chapter using D3 can be seen in figure 7.1). Because you're already working with D3, you can make that map far more sophisticated and distinctive than the out-of-the-box maps you typically see.
The major reason to continue to use This chapter covers ■ Creating points and polygons from GeoJSON and TopoJSON data ■ Using Mercator, Mollweide, orthographic, and satellite projections ■ Advanced TopoJSON neighbor and merging functionality ■ Tiled mapping using d3.geo.tile 205Geospatial information visualization a dedicated library like Google Maps API is because of the added functionality that comes from being in that ecosystem, such as Street View of Google tiles or inte- grated support for Fusion Tables. But if you’re not going to use the ecosystem, then it may be a smarter move to build the map with D3. You won’t have to invest in learn- ing a different syntax and abstraction layer, and you’ll have the greater flexibility D3 mapping affords. Because mapmaking and geographic information systems and science (known as GIS and GIScience, respectively) have been in practice for so long, well-developed methods exist for representing this kind of data. D3 has built-in robust functionality to load and display geospatial data. A related library that you’ll get to know in this chap- ter, TopoJSON, provides more functionality for geospatial information visualization. In this chapter, we’ll start by making maps that combine points, lines, and poly- gons using data from CSV and GeoJSON formatted sources. You’ll learn how to style those maps and provide interactive zooming by revisiting d3.zoom() and exploring it in more detail. After that, we’ll look at the TopoJSON data format and its built-in func- tionality that uses topology, and why it provides significantly smaller data files. Finally, you’ll learn how to make maps using tiles to show terrain and satellite imagery. 
Figure 7.1 Mapping with D3 takes many forms and offers many options, including traditional tile-based maps (section 7.5), cutting-edge TopoJSON operations (section 7.4), globes (section 7.3.1), spatial calculations (section 7.1.4), and data-driven maps (section 7.1) using novel projections (section 7.1.3).

7.1 Basic mapmaking

Before you explore the boundaries of mapping possibilities, you need to make a simple map. In D3, the simplest map you can make is a vector map using SVG <path> and <circle> elements to represent countries and cities. We can bring back cities.csv, which we used in chapter 2, and finally take advantage of its coordinates, but we need to look a bit further to find the data necessary to represent those countries. After we have that data, we can render it as areas, lines, or points on a map. Then we can add interactivity, such as highlighting a region when you move your mouse over it, or computing and showing its center. Before we get started, though, let’s take a look at the CSS for this chapter.

Listing 7.1 ch7.css

path.countries {
  stroke-width: 1;
  stroke: black;
  opacity: .5;
  fill: red;
}
circle.cities {
  stroke-width: 1;
  stroke: black;
  fill: white;
}
circle.centroid {
  fill: red;
  pointer-events: none;
}
rect.bbox {
  fill: none;
  stroke-dasharray: 5 5;
  stroke: black;
  stroke-width: 2;
  pointer-events: none;
}
path.graticule {
  fill: none;
  stroke-width: 1;
  stroke: black;
}
path.graticule.outline {
  stroke: black;
}

7.1.1 Finding data

Making a map requires data, and you have an enormous amount of data available. Geographic data can come in several forms. If you’re familiar with GIS, then you’ll be familiar with one of the most common forms for complex geodata, the shapefile, which is a format developed by Esri and most commonly found in desktop GIS applications.
But the most human-readable form of geodata is latitude and longitude (or x, y coordinates, as in our file) when dealing with points like cities, often stored in a CSV. We’ll use cities.csv, shown in the following listing. This is the same CSV we used in chapter 2, with the locations of eight cities from around the world.

Listing 7.2 cities.csv

"label","population","country","x","y"
"San Francisco", 750000,"USA",-122,37
"Fresno", 500000,"USA",-119,36
"Lahore",12500000,"Pakistan",74,31
"Karachi",13000000,"Pakistan",67,24
"Rome",2500000,"Italy",12,41
"Naples",1000000,"Italy",14,40
"Rio",12300000,"Brazil",-43,-22
"Sao Paulo",12300000,"Brazil",-46,-23

One thing you’ll notice is that the latitudes and longitudes are imprecise. San Francisco, for instance, isn’t at 37,-122 but rather 37.783,-122.417. When you plot these cities, they’ll look noticeably off as you zoom in. Obviously, you’ll want to use more accurate coordinates for your maps, but for this example, which mostly uses maps that are zoomed way out, this should be fine.

If you only have city names or addresses and need their latitude and longitude, you can take advantage of geocoding services, which return coordinates for addresses. These exist as APIs and are available on the web for small batches. You can see an example of these services maintained by Texas A&M at http://geoservices.tamu.edu/Services/Geocode/.

When dealing with more complex geodata like shapes or lines, you’ll necessarily deal with more complex data formats. You’ll want to use GeoJSON, which has become the standard for web-mapping data.

GEOJSON

GeoJSON (geojson.org) is, as it sounds, a way of encoding geodata in JSON format. Each feature in a FeatureCollection is a JSON object that stores the border of the feature in a coordinates array as well as metadata about the feature in a properties hash object. For instance, if you wanted to draw a square around the island of Manhattan, it would have corners at [-74.0479, 40.6829], [-74.0479, 40.8820], [-73.9067, 40.8820], and [-73.9067, 40.6829], as shown in figure 7.2. You can easily export shapefiles into GeoJSON using QGIS (a desktop GIS application; qgis.org), PostGIS (a spatial database run on Postgres; postgis.net), GDAL (a library for manipulation of geospatial data; gdal.org), and other tools and libraries.

Figure 7.2 A polygon drawn at the coordinates [-74.0479, 40.8820], [-73.9067, 40.8820], [-73.9067, 40.6829], and [-74.0479, 40.6829].

A rectangle drawn over a geographic feature like this is known as a bounding box. It’s often represented with only two coordinate pairs for opposite corners, such as the upper-left and bottom-right corners. But any polygon data, such as the irregular border of a state or coastline, can be represented by an array of coordinates like this. In the following listing, we have a fully compliant GeoJSON "FeatureCollection" with only one feature, the simplified borders of the small nation of Luxembourg.

Listing 7.3 GeoJSON example of Luxembourg

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "LUX",
      "properties": {
        "name": "Luxembourg"
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [ 6.043073, 50.128052 ],
            [ 6.242751, 49.902226 ],
            [ 6.18632, 49.463803 ],
            [ 5.897759, 49.442667 ],
            [ 5.674052, 49.529484 ],
            [ 5.782417, 50.090328 ],
            [ 6.043073, 50.128052 ]
          ]
        ]
      }
    }
  ]
}

We’re not going to create our own GeoJSON in this chapter, and unless you get into serious GIS, you may never create your own GeoJSON. Instead, you can get by with downloading existing geodata and either using it as is or editing it in a GIS application and exporting it.
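A bounding box like the one around Manhattan is easy to derive from GeoJSON coordinates. The following minimal sketch, in plain JavaScript with no D3 dependency, walks every [longitude, latitude] pair of a Polygon or MultiPolygon geometry and returns its south-west and north-east corners. The geoJsonBounds name is hypothetical, and the sketch assumes the shape doesn’t cross the 180th meridian, a case that real GIS tools have to handle specially.

```javascript
// Hypothetical helper: longitude/latitude bounding box of a GeoJSON
// Polygon or MultiPolygon geometry. Assumes the shape does not cross
// the 180th meridian.
function geoJsonBounds(geometry) {
  // Normalize to a list of polygons, each a list of rings of [lon, lat] pairs.
  var polygons = geometry.type === "MultiPolygon"
    ? geometry.coordinates
    : [geometry.coordinates];
  var minLon = Infinity, minLat = Infinity;
  var maxLon = -Infinity, maxLat = -Infinity;
  polygons.forEach(function (rings) {
    rings.forEach(function (ring) {
      ring.forEach(function (point) {
        minLon = Math.min(minLon, point[0]);
        maxLon = Math.max(maxLon, point[0]);
        minLat = Math.min(minLat, point[1]);
        maxLat = Math.max(maxLat, point[1]);
      });
    });
  });
  // Two opposite corners: south-west and north-east.
  return [[minLon, minLat], [maxLon, maxLat]];
}

// The simplified Luxembourg geometry from listing 7.3.
var luxembourg = {
  type: "Polygon",
  coordinates: [[
    [6.043073, 50.128052], [6.242751, 49.902226], [6.18632, 49.463803],
    [5.897759, 49.442667], [5.674052, 49.529484], [5.782417, 50.090328],
    [6.043073, 50.128052]
  ]]
};

console.log(geoJsonBounds(luxembourg));
// returns [[5.674052, 49.442667], [6.242751, 50.128052]]
```

Because the box only needs the extremes of each axis, the order of the points and the number of rings don’t matter.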
In our examples in this chapter, we’ll use world.geojson (available at emeeks.github.io/d3ia/world.geojson), a file that consists of the countries of the world in the same simplified, low-resolution representation that you see in listing 7.3.

PROJECTION

Entire books have been written on creating web maps, and an entire book could be written on using D3.js for crafting maps. Because this is only one chapter, I’ll gloss over many deep issues. One of these is projection. In GIS, projection refers to the process of rendering points on a globe, like the earth, onto a flat plane, like your computer monitor. You can project geographic data in many different ways for representation on your screen, and in this chapter we’ll look at a few different methods. To start, we’ll use one of the most common geographic projections, the Mercator projection, which is used in most web maps. It became the de facto standard because it’s the projection used by Google Maps. The Mercator projection is part of D3’s core, but you’ll also want to include an extension of D3, d3.geo.projection.js, for some of the more interesting work you’ll do later in the chapter. By defining a projection, you can take advantage of d3.geo.path, which draws geodata onscreen based on your selected projection. After we’ve defined a projection and have geo.path() ready, the code in the following listing is all that we need to draw the map shown in figure 7.3.

Listing 7.4 Initial mapping function

d3.json("world.geojson", createMap);

function createMap(countries) {
  // Projection functions have many options that you'll see later.
  var aProjection = d3.geo.mercator();
  // d3.geo.path() takes properly formatted GeoJSON features and returns
  // SVG drawing code for SVG paths. Without .projection(), it defaults to
  // albersUsa, a projection suitable only for maps of the United States.
  var geoPath = d3.geo.path().projection(aProjection);
  d3.select("svg").selectAll("path").data(countries.features)
    .enter()
    .append("path")
    .attr("d", geoPath)
    .attr("class", "countries");
};

Figure 7.3 A map of the world using the default settings for D3’s Mercator projection. You can see most of the Western Hemisphere and some of Europe and Africa, but the rest of the world is rendered out of sight.

Why do you only see part of the world in figure 7.3? Because the default settings of the Mercator projection show only part of the world in your SVG canvas. Each projection has .translate() and .scale() methods that follow the syntax of the transform convention in SVG, but they have different effects with different projections.

Different families of projections have different scale defaults. The d3.geo.albersUsa projection defaults to 1070, while d3.geo.mercator defaults to 150. As with most D3 functions like this, you can see the default by calling the function without passing it a value:

d3.geo.mercator().scale()   // 150
d3.geo.albersUsa().scale()  // 1070

By adjusting the translate and scale as in listing 7.5, we can adjust the projection to show different parts of the geodata we’re working with (in our case, the world). The result in figure 7.4 shows that we now see the entire world rendered.

SCALE

You have to do some tricks to set the right scale for certain projections. For instance, with our Mercator projection, if we divide the width of the available space by 2 and divide the quotient by Math.PI, then the result will be the proper scale to display the entire world in the available space. Figuring out the right scale for your map and your projection is typically done through experimenting with different values, but it’s easier when you include zooming, as you’ll see in section 7.2.2.
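To see why the width / 2 / Math.PI rule works, here is a minimal plain-JavaScript sketch of the spherical Mercator math, assuming the standard formula x = tx + k * λ and y = ty - k * ln(tan(π/4 + φ/2)), with longitude λ and latitude φ in radians, scale k, and translate [tx, ty]. The makeMercator function is hypothetical, not D3’s API, and it skips the clipping and adaptive resampling that D3 applies when drawing paths. Because longitude runs from -π to π radians, the projected world is 2πk pixels wide, so a scale of width / (2π) spans the canvas exactly. For a 500-pixel-wide SVG that’s about 79.6, close to the 80 used in listing 7.5.

```javascript
// Hypothetical stand-in for a Mercator projection (not D3's API):
// maps [longitude, latitude] in degrees to [x, y] screen coordinates.
function makeMercator(scale, translate) {
  var toRadians = Math.PI / 180;
  return function (coordinates) {
    var lambda = coordinates[0] * toRadians; // longitude in radians
    var phi = coordinates[1] * toRadians;    // latitude in radians
    var x = translate[0] + scale * lambda;
    // SVG's y-axis points down, so northern latitudes get smaller y values.
    var y = translate[1] - scale * Math.log(Math.tan(Math.PI / 4 + phi / 2));
    return [x, y];
  };
}

var width = 500, height = 500;

// A scale of width / (2 * PI) makes longitudes -180..180 span 0..width.
var fitScale = width / 2 / Math.PI; // about 79.58
var fitted = makeMercator(fitScale, [width / 2, height / 2]);
console.log(fitted([-180, 0])[0], fitted([180, 0])[0]); // about 0 and 500

// With the rounded scale of 80, San Francisco lands near [79.66, 194.32].
var rounded = makeMercator(80, [width / 2, height / 2]);
console.log(rounded([-122, 37]));
```

The y formula grows without bound toward the poles, which is why Greenland and Antarctica look so enormous in a Mercator map and why real implementations clip near ±90° latitude.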
Listing 7.5 Simple map with scale and translate settings

function createMap(countries) {
  // Defining the size of our SVG as variables lets us refer to them
  // throughout our visualization code.
  var width = 500;
  var height = 500;
  var aProjection = d3.geo.mercator()
    // Scale values differ between families of projections; 80 works well here.
    .scale(80)
    // Moves the center of the projection to the center of our canvas
    .translate([width / 2, height / 2]);
  var geoPath = d3.geo.path().projection(aProjection);
  d3.select("svg").selectAll("path").data(countries.features)
    .enter()
    .append("path")
    .attr("d", geoPath)
    .attr("class", "countries");
};

Figure 7.4 The Mercator-projected world from our data now fitting our SVG area. Notice the enormous distortion in size of regions near the poles, such as Greenland and Antarctica.

7.1.2 Drawing points on a map

Projection isn’t used only to display areas; it’s also used to place individual points. Typically, you think of cities or people as represented not by their spatial footprint (though you do this with particularly large cities) but with a single point on a map, which is sized based on some variable such as population. A D3 projection can be used not only in a geo.path() but also as a function on its own. When you pass it an array with a pair of longitude and latitude coordinates, it returns the screen coordinates necessary to place that point. For instance, if we want to know where to place a point representing San Francisco (roughly speaking, -122 longitude, 37 latitude), then we could simply pass those values to our projection:

aProjection([-122,37])
// returns [79.65586500535346, 194.32096033997914]

We can use this to add cities to our map along with loading the data from cities.csv, as in the following listing and as you see in figure 7.5.

Listing 7.6 Loading point and polygon geodata

queue()
  .defer(d3.json, "world.geojson")
  .defer(d3.csv, "cities.csv")
  .await(function(error, file1, file2) { createMap(file1, file2); });

function createMap(countries, cities) {
  var width = 500;
  var height = 500;
  var projection = d3.geo.mercator()
    .scale(80)
    .translate([width / 2, height / 2]);
  var geoPath = d3.geo.path().projection(projection);
  d3.select("svg").selectAll("path").data(countries.features)
    .enter()
    .append("path")
    .attr("d", geoPath)
    // Overrides the fill style so it'll be easier to see your cities
    .style("fill", "gray");
  // You want to draw the cities over the countries, so you append them second.
  d3.select("svg").selectAll("circle").data(cities)
    .enter()
    .append("circle")
    .style("fill", "red")
    .attr("class", "cities")
    .attr("r", 3)
    // Projection returns an array, which means you need to take
    // the [0] value for cx and the [1] value for cy.
    .attr("cx", function(d) {return projection([d.x,d.y])[0]})
    .attr("cy", function(d) {return projection([d.x,d.y])[1]});
};

Figure 7.5 Our map with our eight world cities added to it. At this distance, you can’t tell how inaccurate these points are, but if you zoom in, you see that both of our Italian cities are actually in the Mediterranean.

One thing to note from listing 7.6 is that coordinates are often given in the real world in the order “latitude, longitude.” Because latitude corresponds to the y-axis and longitude corresponds to the x-axis, you have to flip them to provide the x, y coordinates necessary for GeoJSON and D3.

7.1.3 Projections and areas

Depending on what projection you use, the graphical size of your geographic objects will appear different. This is because it’s impossible to perfectly display spherical coordinates on a flat surface. Different projections are designed to preserve different properties: the area of land or ocean regions, measurable distances, or particular shapes. Because we included d3.geo.projection.js, we have access to quite a few more projections to play with, one of which is the Mollweide projection.

In the code in listing 7.7, you can see the settings necessary to properly display a Mollweide projection of our geodata. We’ll use the calculated area of the countries (the graphical area, not their actual physical area) to color each country. The results are quite distinct from the same code running on our Mercator projection, as shown in figure 7.6. The world as displayed with Mollweide curves the edges, rather than stretching them into a rectangle like Mercator does.

Listing 7.7 Mollweide projected world

queue()
  .defer(d3.json, "world.geojson")
  .defer(d3.csv, "cities.csv")
  .await(function(error, file1, file2) { createMap(file1, file2); });

function createMap(countries, cities) {
  var width = 500;
  var height = 500;
  // For a Mollweide projection; shows the entire world
  var projection = d3.geo.mollweide()
    .scale(120)
    .translate([width / 2, height / 2]);
  var geoPath = d3.geo.path().projection(projection);
  // Measures the features and assigns the size classes to a color ramp
  // (colorbrewer.Reds comes from colorbrewer.js)
  var featureSize = d3.extent(countries.features,
    function(d) {return geoPath.area(d);});
  var countryColor = d3.scale.quantize()
    .domain(featureSize).range(colorbrewer.Reds[7]);
  d3.select("svg").selectAll("path").data(countries.features)
    .enter()
    .append("path")
    .attr("d", geoPath)
    .attr("class", "countries")
    // Colors each country based on its size. Notice that geoPath.area
    // measures the graphical area, not the actual physical area.
    .style("fill", function(d) {
      return countryColor(geoPath.area(d))
    });
  d3.select("svg").selectAll("circle").data(cities)
    .enter()
    .append("circle")
    .attr("class", "cities")
    .attr("r", 3)
    .attr("cx", function(d) {return projection([d.x,d.y])[0]})
    .attr("cy", function(d) {return projection([d.x,d.y])[1]});
};

Figure 7.6 Mercator (left) distorts the size of Antarctica so dramatically that no other shape looks as large. In comparison, the Mollweide projection maintains the actual physical area of the countries and continents in your geodata, at the cost of distorting their shape and angle.

Picking the right projection is never easy, and depends on the goals of the map you’re making. If you’re working with traditional tile mapping, then you’ll probably stick with Mercator. If you’re working on the world scale, it’s usually best to use an equal-area projection like Mollweide that doesn’t distort the visual area of geographic features. But because D3 has so many different projections available, you should experiment to see which best suits the particular map you’re creating.

7.1.4 Interactivity

Much of the geospatial code in D3 comes with built-in functionality that you’ll typically need when working with geodata. In addition to determining the area like we did to color our features, D3 has other useful functions. Two that are commonly used in mapping are the ability to quickly calculate the center of a geographic area (known as a centroid) and its bounding box, like you see in figure 7.7. Listing 7.8 shows how to add mouseover events to the paths we created and draw a circle at the center of each geographic area, as well as a bounding box around it.
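Before looking at the D3 code, it may help to see what an area-weighted centroid means, as opposed to the center of a bounding box. Here is a minimal plain-JavaScript sketch using the planar shoelace formula on a single closed ring of projected [x, y] points. ringArea and ringCentroid are hypothetical helpers; D3’s own geoPath.area and geoPath.centroid handle more cases (such as holes and multipolygons) that this sketch ignores.

```javascript
// Hypothetical planar helpers for one closed ring of [x, y] points
// (first point repeated at the end, as in GeoJSON).
function ringArea(ring) {
  var sum = 0;
  for (var i = 0; i < ring.length - 1; i++) {
    sum += ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
  }
  return Math.abs(sum) / 2; // shoelace formula; ring orientation ignored
}

function ringCentroid(ring) {
  var signedArea = 0, cx = 0, cy = 0;
  for (var i = 0; i < ring.length - 1; i++) {
    var cross = ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
    signedArea += cross;
    cx += (ring[i][0] + ring[i + 1][0]) * cross;
    cy += (ring[i][1] + ring[i + 1][1]) * cross;
  }
  signedArea /= 2;
  // Sum of triangle centroids, each weighted by its signed area.
  return [cx / (6 * signedArea), cy / (6 * signedArea)];
}

// An L-shaped polygon whose bounding box runs from [0, 0] to [4, 4].
var lShape = [[0, 0], [4, 0], [4, 1], [1, 1], [1, 4], [0, 4], [0, 0]];
console.log(ringArea(lShape));     // 7
console.log(ringCentroid(lShape)); // about [1.357, 1.357]
```

For the L-shaped ring, the bounding-box center is [2, 2], but the area-weighted centroid is pulled toward the lower-left corner, where most of the area is. In the D3 code that follows, geoPath.bounds() and geoPath.centroid() do this work for you on the projected shapes.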
Listing 7.8 Rendering bounding boxes with geodata

d3.selectAll("path.countries")
  .on("mouseover", centerBounds)
  .on("mouseout", clearCenterBounds);

function centerBounds(d,i) {
  // Functions of geo.path that give results based on the associated projection
  var thisBounds = geoPath.bounds(d);
  var thisCenter = geoPath.centroid(d);
  // The bounding box is the top-left and bottom-right coordinates as an array
  d3.select("svg")
    .append("rect")
    .attr("class", "bbox")
    .attr("x", thisBounds[0][0])
    .attr("y", thisBounds[0][1])
    .attr("width", thisBounds[1][0] - thisBounds[0][0])
    .attr("height", thisBounds[1][1] - thisBounds[0][1])
    .style("fill", "none")
    .style("stroke-dasharray", "5 5")
    .style("stroke", "black")
    .style("stroke-width", 2)
    .style("pointer-events", "none");
  // The centroid is an array with the x and y coordinates of the center of a feature
  d3.select("svg")
    .append("circle")
    .attr("class", "centroid")
    .style("fill", "red")
    .attr("r", 5)
    .attr("cx", thisCenter[0]).attr("cy", thisCenter[1])
    .style("pointer-events", "none");
};

// Removes the shapes when you mouse off a feature
function clearCenterBounds() {
  d3.selectAll("circle.centroid").remove();
  d3.selectAll("rect.bbox").remove();
};

Figure 7.7 Your interactivity provides a bounding box around each country and a red circle representing its graphical center. Here you see the bounding box and centroid of China. The D3 implementation of a centroid is area-weighted, so it’s the center of most of the area, and not just the center of the bounding box.

Infoviz term: choropleth map
As you encounter more mapmaking, you’ll hear the term choropleth map used to refer to a map that encodes data using the color of a region. You can use the existing geographic features, in this case countries, to display statistical data, such as the GDP of a country, its population, or its most widely used language. You can do this in D3 either by getting geodata where the properties field has that information or by linking a table of data to your geodata where they both have the same unique ID values in common. Keep in mind that choropleth maps, although useful, are subject to what’s known as the modifiable areal unit problem, which is what happens when you draw boundaries or select existing features in such a way that they disproportionately represent your statistics. This is the case with gerrymandering, when political districts are drawn in such a way as to create majorities for one political party or another.

You’ve learned the core geo functions that allow you to make maps with D3: geo.projection and geo.path. By using these functions, you can create maps with a distinct look and feel, and provide your users with the ability to interact with them as shapes and as geographic features. D3 provides more functionality, and we’ll dive into it now.

7.2 Better mapping

To make your maps more readable, you can use two built-in features of D3: the d3.geo graticule generator and the zoom behavior. One provides grid lines that make it easier to read a map, and the other allows you to pan and zoom around your map. Both of these follow the same format and functionality as other behaviors and generators in D3, but are particularly useful for maps.

7.2.1 Graticule

A graticule is a grid line on a map.
Just as D3 has generators for lines, areas, and arcs, it has a generator for graticules to make your maps more beautiful. The graticule generator creates gridlines (you can specify where and how many, or use the default) and also creates an outline that can provide a useful border. Listing 7.9 shows how to draw a graticule beneath the countries we’ve already drawn. Instead of .data we use .datum, which is a convenience function that allows us to bind a single datapoint to a selection so it doesn’t need to be in an array. In other words, .datum(yourDatapoint) is effectively the same as .data([yourDatapoint]).

Listing 7.9 Adding a graticule

var graticule = d3.geo.graticule();

d3.select("svg").append("path")
  .datum(graticule)
  .attr("class", "graticule line")
  .attr("d", geoPath)
  .style("fill", "none")
  .style("stroke", "lightgray")
  .style("stroke-width", "1px");

d3.select("svg").append("path")
  .datum(graticule.outline)
  .attr("class", "graticule outline")
  .attr("d", geoPath)
  .style("fill", "none")
  .style("stroke", "black")
  .style("stroke-width", "1px");

Figure 7.8 Our map with a graticule (in light gray) and a graticule outline (the black border around the edge of the map)

But how are we drawing so many graticule lines in figure 7.8 from a single datapoint? The geo.graticule function creates a feature known as a multilinestring. A multilinestring, as you may have figured out, is an array of arrays of coordinates, each corresponding to a separate individual component of a feature. Multilinestrings and their counterparts, multipolygons, have always been a part of GIS because countries like the United States or Indonesia are made up of disconnected features such as states and regions, and that information needed to be stored in the data. As a result, when d3.geo.path gets a multipolygon or multilinestring, it draws a <path> element made up of multiple, disconnected pieces.

7.2.2 Zoom

You dealt with zoom a little bit in chapter 5, when you saw how the zoom behavior can easily allow you to pan a chart around the screen. Now it’s time to start zooming with zoom. When we first looked at the zoom behavior, we used it to adjust the transform attribute of a <g> element that held our chart. This time, we’ll use the scale and translate values of the zoom behavior to update the settings of our projection, which will give us the ability to zoom and pan our map.

Create a zoom behavior and call it from the <svg> element. Whenever you have a drag event on anything in the <svg>, a mousewheel event, or a double-click, it triggers zoom. When we worked with zoom before, we only dealt with dragging, which updates the zoom.translate() value and which you can use to update the translate value of whatever element you want to update. This time, we’ll also use the zoom.scale() value, which gives us an increasing (when you double-click or roll your mousewheel forward) or decreasing (when you roll your mousewheel backward) value.

To use zoom with a projection, we’ll want to overwrite the initial zoom.scale() value with the scale value of the projection, and do the same with the zoom translate value. After that, any time we have an event that triggers zoom, we’ll use the new values to update our projection, as shown in the following listing and in figure 7.9.
Listing 7.10 Zoom and pan with maps

// Overwrites the translate and scale of the zoom to match the projection
var mapZoom = d3.behavior.zoom()
  .translate(projection.translate())
  .scale(projection.scale())
  .on("zoom", zoomed);
d3.select("svg").call(mapZoom);

function zoomed() {
  // Whenever the zoom behavior is called, overwrites the projection
  // to match the updated zoom values
  projection.translate(mapZoom.translate()).scale(mapZoom.scale());
  // Any path will be properly redrawn by calling the d3.geo.path
  // associated with the updated projection.
  d3.selectAll("path.graticule").attr("d", geoPath);
  d3.selectAll("path.countries").attr("d", geoPath);
  // Also calls the now-updated projection
  d3.selectAll("circle.cities")
    .attr("cx", function(d) {return projection([d.x,d.y])[0]})
    .attr("cy", function(d) {return projection([d.x,d.y])[1]});
};

Figure 7.9 Our map with zooming enabled. Panning occurs with the drag behavior and zooming with mousewheel and/or double-clicking. Notice that the bounding box and centroid functions still work, because they’re based on our constantly updating projection.

The zoom behavior updates its .translate() array in response to your dragging, and increases or decreases the .scale() value in response to your mousewheel and double-click behavior. Because it’s designed to work with SVG transform and D3 geographic projections, d3.behavior.zoom is all you need for pan-and-zoom functionality.

Infoviz term: semantic zoom
When you think about zooming in on things, you naturally think about increasing their size. But from working with mapping, you know that you don’t just increase the size or resolution as you zoom in; you also change the kind of data that you present to the reader. This is known as semantic zoom, in contrast to graphical zoom. It’s most clear when you look at a zoomed-out map and see only country boundaries and a few major cities, but as you zoom in you see roads, smaller cities, parks, and so on. You should try to use semantic zoom whenever you’re letting your user zoom in and out of any data visualization, not just a map. It allows you to present strategic or global information when zoomed out, and high-resolution data when zoomed in.

The default zoom behavior assumes a user knows that the mousewheel and double-clicking are associated with zooming. But sometimes you want zoom buttons, because you can’t assume the user knows that interaction or because you want to constrain or control the zooming process in a more complicated manner. The code in the following listing creates a zoom function and adds the necessary buttons, as seen in figure 7.10.

Listing 7.11 Manual zoom controls for maps

function zoomButton(zoomDirection) {
  // Calculating the new scale is easy: multiply it by a zoom factor.
  var factor = zoomDirection == "in" ? 1.5 : .75;
  var newZoom = mapZoom.scale() * factor;
  // Calculating the new translate settings isn't so easy and requires
  // that you recalculate the center.
  var newX = ((mapZoom.translate()[0] - (width / 2)) * factor) + width / 2;
  var newY = ((mapZoom.translate()[1] - (height / 2)) * factor) + height / 2;
  // Sets the zoom behavior's scale and translate settings to your new settings
  mapZoom.scale(newZoom).translate([newX,newY]);
  // Redraws the map based on the updated settings
  zoomed();
}

d3.select("#controls").append("button").on("click", function () {
  zoomButton("in")}).html("Zoom In");
d3.select("#controls").append("button").on("click", function () {
  zoomButton("out")}).html("Zoom Out");

Figure 7.10 Zoom buttons and the effect of pressing Zoom Out five times. Because the zoom buttons modify the zoom behavior’s translate and scale, any mouse interaction afterward reflects the updated settings.

With this kind of styling and interactivity in place, you can make a map for almost any application. Zooming and panning is important for maps because users expect to be able to zoom in and out, and they also expect the details of the map to change when they do so. In that way, geospatial visualization is one of the most powerful forms of information visualization, because users have a high level of literacy when it comes to reading and interacting with maps. But users also expect a map to have certain features and functionality, and when those are missing they think it’s broken. Make sure that when you create your map, it either includes this functionality or you have a good reason to leave it out.

7.3 Advanced mapping

We’ve covered the aspects of creating maps that you’ll likely end up using with all your maps. You could explore many variations. You may want to scale your <circle> elements based on population, or use <g> elements so that you can also provide labels like we did earlier. But if you’re making a map, it will probably have polygons and points and take advantage of bounding boxes or centroids, and will likely be tied to a zoom behavior. The exciting thing about D3 is that it lets you explore more complex ways of representing geography, with a little more effort.

7.3.1 Creating and rotating globes

We’ll do only one thing in 3D in this entire book, and that’s create a globe. We don’t need to load three.js or learn WebGL. Instead, we’ll take advantage of a trick of one of the geographic projections available in D3: the orthographic projection, which renders geographic data as it would appear from a distant point viewing the entire globe.
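The orthographic math is compact enough to sketch by hand. Assuming the globe is only spun around its vertical axis by λ0 degrees (as the drag interaction in this section does), a point at longitude λ and latitude φ lands at x = tx + k * cos φ * sin(λ + λ0) and y = ty - k * sin φ, and it faces the viewer only when cos φ * cos(λ + λ0) > 0. The makeOrthographic function below is a hypothetical illustration, not D3’s API, and it ignores tilt and clipping.

```javascript
// Hypothetical orthographic projection (not D3's API), spun only
// around the vertical axis by `rotation` degrees of longitude.
function makeOrthographic(scale, translate, rotation) {
  var toRadians = Math.PI / 180;
  return function (coordinates) {
    var lambda = (coordinates[0] + rotation) * toRadians;
    var phi = coordinates[1] * toRadians;
    return {
      x: translate[0] + scale * Math.cos(phi) * Math.sin(lambda),
      // SVG's y-axis points down, so the north pole ends up at the top.
      y: translate[1] - scale * Math.sin(phi),
      // The point faces the viewer when it's on the near hemisphere.
      visible: Math.cos(phi) * Math.cos(lambda) > 0
    };
  };
}

var globe = makeOrthographic(200, [250, 250], 0);
console.log(globe([0, 0]));   // center of the disk: x 250, y 250, visible
console.log(globe([90, 0]));  // on the right-hand edge: x 450
console.log(globe([180, 0])); // on the far side: visible is false
```

Unlike Mercator, every projected point stays within a disk of radius k around the translate point, which is what makes the result read as a sphere.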
We need to update our projection to refer to the orthographic projection and have a slightly different scale.

Listing 7.12 Creating a simple globe

projection = d3.geo.orthographic()
  .scale(200)
  .translate([width / 2, height / 2])
  .center([0,0]);

With this new projection, you can see what looks like a globe in figure 7.11. To make it rotate, we need to use d3.mouse, which returns the current position of the mouse on the SVG canvas. Pair this with event listeners that turn a mousemove listener on the canvas on and off. This simulates dragging the globe, which we’ll use only to rotate it along the x-axis. Because we’re introducing new behavior and it’s been a while since we looked at the full code, the following listing has the entire code for creating the globe.

Listing 7.13 A draggable globe in D3

queue()
  .defer(d3.json, "world.geojson")
  .defer(d3.csv, "cities.csv")
  .await(function(error, file1, file2) { createMap(file1, file2); });

function createMap(countries, cities) {
  …code to set up orthographic projection…
  var mapZoom = d3.behavior.zoom()
    .translate(projection.translate())
    .scale(projection.scale())
    .on("zoom", zoomed);
  d3.select("svg").call(mapZoom);

  var rotateScale = d3.scale.linear()
    .domain([0, width])
    .range([-180, 180]);

  d3.select("svg").on("mousedown", startRotating).on("mouseup", stopRotating);

  // Dragging the globe requires an explicit mousemove event listener triggered by mousedown
  function startRotating() {
    d3.select("svg").on("mousemove", function() {
      var p = d3.mouse(this);
      projection.rotate([rotateScale(p[0]), 0]);
      zoomed();
    });
  }

  // Ending the drag requires clearing the mousemove listener
  function stopRotating() {
    d3.select("svg").on("mousemove", null);
  }

  function zoomed() {
    var currentRotate = projection.rotate()[0];
    projection.scale(mapZoom.scale());
    d3.selectAll("path.graticule").attr("d", geoPath);
    d3.selectAll("path.countries").attr("d", geoPath);
    d3.selectAll("circle.cities")
      .attr("cx", function(d) {return projection([d.x,d.y])[0]})
      .attr("cy", function(d) {return projection([d.x,d.y])[1]})
      .style("display", function(d) {return parseFloat(d.x) + currentRotate < 90 && parseFloat(d.x) + currentRotate > -90 ? "block" : "none"});
  }
  …code to add manual zoom and zoom buttons…
  …code to draw graticule, countries and cities…
  …code to create and clear center and bounding box…
}

Figure 7.11 An orthographic projection makes our map look like a globe. Notice that even though the paths for countries are drawn over each other, they’re still drawn above the graticules. Also notice that although zooming in and out works, panning doesn’t spin the globe but simply moves it around the canvas. The coloration of our countries is once again based on the graphical size of the country.

A plugin by Jason Davies known as d3.geo.zoom (https://www.jasondavies.com/maps/rotate/) abstracts this functionality.

But this map still has the problem of a graphical artifact from the graticule outline, which must be removed when drawing globes. Another problem is seeing through the globe to the other side. This might be a fine idea if it didn’t also muddle the SVG drawing code so that the shapes are drawn poorly when they get near the border (notice how poorly Antarctica looks in figure 7.12). Also, our cities are drawn above the paths, even when they’re ostensibly on the other side of the world (for example, Karachi).

Figure 7.12 A globe with a transparent surface. You can see Australia through the globe because the projection doesn’t clip it by default. Cities are drawn at the correct coordinates but are uniformly drawn above the features because the <circle> elements are drawn on top of the <path> elements in the DOM.
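The rotateScale in listing 7.13 is a plain linear mapping from the mouse’s x position to a rotation in degrees. The sketch below reproduces that mapping without D3. linearScale and dragRotation are hypothetical helpers, and the 500-pixel width is an assumption. Because the mapping uses the absolute mouse position rather than the distance dragged, the globe snaps to a new angle whenever a drag begins. The second helper shows one relative alternative.

```javascript
// Hypothetical helper reproducing what d3.scale.linear().domain(d).range(r)
// computes for a two-value domain and range: straight-line interpolation.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return function (x) {
    const t = (x - d0) / (d1 - d0); // fraction of the way across the domain
    return r0 + t * (r1 - r0);
  };
}

const width = 500; // assumed canvas width
const rotateScale = linearScale([0, width], [-180, 180]);
console.log(rotateScale(0), rotateScale(250), rotateScale(500)); // -180 0 180

// Hypothetical alternative: rotate by how far the mouse has moved since mousedown,
// starting from the rotation in effect at mousedown, so a new drag doesn't jump.
function dragRotation(startRotation, startX, currentX, canvasWidth) {
  return startRotation + (currentX - startX) * 360 / canvasWidth;
}
console.log(dragRotation(30, 100, 225, 500)); // 120
```

To use the relative version, you would record the mouse position and projection.rotate()[0] in the mousedown handler and pass them in on each mousemove.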
The path drawing can be handled with the clipAngle property of the projection, which clips any paths drawn with that projection if they fall outside of a particular angle from its center. This can be useful to show only small parts of your dataset for performance or display purposes. Here’s how it looks in our new projection code:

projection = d3.geo.orthographic()
  .scale(200)
  .translate([width / 2, height / 2])
  .clipAngle(90);

This won’t work for the circles we’re using for our cities, because clipAngle only applies to data that’s drawn with d3.geo.path(). For the circles, we have to ensure that they’re only displayed if they fall within that clip angle. Taking this into account, we can add a test to the zoomed function to determine whether a city should be displayed based on its coordinates.

Listing 7.14 Hiding cities on the other side of a rotated globe

function zoomed() {
  var currentRotate = projection.rotate()[0];
  projection.scale(mapZoom.scale());
  d3.selectAll("path.graticule").attr("d", geoPath);
  d3.selectAll("path.countries").attr("d", geoPath);
  d3.selectAll("circle.cities")
    .attr("cx", function(d) {return projection([d.x,d.y])[0]})
    .attr("cy", function(d) {return projection([d.x,d.y])[1]})
    // If this city's x position (its longitude) is within 90 degrees of the current
    // rotation of the globe, display it; otherwise, hide it.
    .style("display", function(d) {
      return parseFloat(d.x) + currentRotate < 90 && parseFloat(d.x) + currentRotate > -90 ? "block" : "none";
    });
};

Figure 7.13 Our rotating and properly clipped globe

You may think you’re done, but there’s one related issue to address now. You draw all the countries when the globe is first initialized, but many of them are clipped, and so your geoPath.area() function, which determines the area as the shape is drawn, has even worse issues than it had with the Mercator projection.
For instance, in figure 7.13, Australia is colored as if it had an area similar to Madagascar’s. Fortunately, D3 also includes d3.geo.area(), which determines the spherical area of a shape, corresponding to its geographic area, as in figure 7.14. We could rewrite the draw code to use d3.geo.area, but instead let’s recolor our existing globe. But how do we get the data? Until now, we’ve assumed that the data array was exposed somewhere our functions could get to, but what if it’s outside our current scope? In this case, we can use selectAll().data() to get an array of the data associated with whatever we select (which includes undefined entries if we select HTML elements that aren’t bound to data). You’ll see this in action more in the next chapter.

var featureData = d3.selectAll("path.countries").data();
var realFeatureSize = d3.extent(featureData, function(d) {return d3.geo.area(d)});
var newFeatureColor = d3.scale.quantize().domain(realFeatureSize).range(colorbrewer.Reds[7]);
d3.selectAll("path.countries")
  .style("fill", function(d) {return newFeatureColor(d3.geo.area(d))});

The spherical area of a shape as measured by d3.geo.area() is given in steradians, so it’s proportional to the area on a perfect sphere and only an approximation of the area on the real earth. If you want the actual square kilometers of a country or other shape, you’ll still need to calculate that in a GIS package like QGIS, or get that information from another source.

Figure 7.14 Our globe with countries colored by their geographic area, rather than their graphical area

This globe still has some issues. Because you don’t update projection.center(), and you base the rotation on the current position of the mouse, the rotation resets any time you start dragging the globe. You also don’t clip the cities when you first draw them. Further, a D3 globe can be made to drag in any of the three directions you can rotate a physical globe, while ours spins in only one.
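One note on the values d3.geo.area() returns. A steradian measures solid angle, so on a sphere of radius R, a region of A steradians covers A × R² of surface. The rough conversion below multiplies by the square of an assumed mean earth radius of 6,371 km. The helper name is hypothetical. Because the earth is a slightly flattened ellipsoid rather than a perfect sphere, the result is an estimate, not a substitute for a GIS calculation.

```javascript
// Assumed mean earth radius in kilometers.
const EARTH_RADIUS_KM = 6371;

// Hypothetical helper: converts a spherical area in steradians (as returned by
// d3.geo.area) into an approximate area in square kilometers.
function steradiansToKm2(steradians) {
  return steradians * EARTH_RADIUS_KM * EARTH_RADIUS_KM;
}

// The whole sphere covers 4π steradians, so this is the approximate surface area of the earth.
console.log(Math.round(steradiansToKm2(4 * Math.PI))); // about 510 million km²
```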
If you need a globe that rotates freely in all three directions, you’re better off exploring the many robust examples available online (such as those of Jason Davies at http://jasondavies.com/maps/voronoi/capitals/). Instead, we’ll look at another exotic way of representing geodata, the satellite projection.

7.3.2 Satellite projection

Perspective views of the world, like the view from a satellite, are powerful tools for storytelling. Imagine you had to create a map related to how the Middle East has a changing view of Europe. By crafting a satellite view looking out over the Mediterranean from the Middle East, as shown in figure 7.15, you invite your map reader to see a distant Europe from a geographical perspective in the Middle East.

This is a projection just like the orthographic, Mercator, and Mollweide projections we previously used, but, as you see in the following listing, it has specific settings for scale and rotate. It also uses new settings, tilt and distance, to determine the angle of the satellite projection.

Listing 7.15 Satellite projection settings

projection = d3.geo.satellite()
  .scale(1330)
  .translate([250,250])
  .rotate([-30.24, -31, -56])
  .tilt(30)          // The angle of the perspective on the geographic features
  .distance(1.199)   // The distance of the surface from your perspective
  .clipAngle(45);

Figure 7.15 A satellite projection of data from the Middle East facing Europe

Tilt is the angle of the perspective on the data, while distance is the distance of the viewpoint from the center of the earth, measured in earth radii (so 1.199 places the viewpoint 19.9% of the earth’s radius above the surface). How do you come up with such exact settings? You have two options. The first is to understand how to describe a tilted projection like this mathematically. If you have a degree in math or geography, you can look into the literature for calculating this.
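For the mathematical route, the distance setting at least has simple geometry. Assuming the plugin’s convention that distance is measured from the center of the sphere in radii, the sketch below converts it to an altitude. It also computes the angle between the point directly beneath the viewpoint and the visible horizon, acos(1 / distance). describeViewpoint is a hypothetical helper. Points farther than that angle from the sub-satellite point can’t be seen. As we understand the plugin’s documentation, that angle is the recommended maximum clipAngle for a given distance on an untilted view.

```javascript
// Hypothetical helper describing a satellite viewpoint whose distance is given,
// as in projection.distance(), in sphere radii measured from the sphere's center.
function describeViewpoint(distance, earthRadiusKm = 6371) {
  const altitudeFraction = distance - 1; // height above the surface, in radii
  // Lines of sight graze the sphere where cos(angle) = radius / distance
  // (a right triangle formed by the center, the viewpoint, and the tangent point).
  const horizonDegrees = Math.acos(1 / distance) * 180 / Math.PI;
  return {
    altitudeFraction: altitudeFraction,
    altitudeKm: altitudeFraction * earthRadiusKm,
    horizonDegrees: horizonDegrees,
  };
}

const view = describeViewpoint(1.199);
console.log(view.altitudeFraction.toFixed(3)); // "0.199"
console.log(view.horizonDegrees.toFixed(1));   // "33.5"
```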
If, like me, you don’t have a background in math or geography, then I would suggest building a tool, using the code we explored in this chapter, to adjust the rotation, tilt, distance, and scale settings interactively. That’s how I did it, and you can play with my satellite projection tool here: http://bl.ocks.org/emeeks/10173187. Recall my advice for understanding how the Sankey layout works: use information visualization to visualize how the functions work so that you can better understand them and find the right settings. Otherwise, you’re going to need to take a course in GIS or wait for someone to write D3.js Mapping in Action.

Now we’ll shift gears away from visualization and back to geodata structure to explore a library that was developed by Mike Bostock and is intimately tied to D3 mapping: TopoJSON.

7.4 TopoJSON data and functionality

TopoJSON (https://github.com/mbostock/topojson) is, fundamentally, three different things. First, it’s a data standard for geographic data, and an extension of GeoJSON. Second, it’s a library that runs in Node.js to create TopoJSON-formatted files from GeoJSON files. Third, it’s a JavaScript library that processes TopoJSON-formatted files to create the data objects necessary to render them with libraries like D3. You won’t deal with the second form at all, and you’ll only examine the first in a cursory manner as you learn about rendering TopoJSON data, merging it, and using it to find a feature’s neighbors.

7.4.1 TopoJSON the file format

The difference between GeoJSON files and TopoJSON files is that while GeoJSON records for each feature an array of longitude and latitude points that describe a point, line, or polygon, TopoJSON stores for each feature an array of arcs. An arc is any distinct segment of a line shared by one or more features in your dataset.
The shared border between the United States and Mexico is a single arc that’s referred to in the arcs array of the feature for the United States and the arcs array of the feature for Mexico. Because most datasets have shared segments, TopoJSON often produces significantly smaller datasets. This is part of its appeal. Another part is that if you know what segments are shared, then you can do interesting things with the data, like easily calculating the neighboring features or the shared border, or merging features. TopoJSON stores each feature’s arcs as references to particular arcs in a master list of arcs that defines the coordinates of each arc. You need the topojson.js library included in any website you’re using to create maps with TopoJSON, because it converts TopoJSON into a format that D3 can read and create graphics from.

7.4.2 Rendering TopoJSON

Because TopoJSON stores its data in a format different from the GeoJSON structure that’s expected by d3.geo.path(), we need to include topojson.js and use it to process TopoJSON data to produce GeoJSON features. This is rather straightforward and can be done in the callback that loads our new data file, as shown in the following listing. Figure 7.16 shows the properly formatted features in your console.

queue()
  .defer(d3.json, "world.topojson")
  .defer(d3.csv, "cities.csv")
  .await(function(error, file1, file2) { createMap(file1, file2); });

function createMap(file1, file2) {
  var worldFeatures = topojson.feature(file1, file1.objects.countries);
  console.log(worldFeatures);
};

Now that it’s in the format we want, we can send it to our existing code and draw this array of features like we did with the features we loaded from world.geojson. We replace our earlier countries with the worldFeatures variable declared in listing 7.16. That’s all that most people do with TopoJSON, and they’re happy for it because TopoJSON data is significantly smaller than GeoJSON data.
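To make the idea of shared arcs from section 7.4.1 concrete, here is a hand-built toy topology with two squares that share one edge. It is hypothetical data, not the book’s world.topojson. The shared edge is stored once, in arc 0. Following the TopoJSON convention, a negative index such as ~0 (which equals -1) means "use arc 0 reversed." The sketch ignores the quantized, delta-encoded arcs that real files use when they carry a transform. It simplifies what topojson.feature() does and doesn’t replace it.

```javascript
// Toy, unquantized arcs: each arc is a list of [x, y] positions.
const arcs = [
  [[1, 0], [1, 1]],                 // arc 0: the edge shared by both squares
  [[1, 1], [0, 1], [0, 0], [1, 0]], // arc 1: the rest of the left square
  [[1, 0], [2, 0], [2, 1], [1, 1]], // arc 2: the rest of the right square
];

// Each polygon ring is just a list of arc indexes; ~i (that is, -i - 1) means arc i reversed.
const leftRing = [0, 1];
const rightRing = [2, ~0];

// Hypothetical helper: stitch arcs into one closed ring of coordinates.
function stitchRing(arcIndexes, arcs) {
  const ring = [];
  arcIndexes.forEach((index, k) => {
    const arc = index < 0 ? arcs[~index].slice().reverse() : arcs[index];
    // Consecutive arcs share an endpoint, so skip the duplicate first point after the first arc.
    arc.forEach((point, j) => {
      if (k === 0 || j > 0) ring.push(point);
    });
  });
  return ring;
}

console.log(JSON.stringify(stitchRing(leftRing, arcs)));  // [[1,0],[1,1],[0,1],[0,0],[1,0]]
console.log(JSON.stringify(stitchRing(rightRing, arcs))); // [[1,0],[2,0],[2,1],[1,1],[1,0]]
```

The shared edge is stored once but appears in both rings, which is where TopoJSON’s size savings come from.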
Because we know the topology of the features in a TopoJSON data file, we can also do interesting geographic tricks with it.

Listing 7.16 Loading TopoJSON
Figure 7.16 TopoJSON data formatted using topojson.feature(). The data is an array of objects, and it represents geometry as an array of coordinates like the features that come out of a GeoJSON file. Notice that our TopoJSON file has a property "objects", which all TopoJSON files have, but "countries" is specific to this file and might be "rivers" or "land" or other property names in other files.

7.4.3 Merging

The TopoJSON library provides you with the capacity to create new features by merging existing features. You can create a new feature for "North America" by merging the countries in North America, or create "The United States in 1912" by merging the states that were part of the United States in 1912. Listing 7.17 shows the code to draw a map using our new TopoJSON data file and merge all the countries that have a center north of 0° latitude. The results, in figure 7.17, show that merging combines not only contiguous features but also separate features into a multipolygon.

Listing 7.17 Rendering and merging TopoJSON

queue()
  .defer(d3.json, "world.topojson")
  .defer(d3.csv, "cities.csv")
  .await(function(error, file1, file2) { createMap(file1, file2); });

function createMap(topoCountries, cities) {
  // After processing with topojson.feature, we use exactly the same methods to render the features
  var countries = topojson.feature(topoCountries, topoCountries.objects.countries);
  var width = 500;
  var height = 500;
  var projection = d3.geo.mollweide()
    .scale(120)
    .translate([width / 2, height / 2])
    .center([20,0]);
  var geoPath = d3.geo.path().projection(projection);
  var featureSize = d3.extent(countries.features, function(d) {return geoPath.area(d)});
  var countryColor = d3.scale.quantize()
    .domain(featureSize).range(colorbrewer.Reds[7]);
  var graticule = d3.geo.graticule();

  d3.select("svg").append("path")
    .datum(graticule)
    .attr("class", "graticule line")
    .attr("d", geoPath)
    .style("fill", "none")
    .style("stroke", "lightgray")
    .style("stroke-width", "1px");

  d3.select("svg").append("path")
    .datum(graticule.outline)
    .attr("class", "graticule outline")
    .attr("d", geoPath)
    .style("fill", "none")
    .style("stroke", "black")
    .style("stroke-width", "1px");

  d3.select("svg").selectAll("path.countries")
    .data(countries.features)
    .enter()
    .append("path")
    .attr("d", geoPath)
    .attr("class", "countries")
    .style("fill", function(d) {return countryColor(geoPath.area(d))})
    .style("stroke-width", 1)
    .style("stroke", "black")
    .style("opacity", .5);

  d3.select("svg").selectAll("circle").data(cities)
    .enter()
    .append("circle")
    .style("fill", "black")
    .style("stroke", "white")
    .style("stroke-width", 1)
    .attr("r", 3)
    .attr("cx", function(d) {return projection([d.x,d.y])[0];})
    .attr("cy", function(d) {return projection([d.x,d.y])[1];});

  mergeAt(0);

  // Our merge function
  function mergeAt(mergePoint) {
    // We're working with the TopoJSON dataset. To use geo.centroid, we convert each feature
    // into GeoJSON. The filter results in an array of only the corresponding geometries.
    var filteredCountries = topoCountries.objects.countries.geometries
      .filter(function(d) {
        var thisCenter = d3.geo.centroid(topojson.feature(topoCountries, d));
        return thisCenter[1] > mergePoint;
      });

    // Uses datum because merge returns a single multipolygon
    d3.select("svg").insert("g", "circle")
      .datum(topojson.merge(topoCountries, filteredCountries))
      .insert("path")
      .style("fill", "gray")
      .style("stroke", "black")
      .style("stroke-width", "2px")
      .attr("d", geoPath);
  };
};

Figure 7.17 The results of merging based on the centroid of a feature. The feature in gray is a single merged feature made up of many separate polygons.

We can adjust the mergeAt test slightly to look at the x coordinate (longitude) instead, or to select features whose centroids fall below mergePoint rather than above it. As shown in figure 7.18, this creates a single feature in each of four cases: less than or greater than 0° latitude and less than or greater than 0° longitude. Notice in each case that it’s a single feature but not a single polygon.

A quick note for those who may want to continue working in topologies: topojson.merge has a sister function, topojson.mergeArcs, that allows you to merge shapes but keep them in TopoJSON format. Why would you want to maintain arcs? Because then you could continue to use TopoJSON functionality like merging, creating meshes, or finding neighbors of your newly merged features.

Figure 7.18 By adjusting the merge settings, we can create something like northern, southern, eastern, and western hemispheres as merged features. Notice that because this is based on a centroid, we can see at the bottom left a piece of Eastern Russia as part of our merged feature, along with Antarctica.

7.4.4 Neighbors

Because we know when features share arcs, we also know which features neighbor each other. The function topojson.neighbors returns, for each geometry, an array of the indexes of the geometries that share a border with it. We can use this array to easily identify neighboring countries in our dataset using the code in listing 7.18. The results of the interaction provided by this code are shown in figure 7.19.
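Both topojson.merge and topojson.neighbors come down to bookkeeping on shared arcs. The sketch below works on a toy topology; the data and helper names are hypothetical and not part of the TopoJSON API. Geometries that use the same arc are neighbors. When geometries are merged, an arc used by only one of them lies on the merged outline, while an arc used by two of them is an interior border that disappears. The real functions also handle nested MultiPolygon structures and stitch the surviving arcs back into rings, so use them, not this sketch, in practice.

```javascript
// Toy topology: each geometry is a flat list of arc indexes; ~i means arc i reversed.
const geometries = [
  [0, 1],   // 0
  [2, ~0],  // 1: shares arc 0 with geometry 0
  [3, ~1],  // 2: shares arc 1 with geometry 0
  [4],      // 3: an island that shares nothing
];

const arcId = (index) => (index < 0 ? ~index : index); // undo the reversal encoding

// Map each arc to the set of geometries (by position) that use it.
function geometriesByArc(geometryList) {
  const byArc = new Map();
  geometryList.forEach((arcList, g) => {
    arcList.forEach((index) => {
      const arc = arcId(index);
      if (!byArc.has(arc)) byArc.set(arc, new Set());
      byArc.get(arc).add(g);
    });
  });
  return byArc;
}

// Neighbors: one sorted array of neighbor indexes per geometry, like the listing's neighbors array.
function neighborIndexes(geometryList) {
  const byArc = geometriesByArc(geometryList);
  return geometryList.map((arcList, g) => {
    const found = new Set();
    arcList.forEach((index) => {
      byArc.get(arcId(index)).forEach((other) => { if (other !== g) found.add(other); });
    });
    return [...found].sort((a, b) => a - b);
  });
}

// Merge outline: among the selected geometries, arcs used only once form the outside edge.
function outlineArcs(geometryList, selected) {
  const byArc = geometriesByArc(selected.map((g) => geometryList[g]));
  return [...byArc]
    .filter(([, users]) => users.size === 1)
    .map(([arc]) => arc)
    .sort((a, b) => a - b);
}

console.log(JSON.stringify(neighborIndexes(geometries)));        // [[1,2],[0],[0],[]]
console.log(JSON.stringify(outlineArcs(geometries, [0, 1])));    // [1,2]
console.log(JSON.stringify(outlineArcs(geometries, [0, 1, 2]))); // [2,3]
```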
Listing 7.18 Calculating neighbors and interactive highlighting

// Creates an array indicating neighbors by their array position
var neighbors = topojson.neighbors(topoCountries.objects.countries.geometries);

d3.selectAll("path.countries")
  .on("mouseover", findNeighbors)
  .on("mouseout", clearNeighbors);

function findNeighbors(d, i) {
  // Colors the country you hover over red
  d3.select(this).style("fill", "red");
  // Colors all neighbors green
  d3.selectAll("path.countries")
    .filter(function (p, q) {return neighbors[i].indexOf(q) > -1})
    .style("fill", "green");
};

function clearNeighbors() {
  // Colors all countries gray to "clear" results
  d3.selectAll("path.countries").style("fill", "gray");
};

Figure 7.19 Hover behavior displaying the neighbors of France using TopoJSON’s neighbors function. Because French Guiana is an overseas department of France, France is considered a neighbor of Brazil and Suriname. This is because France is represented as a multipolygon in the data, and any feature that neighbors any of its shapes is returned as a neighbor.

TopoJSON is a powerful new technology that provides tremendous opportunity for web map development. Understanding how it models data and the functionality that it provides is key to creating maps that impress users. As you explore traditional web tile mapping, you’ll see that you can combine more traditional web mapping techniques with the advanced functionality provided by TopoJSON and D3’s geo functions to make incredibly sophisticated web maps.

7.5 Tile mapping with d3.geo.tile

So far you’ve made choropleth maps, some of which are simple and others, like the satellite projection or the globe, rather exotic. But none of your maps have terrain or satellite imagery. That kind of data (raster or image data) isn’t nearly as lightweight as vector data.
Think about the size of a picture you take with the camera on your phone, and imagine how large an image must be if you want to give your user the ability to zoom in to any street in the world.

To get around the problem of these massive images, web mapping uses tiles to display satellite and terrain data. A high-resolution satellite image of a city, for instance, would be cut into 256-by-256-pixel tiles at as many zoom levels as are appropriate and stored on a server in directories indicating the zoom and position of those tiles. It sounds like it might be a lot of work to make tiles, but fortunately, you don’t have to, because companies like Mapbox (mapbox.com) provide you with tiles and tools, like TileMill, to customize them. (Both free and commercial versions are available, depending on how many visitors your site receives.)

If you open up tile.js and take a look at it, you’ll see that it’s a small file. That’s because geotiles are simple. Each tile is a raster image (typically a PNG) that represents one square of the earth somewhere, as you see in figure 7.20. Its filename indicates the geographic location and at what zoom level the image shows. The d3.geo.tile() function (the library to access this function is available at https://github.com/d3/d3-plugins/tree/master/geo/tile) works out which tiles, identified by the same x, y, and zoom values used in those filenames and directories, are needed for the current view, so that we can use these tiles in our map. First, though, we have to calibrate the scale and translate of our projection as well as our zoom behavior.

Figure 7.20 Your first tiled map, using pregenerated tiles from Mapbox
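Calibrating the projection and zoom comes down to a few relationships, sketched below with hypothetical helper names. D3’s Mercator projection turns longitude into radians times the scale, so the whole world is 2π × scale pixels wide; that is why the zoom behavior in listing 7.19 starts at projection.scale() * 2 * Math.PI. Each tile zoom level doubles the world width, starting from a single 256-pixel tile. The x, y, and zoom values of a tile follow the standard Web Mercator ("slippy map") scheme. The zoom-level formula reflects our understanding of how tile maps choose a level, not d3.geo.tile’s source.

```javascript
const TILE_SIZE = 256; // pixels per tile side in standard web map tile sets

// Width in pixels of the whole world for a D3 Mercator projection with this scale.
function worldWidthForScale(projectionScale) {
  return projectionScale * 2 * Math.PI;
}

// The inverse, matching zoom.scale() / 2 / Math.PI in listing 7.20's redraw function.
function scaleForWorldWidth(worldWidth) {
  return worldWidth / 2 / Math.PI;
}

// Each tile zoom level doubles the world width, starting from one 256-pixel tile at level 0.
function tileZoomForWorldWidth(worldWidth) {
  return Math.max(0, Math.log2(worldWidth / TILE_SIZE));
}

// Standard Web Mercator tile indexes for [lon, lat] at integer zoom z, returned as
// [x, y, z] so that d[0], d[1], d[2] line up with the tile data used in listing 7.19.
function lonLatToTile(lon, lat, z) {
  const n = Math.pow(2, z); // tiles per row and per column at this zoom
  const latRad = lat * Math.PI / 180;
  const x = Math.floor((lon + 180) / 360 * n);
  const y = Math.floor((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n);
  return [x, y, z];
}

const worldWidth = worldWidthForScale(120); // the starting scale used in this chapter
console.log(worldWidth.toFixed(1));                        // "754.0"
console.log(tileZoomForWorldWidth(worldWidth).toFixed(2)); // "1.56", between levels 1 and 2
const rome = lonLatToTile(12.5, 41.9, 4);
console.log(rome[2] + "/" + rome[0] + "/" + rome[1]);      // "4/8/5", the z/x/y path segment
```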
Listing 7.19 A tile map

var width = 500,
    height = 500;

// A group to keep our tiles behind any other drawn features
d3.select("svg").append("g").attr("id", "tiles");

// The function we use to create our tiles
var tile = d3.geo.tile()
  .size([width, height]);

var projection = d3.geo.mercator()
  .scale(120)
  .translate([width / 2, height / 2]);

var center = projection([12, 42]);

var path = d3.geo.path()
  .projection(projection);

var zoom = d3.behavior.zoom()
  .scale(projection.scale() * 2 * Math.PI)
  .translate([width - center[0], height - center[1]])
  .on("zoom", redraw);

d3.select("svg").call(zoom);
redraw();

function redraw() {
  // The dataset we use to create the images
  var tiles = tile
    .scale(zoom.scale())
    .translate(zoom.translate())();

  var image = d3.select("#tiles")
    // Generates proper transform settings based on the current zoom
    .attr("transform", "scale(" + tiles.scale + ") translate(" + tiles.translate + ")")
    .selectAll("image")
    // Binds the tiles data to svg:image elements
    .data(tiles, function(d) { return d; });

  // Removes any that are offscreen
  image.exit()
    .remove();

  // Appends new ones
  image.enter().append("image")
    // Builds the path to each tile from the zoom level (d[2]), column (d[0]), and row (d[1])
    // computed by tile.js, in the layout used by services like Mapbox
    .attr("xlink:href", function(d) { return "http://" + ["a", "b", "c", "d"][Math.random() * 4 | 0] + ".tiles.mapbox.com/v3/examples.map-zgrqqx0w/" + d[2] + "/" + d[0] + "/" + d[1] + ".png"; })
    .attr("width", 1)
    .attr("height", 1)
    .attr("x", function(d) { return d[0]; })
    .attr("y", function(d) { return d[1]; });
};

We’ll want to add our points and polygons to this map. The code to do that isn’t very different from the code you saw in listing 7.19 and the code we’ve been working with throughout the chapter. We’ll use the same data, but add a function on the display styling of the countries to make half of them disappear. You can see the results in figure 7.21.
Listing 7.20 A tile map with vector data overlaid

queue()
  .defer(d3.json, "world.topojson")
  .defer(d3.csv, "cities.csv")
  .await(function(error, file1, file2) { createMap(file1, file2); });

function createMap(topoCountries, cities) {
  var countries = topojson.feature(topoCountries, topoCountries.objects.countries);
  var width = 500,
      height = 500;
  d3.select("svg").append("g").attr("id", "tiles");
  var tile = d3.geo.tile()
    .size([width, height]);
  var projection = d3.geo.mercator()
    .scale(120)
    .translate([width / 2, height / 2]);
  var center = projection([12, 42]);
  var path = d3.geo.path()
    .projection(projection);
  var featureSize = d3.extent(countries.features, function(d) { return path.area(d); });
  var countryColor = d3.scale.quantize()
    .domain(featureSize)
    .range(colorbrewer.Reds[7]);
  var zoom = d3.behavior.zoom()
    .scale(projection.scale() * 2 * Math.PI)
    .translate([width - center[0], height - center[1]])
    .on("zoom", redraw);
  d3.select("svg").call(zoom);
  redraw();

  d3.select("svg").selectAll("path.countries").data(countries.features)
    .enter()
    .append("path")
    .attr("d", path)
    .attr("class", "countries")
    .style("fill", function(d) {return countryColor(path.area(d))})
    .style("stroke-width", 1)
    .style("stroke", "black")
    .style("opacity", .5);

  d3.select("svg").selectAll("circle").data(cities)
    .enter()
    .append("circle")
    .attr("class", "cities")
    .attr("r", 3)
    .attr("cx", function(d) { return projection([d.x,d.y])[0]; })
    .attr("cy", function(d) { return projection([d.x,d.y])[1]; });

  function redraw() {
    var tiles = tile
      .scale(zoom.scale())
      .translate(zoom.translate())
      ();
    var image = d3.select("#tiles")
      .attr("transform", "scale(" + tiles.scale + ") translate(" + tiles.translate + ")")
      .selectAll("image")
      .data(tiles, function(d) { return d; });
    image.exit()
      .remove();
    image.enter().append("image")
      .attr("xlink:href", function(d) { return "http://" + ["a", "b", "c", "d"][Math.random() * 4 | 0] + ".tiles.mapbox.com/v3/examples.map-zgrqqx0w/" + d[2] + "/" + d[0] + "/" + d[1] + ".png"; })
      .attr("width", 1)
      .attr("height", 1)
      .attr("x", function(d) { return d[0]; })
      .attr("y", function(d) { return d[1]; });
    // Note that we're not taking zoom.scale() directly like we did before, but processing it
    // to get the properly formatted scale for our Mercator projection
    projection
      .scale(zoom.scale() / 2 / Math.PI)
      .translate(zoom.translate());
    d3.selectAll("path.countries")
      .attr("d", path);
    d3.selectAll("circle")
      .attr("cx", function(d) { return projection([d.x,d.y])[0]; })
      .attr("cy", function(d) { return projection([d.x,d.y])[1]; });
  };
};

Figure 7.21 A tile map overlaid with the point and polygon data we worked with throughout this chapter

7.6 Further reading for web mapping

As I said in the beginning of this chapter, the things you can do with D3’s mapping capabilities would fill an entire book. Following are a few other capabilities we didn’t cover in this chapter.

7.6.1 Transform zoom

The method we used for our zoom behavior in this chapter is known as projection zoom: it mathematically recalculates the shape of features based on a change in scale and translation. But if you’re using a flat projection like Mercator, you can achieve faster performance by tying the change in scale and translate of the zoom behavior to your features’ SVG transform. One issue you’ll run into is that font size and stroke width are affected by the SVG transform, so you’ll need to adjust those settings on the fly.

7.6.2 Canvas drawing

The .context function of d3.geo.path allows you to easily draw your vector data to a <canvas> element, which can dramatically improve speed in certain cases. It also allows you to use the canvas’s .toDataURL() to dynamically create a PNG for users to save or share on social media.
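To make the transform zoom of section 7.6.1 concrete: the zoom behavior’s translate and scale become a single SVG transform on a group that holds your features, and any stroke width or font size you want to stay constant on screen is divided by the scale. The helpers below are hypothetical. In a page, you would apply their results inside the zoom handler, for example with selection.attr("transform", zoomTransform(zoom.translate(), zoom.scale())) on that group.

```javascript
// Hypothetical helper: SVG transform for a zoom behavior's translate [tx, ty] and scale k.
// SVG applies the rightmost transform first, so a point p ends up at translate + k * p,
// which is how the zoom behavior's translate and scale combine.
function zoomTransform(translate, k) {
  return "translate(" + translate[0] + "," + translate[1] + ") scale(" + k + ")";
}

// Hypothetical helper: stroke width (or font size) to set so it still appears
// `screenWidth` pixels wide after its group is scaled by k.
function compensatedStrokeWidth(screenWidth, k) {
  return screenWidth / k;
}

console.log(zoomTransform([40, -25], 4));  // "translate(40,-25) scale(4)"
console.log(compensatedStrokeWidth(1, 4)); // 0.25
```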
7.6.3 Raster reprojection

Jason Davies and Mike Bostock have both provided examples of reprojecting not just vector data but also the tile data used in tile maps (see bl.ocks.org/mbostock/ and www.jasondavies.com/maps/raster/satellite/). You can use this to show a terrain map in the satellite projection, or a terrain map with the Mollweide projection we used earlier.

7.6.4 Hexbins

The d3.hexbin plugin allows you to easily create hexbin overlays for your maps like the one seen in figure 7.22. This can be effective when you have quantitative data in point form and you want to aggregate it by area.

7.6.5 Voronoi diagrams

As with hexbins, if you only have point data and want to create area data from it, you can use the d3.geom.voronoi function to derive polygons from points, like those seen in figure 7.23.

7.6.6 Cartograms

Distorting the area or length of a geographic object to show other information creates a cartogram. For example, you could distort the streets of your city based on the time it takes to drive along them, or make the size of countries on a world map bulge or shrink based on population. Although no simple functions exist to create cartograms, examples of how to create them in D3 include one created by Jason Davies (http://www.jasondavies.com/maps/dorling-world/), one created by Mike Bostock (http://bl.ocks.org/mbostock/4055908), and the cost cartogram I built (orbis.stanford.edu).

Figure 7.22 An example of hexbinning by Mike Bostock showing the locations of Walmart stores in the United States (available at http://bl.ocks.org/mbostock/4330486).

7.7 Summary

In this chapter, we’ve covered the incredible breadth of geospatial information visualization capabilities present in D3. Maps are a core aspect of information visualization and of rich interactive websites, and D3’s geo functions allow you to make maps that are much richer than the pushpin web maps that you typically see on the web.
To make those maps, we walked through a large number of functions and concepts, including
■ Understanding the GeoJSON spatial data format
■ Creating simple maps
■ Creating map components like graticules
■ Computing geospatial attributes like centroids and bounding boxes
■ Giving the user rich interactive panning and zooming
■ Using different projections
■ Creating globes
■ Rendering TopoJSON and using it to merge features and find neighbors
■ Creating tile maps with TopoJSON overlays

In the next chapter, you’ll start using D3 selections and data-binding to create galleries and tables using traditional DOM elements.

Figure 7.23 An example of a Voronoi diagram used to split the United States into polygons based on the closest state capital (available at http://www.jasondavies.com/maps/voronoi/us-capitals/).

8 Traditional DOM manipulation with D3

This chapter covers
■ Making spreadsheets with data
■ Drawing graphics with HTML5 canvas
■ Building image galleries with data
■ Populating drop-down lists with data-binding

Many introductions to D3 start with sizing <div> elements to create a bar chart. They figure you’re a web developer and that you won’t be as intimidated by a div as you’d be by an SVG rectangle. This book even begins by creating a set of elements in chapter 1, the first time you saw data-binding in action. But then these tutorials (and this book) quickly transition into the creation of SVG elements, with an emphasis on the graphical display of information. This is at odds with traditional web development, which focuses on the presentation of blocks of text, images, buttons, lists, and so on. As a result, it seems like D3 is for data visualization but somehow not for manipulating traditional DOM elements like paragraphs, divs, and lists (like those seen in figure 8.1). The benefit of using D3 to create these kinds of elements is that you can use D3 transitions, data-binding, and other functionality to make a more interactive and dynamic website.

The principles at work in D3 not only can be used for traditional DOM elements, but in many cases should be used that way. In this chapter, we’ll use D3 to create a spreadsheet as well as an image gallery. We’ll also explore how to use HTML5 canvas to draw and save images. This won’t include an exhaustive example of canvas, because that’s beyond the scope of the book, but it’ll give you the fundamentals to deploy it in tandem with D3 in your applications. In each case, we’ll use D3 data-binding, transitions, and selections the same way we did to make charts, but instead make more traditional HTML elements. By using the same datasets and functions to deal with your DOM elements as you do with your SVG elements, you make it easier to tie them together and reduce the amount of syntax you need to learn to deploy rich sites. In later chapters, you’ll see these different methods of presenting information working together in tandem.
8.1 Setup

As you may expect, we need to make a few changes to the files we’re working with, now that we’re going to do coding that resembles more traditional web development. In one sense, this means simplifying, because our HTML page loses the <svg> element necessary for representing SVG graphics, but in another sense it means making things more complex. Although we used CSS primarily for graphical changes with SVG, we need to use it for more than that when working with traditional DOM elements.

Figure 8.1 The traditional DOM-based pieces that are created in this chapter are a spreadsheet (section 8.2) and an image gallery (section 8.4), with interactivity based on a data-driven drop-down list (section 8.4.2) and images drawn using HTML5 canvas (section 8.3).

8.1.1 CSS

You use more CSS when you work with traditional DOM elements, because if you want to manipulate them the way you manipulate SVG elements (for instance, placing HTML elements as precisely as you place SVG elements), you typically need to set them up a bit differently. Also, most of the graphical aspects of these elements aren’t set with attributes as in SVG, but with styles (we covered the difference between styles, attributes, and properties back in chapter 1). This shouldn’t be a surprise for anyone who’s had experience working with CSS, because it’s usually the case in complex examples and under the hood when you use JavaScript libraries. For example, if you look at the CSS of various libraries that provide autocomplete or more sophisticated UI elements, you’ll see that they typically combine JavaScript with a variety of styles assigned to complex CSS selectors. In the following listing you’ll see the style sheet we’ll use for this chapter. Some of these elements you won’t see until the end of the chapter.
Listing 8.1 Style sheet for chapter 8

```css
tr {
  border: 1px gray solid;
}
td {
  border: 2px black solid;
}
div.table {
  position: relative;
}
div.data {
  position: absolute;
  width: 90px;
  padding: 0 5px;
}
div.head {
  position: absolute;
}
div.datarow {
  position: absolute;
  width: 100%;
  border-top: 2px black solid;
  background: white;
  height: 35px;
  overflow: hidden;
}
div.gallery {
  position: relative;
}
img.infinite {
  position: absolute;
  background: rgba(255,255,255,0);
  border-width: 1px;
  border-style: solid;
  border-color: rgba(0,0,0,0);
}
```

8.1.2 HTML

The HTML is pretty simple: a single <div> with the ID value of "traditional" in your <body> element, as shown in the following listing. You still need a reference to d3.js, but otherwise it's a Spartan HTML page. You'll either modify that div or add new elements to it for every example.
Listing 8.2 chapter8.html

8.2 Spreadsheet

Let's assume we want to take the tweets data that we've been working with throughout the book and present it as a spreadsheet. It may help to first think of spreadsheets as a kind of information visualization. They have an x-axis (columns), a y-axis (rows), and visual channels to express information (not only color applied to text and cells but also position and font styling). This is especially true of large spreadsheets, because they also use aggregate functions to tally results.

8.2.1 Making a spreadsheet with table

The easiest way to make a spreadsheet is to use the HTML <table> element and data-binding to create rows and cells. As we've done previously, we create key values by using d3.keys on one of the entries in our dataset, which will be the venerable tweets.json. After we bind the dataset to the table, we need to create individual cells. We can accomplish this by taking each JSON object and applying d3.entries() to it, which turns an object into an array of key-value pairs perfectly suited for D3 data-binding.

Listing 8.3 Spreadsheet example

```javascript
d3.json("tweets.json", function(error, data) {
  if (error) throw error;
  createSpreadsheet(data.tweets);
});

function createSpreadsheet(incData) {
  // Column names come from the first tweet. This won't work if your objects
  // have differing numbers of attributes, but usually that's not the case.
  var keyValues = d3.keys(incData[0]);

  d3.select("#traditional")
    .append("table");

  // Creates our header row from our keys
  d3.select("table")
    .append("tr")
    .attr("class", "head")
    .selectAll("th")
    .data(keyValues)
    .enter()
    .append("th")
    .html(function (d) {return d;});

  // Creates each row for a tweet
  d3.select("table")
    .selectAll("tr.data")
    .data(incData)
    .enter()
    .append("tr")
    .attr("class", "data");

  // Creates each cell for an entry in each datapoint
  // (only data rows; the header row has no bound datum)
  d3.selectAll("tr.data")
    .selectAll("td")
    .data(function(d) {return d3.entries(d);})
    .enter()
    .append("td")
    .html(function (d) {return d.value;});
}
```

The result of listing 8.3 is a decent tabular presentation of our tweets data, as shown in figure 8.2.
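You can check the data shapes behind listing 8.3 without a browser. The plain-JavaScript sketch below is an approximation, not D3's implementation. `entriesOf` mimics the `{key, value}` objects that d3.entries() produces. D3 v3 iterates with for...in, which also picks up inherited enumerable properties, whereas Object.keys here does not. `allKeys` is a hypothetical helper showing one way around the differing-attributes caveat, and the sample rows are invented for illustration.

```javascript
// Made-up rows shaped loosely like tweets; not the real tweets.json contents.
const rows = [
  { user: "Al", content: "I really love seafood.", tags: ["food", "love"] },
  { user: "Sam", content: "Watching the game.", favorites: ["Al"] }
];

// Sketch of the shape d3.entries() returns: one {key, value} object per
// property. Object.keys only lists the object's own enumerable properties.
function entriesOf(obj) {
  return Object.keys(obj).map(function (key) {
    return { key: key, value: obj[key] };
  });
}

// Keys from the first row alone (as with d3.keys(incData[0])) would miss
// "favorites"; taking the union across all rows, in first-seen order, keeps it.
function allKeys(data) {
  const seen = [];
  data.forEach(function (row) {
    Object.keys(row).forEach(function (key) {
      if (seen.indexOf(key) === -1) seen.push(key);
    });
  });
  return seen;
}

// Setting an array as HTML content coerces it to a string, and
// Array.prototype.toString joins the items with commas.
function cellText(value) {
  return String(value);
}

const firstRowCells = entriesOf(rows[0]);
console.log(JSON.stringify(firstRowCells[2])); // {"key":"tags","value":["food","love"]}
console.log(allKeys(rows).join(", "));         // user, content, tags, favorites
console.log(cellText(rows[0].tags));           // food,love
```

If you adopt a union of keys, you would also want to look up each cell by key rather than rely on the order d3.entries() returns. Otherwise a row that lacks a field has its later cells shift into the wrong column.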
Notice that the arrays have been transformed into comma-delimited strings. It's a simple task to take data and bind it to create traditional DOM elements in the same way we bound data to create SVG elements. We could just as easily have created an <ol> element and appended <li> elements to it from our dataset. We can also use D3's .on function to assign event listeners that highlight cells or rows by changing their background or font color. But rather than do that with a spreadsheet built using <table>, we'll build another spreadsheet entirely out of <div> elements.

Figure 8.2 A tabular display of the data found in tweets.json using <table>, <tr>, and <td> elements

8.2.2 Making a spreadsheet with divs

Why use <div>
elements? Because we're going to start moving our cells and rows around however we want, and by the time we override all the styles that make a table and its constituent elements work, we're better off starting fresh with a div. By setting the <div> position to absolute, we can use D3 transitions to move them around in the same way we moved SVG elements around in our earlier examples. We need to apply a bit more CSS to make the <div> elements take up the right amount of space, whereas <table> does that for us, but the added flexibility is worth it.

A quick note for those of you who, like me, always forget the one crazy rule of positioning DOM elements: elements set to position:absolute are positioned relative to their nearest ancestor that is set to position:relative or position:absolute. We'll create a parent <div> (div.table) with position:relative to hold the <div>
elements that make up our table.

Listing 8.4 A spreadsheet made of divs

```javascript
d3.json("tweets.json", function(error, data) {
  if (error) throw error;
  createSpreadsheet(data.tweets);
});

function createSpreadsheet(incData) {
  var keyValues = d3.keys(incData[0]);

  // It's a <div>, not a <table>
  d3.select("#traditional")
    .append("div")
    .attr("class", "table");

  // Header row: same as before, but built from <div> elements
  d3.select("div.table")
    .append("div")
    .attr("class", "head")
    .selectAll("div.data")
    .data(keyValues)
    .enter()
    .append("div")
    .attr("class", "data")
    .html(function (d) {return d;})
    // Instead of x/y or transform, HTML elements have top/bottom/left/right
    .style("left", function(d, i) {return (i * 100) + "px";});

  // One absolutely positioned row per tweet, keyed by tweet content
  d3.select("div.table")
    .selectAll("div.datarow")
    .data(incData, function(d) {return d.content;})
    .enter()
    .append("div")
    .attr("class", "datarow")
    .style("top", function(d, i) {return (40 + (i * 40)) + "px";});

  // One cell per key-value pair in each tweet
  d3.selectAll("div.datarow")
    .selectAll("div.data")
    .data(function(d) {return d3.entries(d);})
    .enter()
    .append("div")
    .attr("class", "data")
    .html(function (d) {return d.value;})
    .style("left", function(d, i) {return (i * 100) + "px";});
}
```

This code has some obvious oversimplifications. As shown in figure 8.3, it doesn't make much sense to have each column the same width. Although we could create a method for measuring the maximum size of the text in that field, that's not where we'll go in this chapter. I want to show a general overview of manipulating elements like these rather than create the ultimate D3 spreadsheet.

Figure 8.3 Our improved spreadsheet built with <div> elements. You can see how each div is the same width. Because of our overflow settings, it displays as much of the text as it can.

8.2.3 Animating our spreadsheet

It's time now to add interactivity to the static chart shown in figure 8.3. One traditional interaction technique applied to spreadsheets is the ability to sort them. We can do that with our spreadsheet by sorting the data and rebinding it to the cells, just like we did previously with SVG elements. By tying this to the same transition() behavior we used before, we can also animate that sorting.

Listing 8.5 Sorting functions

```javascript
// Simple controls for our spreadsheet
d3.select("#traditional").insert("button", ".table")
  .on("click", sortSheet).html("sort");
d3.select("#traditional").insert("button", ".table")
  .on("click", restoreSheet).html("restore");

function sortSheet() {
  var dataset = d3.selectAll("div.datarow").data();
  // Casts as date and sorts the array so that earlier tweets are lower in the array
  dataset.sort(function(a,b) {
    var a = new Date(a.timestamp);
    var b = new Date(b.timestamp);
    return a>b ? 1 : (a
```

Listing 8.7 Canvas drawing code

Figure 8.6 A circle and text drawn using HTML5 canvas

element, but on a <canvas> element that needs to be created in the DOM. Third, canvas has a syntax distinct from SVG. But there's one more major difference between the graphics created using canvas and the graphics created using SVG. You can see it if you inspect that circle, as shown in figure 8.7. Anything drawn in canvas is drawn to a bitmap, so you don't have an individual text or circle element that you could assign an event listener to, or whose appearance or text content you can later modify. It's also not vector-based, so if you zoom in on the image, you'll see the pixelation you're familiar with from zooming photos and other raster imagery. Because HTML5 canvas doesn't create separate DOM elements, it performs better when dealing with large numbers of graphical elements. But you lose the flexibility of SVG.

8.3.2 Drawing and storing many images

We want images because our plan is to build an image gallery, but the canvas element in the DOM doesn't act like the kind of image that you're accustomed to dealing with in web development. You can't right-click and save it, or open it in a new window in its current form. But the <canvas> element includes a .toDataURL() function that provides a string designed to be the src attribute of an <img> element. The following listing shows the result of .toDataURL() when applied to one of your drawn circles. This is only the first three lines; the actual value would go on like this for nine pages.

Figure 8.7 Any graphics created in canvas are stored as a bitmap or raster image.
Unlike in SVG, the individual shapes are no longer accessible or modifiable after being drawn.

Listing 8.8 Sample toDataURL() output

```
4Xu2dC3xV1ZX/171B1JJggNoSsSY+QrWiQnB4dCoEH7Tgg4dVdNRCWg3SqQVm+i99TIfQmc7UPkbU
9sNDW0KVWluFYCl0FIdAW99AALWWUE2sCtWCgQQfkdz73+smV1NIyH3sc89ae//258MnKOf
```

In our new example in the following listing, we create 100 circles of varying colors with varying borders. We then use .toDataURL() to create an array of values that can be bound to <img> elements to create our first gallery of 100 images.

Listing 8.9 Drawing 100 circles with canvas

```javascript
var imageArray = [];
d3.select("#traditional").append("canvas")
  .attr("height", 500).attr("width", 500);
var context = d3.select("canvas").node().getContext("2d");
context.textAlign = "center";
context.font = "200px Georgia";

// These scales are designed for random numbers to create random graphics.
// colorbrewer.js must be loaded on the page for colorbrewer.Reds.
var colorScale = d3.scale.quantize().domain([0,1])
  .range(colorbrewer.Reds[7]);
var lineScale = d3.scale.quantize().domain([0,1]).range([10,40]);

// Draws a randomly colored circle 100 times
for (var x = 0; x < 100; x++) {
  context.clearRect(0,0,500,500);
  context.strokeStyle = colorScale(Math.random());
  context.lineWidth = lineScale(Math.random());
  context.fillStyle = colorScale(Math.random());
  context.beginPath();
  context.arc(250,250,200,0,2*Math.PI);
  context.fill();
  context.stroke();
  context.fillStyle = "black";
  context.fillText(x,250,280);
  // Gets the data URL for each drawing and pushes it into an array
  var dataURL = d3.select("canvas").node().toDataURL();
  imageArray.push({x: x, url: dataURL});
}

// Uses that array to create 100 images. <img> elements resize automatically,
// so each image's width adjusts to match this height without distorting.
d3.select("#traditional")
  .append("div").attr("class", "gallery")
  .selectAll("img").data(imageArray)
  .enter().append("img")
  .attr("src", function(d) {return d.url;})
  .style("height", "50px")
  .style("float", "left");
```

As shown in figure 8.8, each of our slightly different circles is turned into a PNG and assigned to an <img> element. We can also use toDataURL() to create JPEGs by specifying that format, but by default it creates PNGs. Because they're <img> elements now, they resize automatically. Even though we only specified the height of the images, the <img> element by default scaled the width of each image proportionately so that it wouldn't distort. Because of the float:left setting on those <img> elements, they easily fill the div we created for them. And because each one is an <img>, we can do anything with these that we normally could with an image on a web page, including saving them or opening them in a new tab. That's not much of an image gallery, though. We'll continue to expand on this code in the next section, and also make something a bit more interesting by taking advantage of the interaction and animation techniques we've already used.

Figure 8.8 The final canvas-drawn circle (top) remains in our <canvas> element, and every variation according to the settings appears as an image in a div (bottom).

8.4 Image gallery

You've spent time learning canvas so that you could make image elements for a gallery. When spec'ing out an image gallery, keep in mind a few features that everyone wants. First, you want more control over where you place images. Instead of using float, we'll do the same thing we did with the spreadsheet divs in section 8.2.2. We'll use position:absolute along with top and left to place them, like our div cells and rows or the SVG elements that we used in previous chapters. You also want images to fit cleanly in the space you provide, and you want those images to grow or shrink if the user changes the size of the window. For all these examples, we'll use the same method described in listing 8.9 to create the imageArray dataset.
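The two quantize scales in listing 8.9 map a continuous domain onto a discrete range, which is easy to misread. The following plain-JavaScript sketch assumes D3 v3-style quantize behavior, in which the domain is split into as many equal segments as there are range values. Under that assumption, lineScale, with a range of [10,40], only ever returns 10 or 40, never a width in between. The `quantize` function here is a hypothetical stand-in, not D3's API.

```javascript
// Hypothetical stand-in for a quantize scale: split [d0, d1] into
// range.length equal buckets and return the range value for x's bucket.
function quantize(domain, range) {
  const d0 = domain[0];
  const d1 = domain[1];
  const n = range.length;
  return function (x) {
    // Clamp the bucket index so values at or beyond the domain edges
    // still map to the first or last range value.
    const i = Math.max(0, Math.min(n - 1, Math.floor((x - d0) / (d1 - d0) * n)));
    return range[i];
  };
}

const lineScale = quantize([0, 1], [10, 40]);
console.log(lineScale(0.2));  // 10
console.log(lineScale(0.49)); // 10
console.log(lineScale(0.5));  // 40
console.log(lineScale(0.99)); // 40

// With seven range values (like a seven-class color scheme), each bucket is 1/7 wide.
const sevenStep = quantize([0, 1], ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]);
console.log(sevenStep(0.5));  // c3
```

If you wanted stroke widths anywhere between 10 and 40, a continuous scale such as d3.scale.linear() would be the more usual choice in D3 v3.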
The figures in this chapter will vary slightly from the results of running this code, because we randomly generate some of the visual elements. We can create our first gallery with surprisingly little code.

Listing 8.10 Resizing eight-image gallery

```javascript
// Resizes automatically to fit any number of images per row
var imgPerLine = 8;

// Deletes the canvas element because it's not needed anymore
d3.select("canvas").remove();

d3.select("#traditional")
  .append("div").attr("class", "gallery")
  .selectAll("img").data(imageArray).enter().append("img")
  .attr("class", "infinite")
  // Each imageArray datapoint stores its data URL in the url property
  .attr("src", function(d) {return d.url;});

redrawGallery();

// Placement code in a separate function for ease of use with dynamic updates
function redrawGallery() {
  // Size based on the parent div width
  var newWidth = parseFloat(d3.select("div.gallery").node().clientWidth);
  var imageSize = newWidth / imgPerLine;

  // x and y based on custom accessor functions:
  // imgX gives the row offset (used for top), imgY the column offset (used for left)
  function imgX(x) {
    return Math.floor(x / imgPerLine) * imageSize;
  }
  function imgY(x) {
    return Math.floor(x % imgPerLine * imageSize);
  }

  // Append "px" so the values are valid CSS lengths
  // (unitless lengths are ignored outside quirks mode)
  d3.selectAll("img")
    .style("width", imageSize + "px")
    .style("top", function(d) {return imgX(d.x) + "px";})
    .style("left", function(d) {return imgY(d.x) + "px";});
}

// Resizes the gallery whenever the page is resized
window.onresize = function(event) {
  redrawGallery();
};
```

As shown in figure 8.9, this produces a scrollable div with eight images per line. The images not only scale to fit the div, but also rescale as you adjust your browser window.

Figure 8.9 Automatically scaled-to-fit images that pack eight images per row

The imgX and imgY functions depend on the fact that we created an object for each image that stores an x value. This should remind you of D3 layout accessor functions. We'll build something more involved like this in chapter 9 and dive into writing layouts in chapter 10, but for this example we won't try to create an image gallery layout.
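redrawGallery() derives each image's position from the x value stored in its datapoint, not from the image's index in the selection. As a result, reordering the images does not move them. The sketch below isolates that arithmetic in plain JavaScript. `gridPosition` is a hypothetical helper, not part of D3, that mirrors imgX (row offset, used for top) and imgY (column offset, used for left) from listing 8.10. The 800-pixel container width and the placeholder URLs are arbitrary assumptions.

```javascript
// Hypothetical helper mirroring the arithmetic in redrawGallery():
// row offset -> top, column offset -> left, both in pixels.
function gridPosition(x, imgPerLine, containerWidth) {
  const imageSize = containerWidth / imgPerLine;
  return {
    top: Math.floor(x / imgPerLine) * imageSize,
    left: Math.floor((x % imgPerLine) * imageSize),
    width: imageSize
  };
}

// Datapoints shaped like imageArray entries: {x, url}; urls are placeholders.
const images = [];
for (let x = 0; x < 20; x++) {
  images.push({ x: x, url: "placeholder-" + x });
}

const before = images.map(d => gridPosition(d.x, 8, 800));

// Simulate appendChild moving image 3 to the end of the DOM order.
const reordered = images.filter(d => d.x !== 3).concat([images[3]]);
const after = reordered.map(d => gridPosition(d.x, 8, 800));

console.log(gridPosition(9, 8, 800)); // { top: 100, left: 100, width: 100 }
// Image 3 keeps its cell even though its index changed from 3 to 19.
console.log(after[19]);               // { top: 0, left: 300, width: 100 }
console.log(before[3].top === after[19].top && before[3].left === after[19].left); // true
// Placing by index instead would have moved it to index 19's cell.
console.log(gridPosition(19, 8, 800)); // { top: 200, left: 300, width: 100 }
```

Keying placement to a stored value rather than to the selection index is what lets the gallery stay put when elements are moved around in the DOM.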
8.4.1 Interactively highlighting DOM elements

From here, we can add interactivity, such as making an image expand on mouseover. The process is rather simple.

Listing 8.11 Expand image on mouseover

```javascript
function highlightImage(d) {
  // We have to recalculate the width, because that value isn't accessible in this function.
  // imgPerLine (8) comes from listing 8.10.
  var newWidth = parseFloat(d3.select("div.gallery").node().clientWidth);
  var imageSize = newWidth / imgPerLine;
  d3.select(this).transition().duration(500)
    .style("width", (imageSize * 2) + "px")
    .style("background", "rgba(255,255,255,1)")
    .style("border-color", "rgba(0,0,0,1)");
  // Moves the image up the DOM to ensure it's drawn above the images around it
  this.parentNode.appendChild(this);
}

function dehighlightImage(d) {
  var newWidth = parseFloat(d3.select("div.gallery").node().clientWidth);
  var imageSize = newWidth / imgPerLine;
  // We don't move the image back in the DOM, because it can't overlap when it's reduced in size.
  d3.select(this).transition().duration(500)
    .style("width", imageSize + "px")
    .style("background", "rgba(255,255,255,0)")
    .style("border-color", "rgba(0,0,0,0)");
}

d3.selectAll("img")
  .on("mouseover", highlightImage)
  .on("mouseout", dehighlightImage);
```

If you're a savvy web developer, you've probably spotted an artifact from working with SVG in the code above: the appendChild trick that we need to make SVG elements draw in front of each other. Because we're using relatively and absolutely positioned DOM elements, we don't need this trick, since CSS has a z-index property that allows elements to be drawn in front of each other. But I wanted to keep appendChild to remind you that working with traditional DOM elements has benefits that SVG elements don't.

Another reason to reorder the DOM rather than set a z-index is to highlight how the array position value works in D3's accessor functions. You may think that the array position corresponds to the position of a datapoint in the original JavaScript array that we bound to the selection, but it doesn't. It corresponds to the position of the DOM element in the selection. When you start to use appendChild to shift elements up and down the DOM, you change that array value. When we first created imageArray, we set the x value equal to the original array position, and didn't use array position to place the individual gallery images. This is why redrawGallery keeps drawing images in the right place, even after we start shifting images around in the DOM by mousing over them.

When you run the code in listing 8.11, D3's transitions are smart enough to process the rgba string designating a transparent background, as shown in figure 8.10. In some cases, like the next example, you may have to use D3's tweening capabilities to make sure that a DOM element interpolates properly, because its styles may not follow the rules that make shape and color transitions work so easily. Still, with color and simple size transitions, you can use exactly the same code for
HTML elements that we used with SVG elements, unless you're trying to transition to "height:auto" or some other nonnumerical value.

8.4.2 Selecting

Our final example adds a drop-down list to select a particular image and scroll the gallery to the row that holds that image. To do so, we need to populate the element works (it has a bunch of
,