by Matthew Heusser

Rapid Application Development the Zappos Way

Jun 07, 2012

Billion-dollar online shoe store Zappos, and CEO Tony Hsieh, embrace the motto of Delivering Happiness. Living up to that means maintaining a reliable website and giving employees free rein to solve problems. CIO.com recently visited Henderson, Nev. to see how Zappos develops software.

On a recent Thursday, at the Zappos all-hands meeting at the Smith Center in downtown Las Vegas, I see the Happiness Mobile—a bus outfitted in bold blue colors that promises to “bring the wow.” The next day, when a staffer picks me up for a ride to corporate headquarters in Henderson, a bright blue SUV makes the same offer.

The Zappos bus delivers happiness and brings the wow.

It’s not a huge surprise. The core mission of Zappos is to deliver happiness—or, as CEO Tony Hsieh likes to put it, “Zappos is a customer service company that happens to sell shoes.”

A lot of them, as it turns out. Zappos sold more than $1 billion in shoes last year. More than two years after Amazon acquired Zappos, I suspect it ranks among the fastest-growing subdivisions of the online retail giant.

What does all that mean for the IT shop?

That question brought me to Nevada to find out.

A Service Ethic from Top to Bottom

Zappos headquarters is a 15-minute drive from the bright lights of Las Vegas. When I arrive, Amelia Smith, an intern in the development group, meets me at the door. Smith started as an administrative assistant but thought she could do more. Management agreed. Now she’s taking computer science courses at night, shepherding builds and deployments during the day and, apparently, giving the occasional company tour.

The company offers free tours every day, but my driver was a few minutes late that day and I missed the official full tour. Instead of making me wait 10 minutes for the next one, Smith offers the tour herself.

This is not a symbolic exercise. Everyone at Zappos is similarly invested in the company. Every new hire, from the janitor to the CFO, spends four weeks in phone training when he or she begins work. This training is mandatory, and there has never been an exception. The phone training has a second purpose—every employee, including Hsieh, will spend at least 10 hours staffing the phones during the Christmas season.

Employee Freedom Means Greater Empowerment, Rapid Application Development

Technical staff work in cubicles of roughly the same size, but they are decorated in radically different ways. Streamers, balloons, plastic tchotchkes and other decorations surround the place. The attitude is jovial; spontaneous, random parades of employees happen for no apparent reason.

I ask Christa Foley, a senior HR manager, if Zappos should “standardize” the workplace per the 5S method espoused in lean software development. That was my way of making a joke; the 5S reference is a common misapplication of lean ideas to software.

Foley suddenly becomes very serious, forces a smile, and says, “That may work for other companies, but it wouldn’t work here.” This is a company serious about freedom.

Further flouting conventional wisdom, Zappos does not give any scripts or chains of instruction to representatives in its call center; instead, reps are empowered to make things right. The company’s social media policy? Be real, and use your best judgment.

Dean Curtis, a manager of the backend software catalog group, tells me that even the desks won’t be standard. When the company moves its headquarters in about a year, office equipment will be mobile. If a programmer needs to embed with the merchandising team, or if a tester needs face time with the programmers, he’ll be able to pick up his cubicle and move it.

When these folks say they want to move to a more integrated environment, they mean it.

Balancing Liberty with Policy, Compliance

For all its light-heartedness, Zappos holds some highly sensitive, private data: user accounts, passwords, credit card numbers and other personally identifiable information. How can the company maintain its freewheeling culture and PCI compliance at the same time? I turned to Curtis for an answer.

Curtis explains that the “core site” program went through a rigorous quality control process, including formal QA, signoffs and review meetings. All processes, by default, have to go to a weekly approval meeting.

Obviously, this slows the projects down. However, not all teams have to go through this process. If a team can demonstrate that it is isolated from the core site, it can interoperate through some sort of seam, ideally a Web service interface. Once the team demonstrates this isolation, it’s back to good management and vigilance, instead of compliance per se, to keep the site up.

And keep the site up Zappos does.

Caching Improves Uptime, Response Time, Customer Experience

Chris Weiss, director for architecture, notes that Zappos does a massive amount of caching, both for pictures and for content, outside the firewall.

Using a service such as Akamai reduces data center demands and can serve images and content much closer to the customer’s connection, improving performance. Some traffic, such as common search results and common search requests for the auto-fill software, is routed through another cache.

Once traffic actually gets to Zappos, it is routed by the Zappos Flux Capacitor. (Yes, the ZFC is a Back to the Future homage.)

This is where Kris Ongbongan, a senior manager in operations, steps in. Back in 2009, when Zappos was purchased for 10 million shares of Amazon common stock (about $1.2 billion, according to my math), the site rejected as many as 1% of the page requests that made it through the caches, he explains. Today, that number is around 0.01%. That is not 99.99% uptime; it is a 99.99% success rate on the requests that are special enough to make it through two layers of cache.

Beyond uptime, the company measures response time, with a service level expectation of 200 ms or less from first hit to the ZFC to going out of the firewall and back to the customer. The company also monitors the number of transactions per minute against historical averages, adjusted for volume, time of year, day and hour, sending alerts when those numbers cross a threshold based on those historical trends.
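For readers who want to picture that kind of monitoring, here is a minimal sketch in Python. It is not Zappos' code. The function names, the sample numbers and the three-standard-deviation rule are all assumptions made for illustration; the only details taken from the visit are the 200 ms expectation and the idea of comparing transactions per minute with historical figures for a comparable period.

```python
from statistics import mean, stdev


def is_anomalous(current_tpm, historical_tpm, tolerance=3.0):
    """Return True when the current transactions-per-minute figure strays
    from the historical baseline by more than `tolerance` standard deviations.

    `historical_tpm` is assumed to hold samples from comparable periods
    (same hour, weekday and season), so the baseline is already adjusted.
    The 3.0 default is an illustrative assumption, not a reported value.
    """
    if len(historical_tpm) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(historical_tpm)
    spread = stdev(historical_tpm)
    if spread == 0:
        # A perfectly flat history: any difference at all is unusual.
        return current_tpm != baseline
    return abs(current_tpm - baseline) > tolerance * spread


def breaches_response_target(response_ms, target_ms=200):
    """Return True when a response took longer than the 200 ms expectation."""
    return response_ms > target_ms


# Hypothetical samples for one hour-of-week slot.
history = [1000, 1020, 980, 1010, 990]
alert_on_volume = is_anomalous(700, history)
alert_on_latency = breaches_response_target(250)
```

A real system would keep a separate history for each hour and season and would probably smooth out one-off spikes before paging anyone, but the comparison at the heart of it can be this small.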

Ongbongan walks me by his monitors for a quick overview.

Now that the company has spent the time and energy to build a reliable platform, the technical team is shifting its focus to enabling rapid deployment—getting teams out of the QA review meeting, eliminating handoffs and automating virtualization tasks.

(Re)structuring Projects for Fun and Profit

Releases at Zappos, at least in the backend group, are batched into minimal marketable features. A typical release might take two programmers two to four weeks, including a VM/merge/test deploy cycle.

Crystal Chang, the team lead for the core website group, explains the current QA process for the core website:

  • After code is complete for a feature, there is a quick, informal review, enforced by a workflow engine.
  • The code is handed to a release shepherd (who may have been the lead developer), who runs a programming sanity-check test in a feature branch.
  • Once all the automated tests pass through Jenkins, the Continuous Integration Server, the shepherd will build a Virtual Machine, notify QA, and deliver release instructions to help the QA group with testing.
  • Once the back-and-forth of bug fixing is over, a release engineer pushes the code to a production state, deploys it and merges the code down to the master branch for the next feature branch.
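The steps above amount to an ordered set of gates, which can be sketched in a few lines of Python. This is only an illustration of the sequence Chang describes, not the workflow engine Zappos uses; the stage names and the next_stage helper are hypothetical.

```python
# Hypothetical stage names, in the order described above.
STAGES = [
    "informal_code_review",
    "shepherd_sanity_check",
    "automated_tests_pass",
    "vm_built_and_qa_tested",
    "deployed_to_production",
    "merged_to_master",
]


def next_stage(completed):
    """Return the next gate a release must clear, or None when it is done.

    Raises ValueError if a later stage was recorded before an earlier one,
    since each gate depends on the one before it.
    """
    done = set(completed)
    for index, stage in enumerate(STAGES):
        if stage not in done:
            skipped_ahead = done & set(STAGES[index + 1:])
            if skipped_ahead:
                raise ValueError(f"{stage} must be completed first")
            return stage
    return None


# A release that has been reviewed and sanity-checked waits on Jenkins next.
waiting_on = next_stage(["informal_code_review", "shepherd_sanity_check"])
```

Laid out this way, it is easy to see where a developer can end up waiting: every gate after the first depends on someone or something else finishing.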

A hands-on regression test of the website takes a week, perhaps a week and a half. A quick triage for a small change might take four days or less.

While this process has worked (the company made $1 billion in sales the last year its numbers were publicly disclosed), it doesn't exactly make developers happy.

Zappos plans to address the issue. As Chang describes it, “When developers have to wait for a virtual machine, or for QA, they feel…blocked. The blocking makes them sad. We want to enable the programmers to create their own virtual machines, to have ‘kill switches’ so they can deploy code without turning it on, to streamline the deploy process. We consider this initiative, to enable more developers to get closer to production, a strategic investment in the happiness of our employees.”
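A “kill switch” in this sense is what many teams call a feature flag: new code ships to production but stays dormant until someone flips it on, and it can be flipped off again without a redeploy. The sketch below shows the general idea in Python. The KillSwitches class, the flag name and the page function are hypothetical; the article does not describe how Zappos plans to implement its switches.

```python
class KillSwitches:
    """A tiny in-memory flag store. Unknown switches default to off,
    so freshly deployed code stays dormant until it is turned on."""

    def __init__(self, initial=None):
        self._flags = dict(initial or {})

    def is_on(self, name):
        return self._flags.get(name, False)

    def turn_on(self, name):
        self._flags[name] = True

    def turn_off(self, name):
        self._flags[name] = False


def render_size_picker(switches):
    """Serve the new code path only when its switch is on."""
    if switches.is_on("new_size_picker"):  # hypothetical flag name
        return "new size picker"
    return "classic size picker"


switches = KillSwitches()
before = render_size_picker(switches)   # deployed, but still dormant
switches.turn_on("new_size_picker")
after = render_size_picker(switches)    # live, with no new deploy
```

In practice the flags would live somewhere shared, such as a configuration service or database, so that flipping one takes effect across every server at once.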

That bears repeating: “a strategic investment in the happiness of our employees.” That simple ethos, that happier employees are more productive employees, guides decisions from build to deploy to release.

Weiss asks, “What is worse? Having a bug deployed to production that you can roll back in half an hour, or having an onerous change control process, letting some bugs slip through (and they will), then having a heavy process take you six days to roll out a fix?”

Endgame: Zappos’ Philosophy Leaves Customers in Good Hands

By the end of the day, I am exhausted. Rick Duggan, director of software development and my sponsor, tells me the ZCar will be here to pick me up at 5 p.m., but Ongbongan, ever jubilant, asks for an opportunity to give me a ride. I simply cannot turn down anyone who smiles this much, and we are off for the airport.

About that time, I start to get anxious. What about the QA department? What about the project managers, the PMO, the data center?

When am I going to come back?

It’s not Las Vegas. The glitter was nice, but I lost my shirt at blackjack.

No, there is something about the air here at Zappos. The energy.

But the car is at the Delta terminal, and it is, to paraphrase Lord Tennyson, up the hill, up the hill, over the brow and away. As for Ongbongan, he will go home, have dinner with the family, maybe check on the website after the kids are in bed.

I wait in the airport for a while and try the wireless network. “Sleep better at night” isn’t quite the right term, but somehow, after visiting Zappos, I feel better. After all, I know that this little corner of the Internet is in good hands.

Matthew Heusser is a consultant and writer based in West Michigan. You can follow Matt on Twitter @mheusser or email him. Follow everything from CIO.com on Twitter @CIOonline, on Facebook, and on Google+.