20 April, 2009
It’s hard to overstate the adrenaline of the launch. We’ve been on this project for a bit over a year – nothing major by any standards. And what we’re launching isn’t a major product. Technically speaking, it isn’t even a product. Prosaically, we’ve just changed the presentation style of the Labs homepage, a site Google’s been serving for years. We are adding to it links to a couple of other “Labs” products that are getting unveiled today, but they’ve got their own teams and product managers. We’re not getting ourselves stressed on their behalf; we’ve got enough on our plate.
The actual sequence of steps is pretty straightforward – I’ve scrawled them up on the whiteboard in the unoccupied cubes our team is camped out at this morning:
- 10:30: flip switch to make new apps externally visible
- 10:30+e: verify new apps externally visible
- 10:30+2e: flip switch to make new site externally visible at googlelabs.com
- 10:30+3e: panic and debug
- 12:30: blog post goes out
- 12:30+e: flip redirect to make old site (labs.google.com) redirect to new site (googlelabs.com)
Monday morning, 9:00. Arthur and I are halfway up to the city, carpooling through the tail end of rush hour. We’re calculating transit times – 15 more minutes to the San Francisco office, with 10 minutes to park, puts us into action at 9:25. We’ll have 90 minutes to spare – plenty of time to put together an impromptu war room, sync with the infrastructure guys who’ll be flipping switches for us, and run the pre-launch tests one last time. Hell, there’s even time to get breakfast and a cup of coffee.
That, of course, is when my phone rings. I fumble for the right buttons to pick it up on the car’s hands-free audio.
“Hi Pablo? It’s Michael. Do you have a moment? It’s Important.”
I hate when It’s Important. Because “It” is never Important Good, like Larry and Sergey wanting to lend us the jet for the weekend, and needing to know where we want the limo pickup. No, “It’s Important” always means It’s Important Bad, like the datacenter we’re hosted out of has just gone down in flames.
Sure enough, it’s the datacenter. Down for the count, and something’s not working right on the automatic failover to a backup. All of a sudden, ninety minutes doesn’t seem like such a lot of time.
Michael is our product manager, and he fills us in. It may not be that bad. He’s already up in SF, coordinating a workaround (have I mentioned how much I love this guy?). The infrastructure guys are manually porting our code to another site and configuring a dedicated pool of machines to host us. They just need to know what special runtime flags, if any, we need for the pool.
Arthur and I parley briefly, trying to remember what special favors we’d asked when we got our machines set up the first time. We come up blank, which is either a good thing (we didn’t ask for any) or a bad thing (we’ve forgotten the arcane animal sacrifices our code requires). In five minutes we’ve crossed from nervous confidence to outright panic, then over into the eerie, liminal world of “Hope for the Best.” And we’re still 20 minutes from the office.
It’s good to have Arthur in the right seat here. He’s a couple of years younger than me and too modest to wear the hard-earned brass rat, but quietly carries a decade more experience than most of the engineers at this company. In previous fire drills, I’ve often imagined his calm get-it-done perspective as having been channelled from the days of Apollo’s mission control. Alright boys, they’re on fire out there halfway to the moon bleeding hydrazine and short on oxygen – whaddaya got for me?
Plan B is on deck, so we review the options for Plan C. By the time we make it to a desk, Michael will have confirmed that the manual port is working. Or not. If not, one of us can join the infrastructure guys in figuring out why the hell it isn’t, while the other can try copying our data over to the internal, test-only version of the console. It can’t go live, but we’ll at least be able to do the demo, and we’ll just have to get RJ to massage the messaging: instead of “We’ve just launched…” he’ll get to announce that “we are about to launch…” Of course, there’s the rest of the PR freight train we’ll have to deal with. The blog, the press release, the “googlegrams”. Everything’s choreographed to trip at 12:30, alerting the press – and the world at large – to go have a look at http://www.googlelabs.com.
A haunting story looms in my mind. Somewhere in the Soviet Union during the peak of the Cold War space race. The rocket launch was aborted just before launch. Countdown clock stopped, and ignition sequence shut down as the crew went in to diagnose a problem with the first stage motor. The scientists, engineers, generals and VIPs gathered at the base of the monstrous rocket. But the rocket’s fourth stage motor relied on a timed ignition. It was to fire once the third stage had burnt out, at some precise number of minutes after T-0. Somehow, when the countdown clock was stopped, no one ever told the fourth stage, and when the designated time came it dutifully burst to life, straight into the thousand ton stack of explosive fuel in the stages below it. Some hold that the Soviet space program never recovered from the disaster. The carnage and loss of life was horrific, and though carefully shrouded from the public, the Russian scientists knew, and the effect on morale was devastating.
I decide to spare Arthur my gruesome vision, but to me the moral was that any go/no-go decision had to be an all-in proposition.
Parking turns out to be a non-trivial affair – a couple of times around the block before we find an underground garage of twisty turn passages, all alike. But by the time we’re riding the elevator up to the fourth floor (9:25, on the dot), we still haven’t heard from Michael. We take this as a good sign, mostly because there’s not a lot we can do right now if it isn’t.
The thing about Michael – the thing about all the best product managers at Google – is that he really cares. I don’t just mean that he cares about the product; that’s given. But he cares about everything, and everyone. As software engineer-turned Harvard MBA-turned product manager, he’s spent sweat on both sides of the divide. His boyish charm and easy smile are effective tools, but they’re earnest. He really wants to know what the UI folks think is best. He really wants the PR team to have their say. You get the feeling that he really truly likes everyone, and because of that, everyone seems willing to go just one step further to keep him happy.
Arthur and I find the corner where the infrastructure team sits, and corner Pete, their PM, for an update. He offers that the backup servers “seem to be holding”, which Arthur confirms with a few test queries. Our adrenaline settles down a notch as we scout a pair of empty desks around the corner and start to play Marco Polo on the phone with Michael. Arthur, the covert Zen monk, reminds me: time to start breathing again, eh?
The only member of the team still missing is Artem, our Russian. He’s young, he’s fast and he’s fearless. Sometimes he scares me. Started coding professionally at age 13, and as far as I can tell, hasn’t stopped for a breath since. I sometimes imagine him as James Bond’s technological, Russian counterpart. Or maybe MacGyver’s. “We’ve got latency problems? I think it’s in the distributed memcache; lemme build an intermediate hot-caching layer that fronts it on a machine-by-machine basis. Yes, I know it’ll require recoding the storage layer representation. Yes, I know we’re launching on Monday. Hey, we’ve got an entire weekend ahead – we could write half of Vista in that time. Yes, it would probably be the bad half, but – look, just let me do it, okay? I’ll send you the code reviews.”
And somehow, it works. Time and again, I’ve seen his code go into security review and come out with no comment other than the proverbial gold star: LGTM (“looks good to me.”). If anything, the “additional comments” section will say something like “Very nicely done!”
So, Artem’s won everyone’s trust. The downside of this trust is that, when a grenade comes tumbling into the office, we’ve gotten into the habit of looking to him to throw himself on it (and defuse it, and turn it into an undocumented feature, with plenty of time to spare). So without Artem on hand, we’re missing our safety – we’re test pilots without a parachute.
Thirty minutes to go, and Artem finds us. He cruises in like it’s Friday morning, and his calm is almost unnerving – “Anyone know where the espresso machine is in this office? All I can find is the pre-made stuff.”
Michael’s briefed him by phone already. He picks a corner near the whiteboard, settles in, and flips open his laptop as if to catch up on email. I look over to Arthur, who reminds me (again) to keep breathing. Artem looks up and reads my mind: “Look, the code is up on the backup datacenter, we’ve run the tests, and everything looks good at this point. There’s nothing else we can do until it’s time to flip the switch. You guys eaten breakfast yet?”
I find myself wondering if I actually like to panic, and stifle the thought as soon as it surfaces. Artem is right. I breathe – a couple of times for good measure – then return to my temporary seat and try to focus on my backlog of email. None of it really needs to be answered today, but slogging through some of the bitwork and meeting requests is as good as anything for making the time pass, and is certainly more productive than hyperventilating.
Inexplicably, the next time I look at my watch, it’s 10:27. Holy crap – show time. Give or take an hour. Honestly, I need to keep reminding myself that the only real deadline is two hours away, at 12:30, when RJ starts showing things off live to the assembled press. But Radhika and Andy, tech leads for the new apps we’re launching, give us the thumbs up, so there’s no reason not to launch. We poke our heads around the corner to where Pete and the App Engine team are still dealing with the smoking aftermath of the datacenter crash. Pete rolls over to his monitor, flips a few virtual switches in DNS-land from his keyboard and we run back to our laptops to check. It’s all good so far.
10:30 – or something like it. One last round of checks with Michael, Arthur and Artem before flagging Pete again, who – almost anticlimactically – fires off another keyboard command for us. “There – you guys should be live now.”
Internal access? Check. External? Nothing. 404 – what the hell? Clear the DNS cache and refresh. 404 again. For real. The site’s just not there. Somewhere in my brainstem, that reptilian ball of Rambo neurons controlling our basic fight-or-flight reflexes kicks the door down and says “Oh yeah – that’s what I’m talkin’ about!”
I’m back at Pete’s office so fast I swear I can still see my shadow back at the cubes. “You’re sure you flipped the switch to take us live?” He looks back at me as though I’d asked him whether he was sure he was Pete. “Yeah – absolutely.” And I believe him, because he does this stuff for a living, and we’re just one more app out of thousands, absolutely routine stuff.
I relay confirmation back to the cubes, where Arthur and Michael are conferring. Even Artem looks worried now, which scares me more than anything else.
So what next? Packet traces? We’ve got no idea how the hell to do something like that without help from the App Engine team, and they’ve got their hands full with more important stuff at the moment. I stifle the urge to ask Pete whether he’s sure that he’s sure, and remind myself, yet again, to do some of that “breathing” stuff. Arthur’s right (as he – maddeningly – always is), it helps me focus again.
We’re all at our screens now. Artem’s trying packets, Arthur’s looking at the logs, and I’ve got my head in the code. I don’t have much hope that I’ll be much use, but scrolling through the sequence of lines fired when a request arrives gives me something to do. It’s sort of a rosary, I guess, for coders: receive request, parse headers, dispatch database query… Artem calls out from across the cube: “Requests are making it to the server – why the hell are we getting 404’s?” He’s not asking anyone in particular – or maybe he’s asking himself. Arthur’s unflappable, as always, his voice as flat as that of Spock, acknowledging the impossible: “Confirmed – we’re logging the requests.”
I circle back through the request processing code to where the logging statements are. “What’s the log message?”
“Just that the GET was received.”
“Anything else after that?”
I loop back through the code, but it’s now scrolling past me, Matrix-like, in a cascade of indecipherable symbols. I have a vague sense that I’m panicking, and breathing doesn’t help.
Arthur looks up, joins me on the code and takes less than a minute to find it: we – no, I – had left a failsafe in place. Just in case the server were accidentally exposed to the outside, it did its own check of where the request was coming from. If it looked like an external IP address, the server pretended it wasn’t there, faking an “address not found” response.
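A check of this kind could be sketched roughly as follows. This is an illustrative Python sketch, not the actual Labs code: the address ranges, the function names and the `failsafe_enabled` flag are all assumptions made up for the example.

```python
import ipaddress
from typing import Tuple

# Hypothetical "internal" ranges; the real ones are not given in the story.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_internal(remote_addr: str) -> bool:
    """Return True if the address falls inside one of the internal ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # An unparseable address is treated as external (fail closed).
        return False
    return any(addr in net for net in INTERNAL_NETWORKS)


def handle_request(remote_addr: str, failsafe_enabled: bool = True) -> Tuple[int, str]:
    """Serve the page, unless the failsafe is on and the caller looks external."""
    if failsafe_enabled and not is_internal(remote_addr):
        # Pretend the site is not there at all.
        return 404, "Not Found"
    return 200, "Google Labs"
```

With the flag left on, a request still shows up in the logs but an outside caller only ever sees the 404, which matches the symptoms above. Calling `handle_request(addr, failsafe_enabled=False)` serves the page to everyone.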
I flip the failsafe flag, reboot, and hit the server again. The shiny new “Google Labs” home page fills my screen. “We have liftoff!” – I kind of shriek it.
“It’s live?” – Arthur’s voice is measured.
Artem’s not waiting for confirmation: “I’ve got it here, too. Woo hoooo!” I think he starts doing some kind of dance, but I’m not paying attention anymore. I click through a couple of the links, confirming that everything’s live.
Something else kicks into our collective bloodstreams – endorphins? – blending with the adrenaline, transforming the panic of 15 seconds ago into a Hell-yeah-bring-it-on machismo. We’re pumped now, ready to face the world: we have launched.
Mind you, at this point, there are exactly three people in the world who know that Google Labs has launched. Five, if you count Pete and Radhika, but they’ve got other things to worry about. Until someone stumbles across the improbable URL http://www.googlelabs.com, we’re invisible.
The press conference is still ahead, with a dozen plus reporters from TechCrunch, the New York Times, Wired and the like breathlessly typing away, “I’m here live-blogging the much anticipated Google Labs launch…” while RJ eloquently explains how Google understands that “innovation is a conversation, not a one-way street.” And when it does come, Arthur, Artem, Michael and I are at the back of the conference room, as invisible and as much a part of the furniture as the stackable flexichairs we’re sitting in.
Artem’s got a window up on Google News, hitting “refresh” as the updates come fast and furious. RJ is in the groove – he’s got the press eating from his hand, and we know – we just know – that tomorrow’s headlines are going to be glorious (“Google News Timeline: A Glorious, Intriguing Time Sink”). I do find myself wishing that RJ will, at some point, direct his audience’s attention back to where we’re sitting, wishing that he’ll make passing reference of thanks to the engineers who worked, mostly in their spare time, for over a year to conceive of and bring to fruition this new way to let engineers launch products and bring users into the process.
I find myself wishing there was a way for him to tell the story of how improbably it all came together, starting with a Friday night email appeal blitzed out to eng-misc almost two years ago, asking for help and promising a “Google Labs” t-shirt for anyone who helped. Which reminds me: I’ve got to get some t-shirts printed up…