I tend to be pretty wary of large crowds at a party, doubly so of so-called “dance parties”. But in 2003, Google was still a shiny new world to me, and I felt obligated to immerse myself in as much of the culture as I could.
For the SEO – that’s Search Engine Optimization – business, the Google Dance was a time of fear and opportunity. You see, back in the aughts, computation was hard (bear with me, kiddos). Computing PageRank, the essential “quality” of a web page that was central to its ranking in Google searches, required doing a massive, massive iterative graph computation on The Entire World Wide Web. Which, even back then, was big.
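For readers wondering what an “iterative graph computation” looks like, here is a minimal power-iteration sketch of the textbook PageRank formula on a toy link graph. Several choices here are standard textbook assumptions rather than anything from this story: the damping factor of 0.85, the fixed iteration count, and spreading a dangling page’s rank evenly across all pages. This is not Google’s production code. The web-scale version repeated the same passes over billions of pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outbound links."""
    # Collect every page that appears as either a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share regardless of its links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 3))
```

In the toy graph, page c collects links from both a and b, so it ends up with the highest score. Page b has no inbound links at all, so it keeps only the random-jump share.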
Of course, PageRank was only one component in the mystical secret sauce formula that determined ranking (I had to work with the raw code once; that’s why I wear glasses now). But it was a big scary computation, which meant that even with Google’s legions of servers, rankings could only be updated intermittently, on the order of every month or so. And when the rankings were updated, SEOs who had staked their business models on being in the top seven results for “Cheap Airline Tickets” held their breath to see where they’d come out in the shuffle. Some sites went up in the rankings, some went down, and some – if the system thought they were spammy – disappeared altogether.
The Google Dance was a heady, scary time, even for those of us on the inside. You see, for every honest “white hat” SEO who played by the rules of making their pages Google-friendly, there were probably as many “black hat” SEOs who tried exploiting weaknesses in our algorithm to make their sites seem unreasonably relevant or high quality.
Matt Cutts was the head and heart of our web spam detection team, and he had a frighteningly keen eye for spotting these exploits and taking them down. Sweetest, kindest guy you could imagine if you played by the rules. But if you didn’t, he’d make your worst nightmare seem like a day of puppies and sunshine. We’ll get back to Matt in a bit here, but his constant lament was how fighting webspam inherently degraded quality. The web was so large and heterogeneous that every time you adjusted the algorithm to take out a blatant spamming technique, it was difficult to avoid affecting at least a couple of innocent sites.
It was a tough command to lead, and it was a never-ending battle. You’d discover that some site had found a new technique for keyword stuffing, and add a parameter for detecting and eliminating the cheat. But two months later they’d be back with a subtle twist using passive link farms. It was cat and mouse, with billions (that’s billions with a “b”) of dollars of online commerce – and Google’s reputation – at stake. So the Google Dance was indeed quite a day of reckoning every time it rolled around.
But “Google Dance” had another meaning as well. The Search Engine Strategies conference used to be held in San Jose each year, and it was there that all the SEOs, white hat and black hat alike, would gather to exchange, well, search engine strategies. Because it was in everyone’s best interests to have good relationships with these SEOs (or at least keep an eye on them), Google sponsored a party – a big dance party – at this conference, and invited folks to the campus to chat, drink, nibble on tasty food, and dance.
As I said, 2003 was the year of my first Google Dance, so I felt some obligation to go and see what the fuss was. There was the expected loud music, jam-packed crowd, fabulous food and brightly-colored props everywhere. But there was also something else: sure, these folks were here to party, but they were also here to try and kiss up to Google engineers and each other under the influence of alcohol to finagle any inside angle they could about Google’s ranking algorithm. It was like swimming with sharks. Smiling, preening, well-groomed sharks in polo shirts and khakis.
I did a pretty good job of only talking with other Googlers, and then only on innocent topics. But somewhere still early in the evening I noticed Matt ambling in the thick of it, cheerful and innocent as if on a walk through the park. Back then he was still mostly unknown to the general public; his one public function was occasionally serving as the anonymous online voice of engineering known only as “the Google Guy.” But the SEOs knew he was somehow critical to their success or failure, and a sort of ripple followed in his wake as he moved through the crowd.
I tried not to get too close – I didn’t want to interfere with his mojo – but I was curious what the interface between Matt and this teeming, writhing industry looked like. I didn’t have to wait long.
I have no idea what the guy’s name was. I remember it being something like Dmitry or Vlad, and remember him speaking in a Slavic or Russian accent. But that’s probably just because most of the Bond villains of my youth spoke in Slavic or Russian accents. And he definitely looked like a Bond villain.
Here’s my recollection of how the conversation went (note: I’m completely BS’ing the technical content):
“Matt! Matt Cutts! It is good to see you, my old friend.”
“Good to see you here too, Vlad. Are you enjoying yourself?” Matt seemed genuinely pleased, as if he were encountering a long-lost comrade, and not one of the banes of his existence.
“Oh boy, am I having a good time. Listen – you think you’re so smart…”
“You know how much it cost us to set up that link farm? And we only got what – two months out of it before you shut us down?”
“Ahyup – that one did give us a headache. You guys are good.” Matt was nodding – you could tell he’d been impressed.
“Indeed we are. And you’re never going to figure out what we’ve got going now.”
“Maybe we’ll get it soon enough. What are you doing? Domain variations? We’re onto those already, you know.”
“Ah – but not the way we’re doing them. You only detect variations by Hamming distance. I know; I’ve been probing the algorithm.”
“So you’re doing what? Common misspellings?”
“Even better: phonetic mappings. And we’ve got a shadow registrar, so it even looks like they’re owned by different sites.”
“Nice. Man, Vlad, you’ve put a lot of thought into this. I’ve got to say, you’re one of the best.”
“Damned right, Matt.”
And after Vlad wandered off, beaming with pride in his cleverness, Matt pulled a little notepad out of his pocket and scribbled a note or two on phonetic variations and shadow registrars. Then he slid back into the crowd, cheerful and innocent as before. Perhaps even more cheerful than before: another spam network was about to bite the dust.
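The author says the technical details of that conversation are made up, but the distinction it gestures at is real. Hamming distance compares only equal-length strings, position by position. It therefore catches single-character swaps but misses a spelling that merely sounds the same. A phonetic key maps similar-sounding spellings to the same code. Below, American Soundex is used as a stand-in for whatever “phonetic mappings” might mean. The function names and sample domain strings are illustrative assumptions, not anything either side actually ran.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


# American Soundex digit groups; vowels and h, w, y get no digit.
_SOUNDEX_DIGITS = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_DIGITS[_ch] = _digit


def soundex(word):
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
        # h and w do not separate repeated digits; vowels do.
        if ch not in "hw":
            prev = digit
    return (code + "000")[:4]
```

For example, `hamming("cheaptickets", "cheeptickets")` is 1. By contrast, `hamming()` refuses to compare "cheaptickets" with "cheaptikets" because their lengths differ, yet both have the Soundex code C132. A four-character code is very coarse, so a real detector would need something finer.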
It feels kind of unreal to see Google these days jostling with Apple for the title of Most Valuable Company on the Planet. Seems like not that long ago it was this crazy little grad student project running off of borrowed machines in the CSL basement. The secret behind making that transition was an elegantly simple business model backed by fiendishly complex software. And the engineer behind that software was Ron Garret.
Ron sez: I dove into the adstoo project with as much enthusiasm as I could muster, which I’m ashamed to say wasn’t much. The situation was exacerbated by the fact that we had no Java development infrastructure. We were writing bareback, so to speak. We had no debugger. We were using JSP, but had no editor support for JSP syntax. (That turned into a real debugging nightmare. It could take many tens of minutes to find a single typographical error because the only indication that there was a problem was that the output just looked all wrong, but the actual problem could be very far away from the apparent source of the problem.)
Fortunately for me, I was assigned a junior engineer to work with/for me, and he actually knew what he was doing. While I struggled to learn the Java libraries and debugging techniques (I knew the basic language, but I had never done any serious development in it before) this guy just took the bull by the horns and pretty much just wrote the whole damn thing in a matter of weeks. I sometimes pull this old joke out of the dustbin, that in the ancient tradition of senior-junior relationships, he did all the work and I took all the credit.
That’s not quite true. I did end up writing the credit card billing and accounting system, which is a nontrivial thing to get right. Fortunately for me, just before coming to Google I had taken some time to study computer security and cryptography, so I was actually well prepared for that particular task. Back in those days internal security was more or less nonexistent. All the engineers had root access to all of the servers. But I believe in planning ahead, and I anticipated the day when Google was not going to be a small company any more, and so I designed the billing system to be secure against even a dishonest employee with root access (which is not such an easy thing to do). I have no idea if they are still using my system, but if they are then I’d feel pretty confident that my credit card number was not going to get stolen.
Things were made worse by the fact that I had been assigned an office mate who was also new to Google, and who was not part of the ads group. Most of the other ads group members were sharing offices (or cubicles) with other ads group members, and so I felt I wasn’t really part of the club. On top of that, I was away from home and didn’t really have a life up there in Northern California. The stress mounted. I started to get paranoid that I would get fired before reaching the one-year mark. I started experiencing stress-related health problems, some of which are still with me today. On more than one occasion I came that close to quitting. To this day I have no idea why I didn’t.
It was about this time that I had my one and only meeting with Larry Page. It was to discuss the progress of the adstoo project and to set a launch date. My manager was there along with a couple of other people (including Doug, I think). Things went smoothly until Larry suggested changing the way billing was handled. I don’t remember the details, but my response was that this would be significant work. No one challenged me, but I found out later that the reaction of people in the room was something along the lines of, “Is he crazy? This ought to be a trivial change.” This little incident turned out to have very far-reaching repercussions later, but that will have to wait for the next blog entry.
Somehow we actually managed to launch AdWords on schedule, in September of 2000. It still seems like a bloody miracle. Most of the credit goes to Jeremy, Ed and Schwim. It could not have been done without them.
I can still remember watching the very first ad roll in. It was for a company called Lively Lobsters. Two months ago, after five years of intending to do so, I finally bought myself a little toy stuffed lobster to commemorate the occasion. (Update on 12/9/2005: It appears that Lively Lobsters has gone out of business. There’s some irony for you.)
About two weeks later all hell broke loose.
The AdWords launch went fairly smoothly, and I spent most of the next two weeks just monitoring the system, fixing miscellaneous bugs, and answering emails from users. (Yes, I was front-line AdWords support for the first month or so.)
The billing system that I had written ran as a cron job (for you non-programmers, that means that it ran automatically on a set schedule) and the output scrolled by in a window on my screen. Everything was working so well I didn’t really pay much attention to it any more, until out of the corner of my eye I noticed that something didn’t look quite right.
I pulled up the biller window and saw that a whole bunch of credit card charges were being declined one after another. The reason was immediately obvious: the amounts being charged were outrageous, tens of thousands, hundreds of thousands, millions of dollars. Basically random numbers, most of which no doubt exceeded people’s credit limits by orders of magnitude.
But a few didn’t. Some charges, for hundreds or thousands of dollars, were getting through. Either way it was bad. For the charges that weren’t getting through the biller was automatically shutting down the accounts, suspending all their ads, and sending out nasty emails telling people that their credit cards had been rejected.
I got a sick feeling in the pit of my stomach, killed the biller, and started trying to figure out what the fsck was going on. (For you non-programmers out there, that’s a little geek insider joke. Fsck is a unix command. It’s short for File System ChecK.)
It quickly became evident that the root cause of the problem was some database corruption. The ad servers, which actually served up the ads, would keep track of how many times a particular ad had been served and periodically dump those counts into a database. The biller would then come along and periodically collect all those counts, roll them up into an invoice, and bill the credit cards. The database was filled with entries containing essentially random numbers. No one had a clue how they got there.
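Here is a sketch of the roll-up step described above, where periodic count dumps are summed into one charge per account. Everything in it is hypothetical: the record layout, the per-impression price, and the function name. The real system read the counts from a database rather than a Python list. Note that, like the original biller, it trusts whatever counts it is given.

```python
from collections import defaultdict
from decimal import Decimal


def roll_up_invoices(count_records, price_per_impression):
    """Sum periodic (account, ad, served_count) dumps into one charge per account.

    Hypothetical record layout; the real ad servers wrote these rows to a database.
    """
    impressions = defaultdict(int)
    for account, _ad, served in count_records:
        impressions[account] += served
    # Decimal avoids binary floating-point rounding in money amounts.
    return {account: total * price_per_impression
            for account, total in impressions.items()}


if __name__ == "__main__":
    dumps = [("acme", "ad1", 100), ("acme", "ad2", 50), ("zed", "ad9", 10)]
    print(roll_up_invoices(dumps, Decimal("0.01")))
```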
I began the process of manually going through the database to clean up the bad entries, roll back the erroneous transactions, and send out apologetic emails to all the people who had been affected. Fortunately, there weren’t a huge number of users back then, and I had caught the problem early enough that only a small number of them were affected. Still, it took several days to finally clean up the mess.
Now, it’s a complete no-brainer that when something like that happens you add some code to detect the problem if it ever happens again, especially when you don’t know why the problem happened in the first place. But I didn’t. It’s probably the single biggest professional mistake I’ve ever made. In my defense I can only say that I was under a lot of stress (more than I even realized at the time), but that’s no excuse. I dropped the ball. And it was just pure dumb luck that the consequences were not more severe. If the problem had waited a year to crop up instead of a couple of weeks, or if I hadn’t just happened to be there watching the biller window (both times!) when the problem cropped up, Google could have had a serious public relations problem on its hands. As it happened, only a few dozen people were affected and we were able to undo the damage fairly easily.
You can probably guess what happened next. Yep. One week later. Same problem. This time I added a sanity check to the billing code and kicked myself black and blue for not thinking to do it earlier. At least the cleanup went a little faster this time because by now I had a lot of practice in what to do.
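A sanity check of the kind added after the second incident might look like the sketch below. The source doesn’t say what the real check tested, so both limits (a cap on impressions per billing period and a cap on any single charge) and all names are hypothetical. The key design choice is that a suspicious record is held for a human to review rather than charged, so corrupt data can no longer trigger declined charges and automatic account shutdowns.

```python
from decimal import Decimal

# Hypothetical limits; a real biller would tune these to actual traffic.
MAX_IMPRESSIONS_PER_PERIOD = 1_000_000
MAX_CHARGE = Decimal("5000.00")


def split_plausible(invoices):
    """Partition {account: (impressions, amount)} into (ok, held_for_review)."""
    ok, held = {}, {}
    for account, (impressions, amount) in invoices.items():
        if (impressions < 0 or impressions > MAX_IMPRESSIONS_PER_PERIOD
                or amount < 0 or amount > MAX_CHARGE):
            held[account] = (impressions, amount)  # don't charge; flag for a human
        else:
            ok[account] = (impressions, amount)
    return ok, held
```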
And we still didn’t know where the random numbers were coming from despite the fact that everyone on the ads team was trying to figure it out.
OK, time to wrap up this little soap opera.
The problem turned out to be something called a race condition, which is one of the most pernicious and difficult kinds of bugs to find. (Those of you who are technically savvy can skip to the end.)
Most modern server code is multi-threaded, which means that it does more than one computation at once. This is important because computers do more than just compute. They also store and retrieve information from hard disks, which are much, much slower than the computers. Every time the computer has to access the disk things come to a screeching halt. To give you some idea, most modern computers run at clock speeds measured in gigahertz, or billions of cycles per second. The fastest hard disks have seek times (that is, the time it takes the drive to move the read/write head into the proper position) of several milliseconds. So a computer can perform tens of millions of computations in the time it takes a hard disk just to get into position to read or write data.
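To put numbers on that comparison, assume a 3 GHz processor and a seek of 5 to 10 milliseconds. These are illustrative figures, not measurements from the ad servers. Integer arithmetic keeps the result exact.

```python
def cycles_during_seek(clock_hz, seek_ms):
    """Clock cycles that elapse while a disk head spends seek_ms milliseconds moving."""
    return clock_hz * seek_ms // 1000


if __name__ == "__main__":
    for seek_ms in (5, 10):
        print(seek_ms, "ms:", cycles_during_seek(3_000_000_000, seek_ms), "cycles")
```

That works out to 15 million cycles for a 5 ms seek and 30 million for a 10 ms seek. A cycle isn’t exactly one computation, so treat this as an order-of-magnitude estimate.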
In order to keep things from bogging down, when one computation has to access the disk, it suspends itself, and another computation takes over. This way, one computer sort of “pretends” that it is really multiple computers all running at the same time, even though in reality what is happening is that one computer is just time-slicing lots of simultaneous computations.
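The sketch below makes the effect visible. Four threads each simulate a slow disk access with `time.sleep()`, using an arbitrary 0.2 seconds as the stand-in for the access. A thread that is blocked waiting gives up the processor, so the waits overlap. As a result, the whole batch finishes in roughly the time of one access rather than four.

```python
import threading
import time


def fake_disk_read(results, index, delay_s=0.2):
    """Stand-in for a blocking disk access: sleep, then record a result."""
    time.sleep(delay_s)  # the thread blocks here and other threads get to run
    results[index] = f"block-{index}"


def read_all_concurrently(n=4, delay_s=0.2):
    """Start n simulated reads at once; return their results and the elapsed time."""
    results = [None] * n
    threads = [threading.Thread(target=fake_disk_read, args=(results, i, delay_s))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = read_all_concurrently()
    print(results, f"{elapsed:.2f}s")
```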
The ad server, the machine that actually served up ads in response to search terms, ran multi-threaded code written in C++, which is more or less the industry standard nowadays for high-performance applications. C++ is byzantine, one of the most complex programming languages ever invented. I’ve been studying C++ off and on for ten years and I’m still far from being an expert. Its designers didn’t really set out to make it that complicated, it just sort of accreted more and more cruft over the years until it turned into this hulking behemoth.
C++ has a lot of features, but one feature that it lacks that Lisp and Java have is automatic memory management. Lisp and Java (and most other modern programming languages) use a technique called garbage collection to automatically figure out when a piece of memory is no longer being used and put it back in the pool of available memory. In C++ you have to do this manually.
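As a small illustration of automatic memory management, the sketch below uses Python, a garbage-collected language standing in for Lisp or Java. Python reclaims an object as soon as nothing refers to it any more. A weak reference lets us observe the object without keeping it alive. The `AdCounts` class is a hypothetical placeholder.

```python
import gc
import weakref


class AdCounts:
    """Hypothetical container for per-ad served counts."""

    def __init__(self, counts):
        self.counts = counts


def reclaimed_after_last_reference_dropped():
    """Return (alive while referenced, reclaimed after the last reference is gone)."""
    counts = AdCounts([10, 20, 30])
    observer = weakref.ref(counts)   # a weak reference does not keep the object alive
    alive_before = observer() is not None
    del counts                       # drop the only strong reference
    gc.collect()                     # not needed in CPython, but makes the point explicit
    return alive_before, observer() is None
```

Notice that no line ever frees the object. The runtime decides when it is safe. In C++, the programmer would have to delete it at exactly the right moment.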
Memory management in multi-threaded applications is one of the biggest challenges C++ programmers face. It’s a nightmare. All kinds of techniques and protocols have been developed to help make the task easier, but none of them work very well. At the very least they all require a certain discipline on the part of the programmer that is very difficult to maintain. And for complex pieces of code that are being worked on by more than one person it is very, very hard to get it right.
What happened, it turned out, was this: the ad server kept a count of all the ads that it served, which it periodically wrote out to the database. (For those of you wondering what database we were using, it was MySQL, which leads to another story, but that will have to wait for another post.) It also had a feature where, if it was shut down for any reason, it would write out the final served ads count before it actually quit. The ad counts were stored in a block of memory that was stack-allocated by one thread. The final ad counts were written out by code running in a different thread. So when the ad server was shut down, the first thread would exit and free up the memory holding the ad counts, which would then be reused by other code running in the ad server, which would write essentially random data there. In the meantime, the thread writing out the final ad counts would still be reading that memory. This is why it’s called a race condition, because the two threads were racing each other, with the ad-count-writer trying to finish before the main thread freed up the memory it was using to get those counts. And because the ad-count-writer was writing those counts to a database, which is to say, to disk, it always lost the race.
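Python won’t let one thread free memory that another thread is still reading, so the sketch below is only an analogy. A shared list stands in for the counts buffer, and overwriting that list stands in for the memory being freed and reused. An `Event` forces the slow writer to lose the race every time, just as the real one did. One fix is to hand the writer its own copy of the counts. In C++ the equivalent is making sure the buffer outlives the writer thread. All names and numbers here are hypothetical.

```python
import threading


def simulate_shutdown(copy_before_handoff):
    """Return what the final-count writer records during a simulated shutdown."""
    memory = [10, 20, 30]          # hypothetical per-ad served counts in a shared buffer
    written = []
    freed = threading.Event()

    # The fix: give the writer a private snapshot instead of the shared buffer.
    counts = list(memory) if copy_before_handoff else memory

    def final_count_writer():
        freed.wait()               # the slow disk write means it reads only after reuse
        written.extend(counts)

    writer = threading.Thread(target=final_count_writer)
    writer.start()
    # The owning thread exits; its buffer is "freed" and reused for unrelated data.
    memory[:] = [987654, 31337, 4242]
    freed.set()
    writer.join()
    return written
```

Without the copy, the writer records whatever garbage now occupies the buffer, which matches the random numbers that showed up in the billing database. With the copy, it records the real counts.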
Now, here is the supreme irony: remember the meeting with Larry where he wanted to make a change to the billing model that I said would be hard and everyone else in the room thought would be easy? The bug was introduced when the ad server code was changed to accommodate that new billing model. On top of that, this kind of bug is actually impossible to introduce except in a language with manual memory management like C++. In a language with automatic memory management like Java or Lisp, the system automatically notices that the memory is still in use and prevents it from being reused until all threads are actually done with it.
By the time this bug was found and fixed (by Ed) I was a mental wreck, and well on my way to becoming a physical wreck as well. My relationship with my wife was beginning to strain. My manager and I were barely on speaking terms. And I was getting a crick in my neck from the chip I was carrying around on my shoulder from feeling that I had been vindicated in my assessment of the potential difficulties of changing the billing model.
So I went to my manager and offered to resign from the ads group. To my utter astonishment, she did not accept.
A lot of projects had stealth t-shirts, and my favorite of those is my Platypus. Unlike most Google shirts, a stealth project t-shirt is unbranded: nowhere does the universally-recognized logo appear, and whatever text or images do appear on the shirt are so intentionally cryptic that only those who are already in the know will recognize it, conveying a sense of secret society among those who wear one.
I think it was early 2005 that I first saw a Platypus shirt. It immediately struck me as weird; I mean, Valley culture embraces weird like a cheap drunk, but there’s “weird for weird’s sake,” and then there’s “weird because I know something you don’t.” And there was clearly way more to this than weirdness for weirdness’ sake. Being the t-shirt junkie and busybody that I am, I knew I had to have one of those shirts. The first one got away, but I’m sure I cornered the second Googler I saw wearing one and demanded enlightenment.
The company was small enough back then that the sense of shared culture was still pretty strong. Whoever it was I grabbed took a quick glance at my badge (“Yeah – you’re one of us”), gave me the high-level bullet (“It’s an online storage project”) and told me I could look Platypus up on Moma.
The idea back then was pretty much what it still is today: a way to store and access your files online. But 2005 was still a couple of years before people started talking about “clouds”, and the idea of keeping all your data “out there” was kind of freaky and radical. But the Platypus team was trying to make it a reality.
The thing was, storing massive amounts of data “in the cloud” and allowing you to access it securely and fault-tolerantly from any machine, anywhere, was a freakin’ hard technical problem, and took a lot longer than anyone expected to get right. There were little slips and leaks and rumors, but at that time Google was the hottest thing on the web, and there were rumors about pretty much anything anyone could imagine it working on. I was doing a lot of campus recruiting around then and made a point of making sure my laptop’s desktop icons were visible on the projection screen before and after my presentation. I seeded it liberally with things like “Google Secret Mountain Fortress Driving Direction,” “Google Space Station Orbital Coordinates,” and “Google Phone Technical Specs” and no one, no one, no one ever called me on it.
Anyhow, back to what Google actually was working on: Platypus. As I said, it was pretty hard to get a system like that right, and when things went wrong, they did so in quirky, hard-to-reproduce ways. So the only way to really nail all the bugs was to get a lot of alpha users pounding on the system. But how are you going to convince alpha users to trust their precious files to your secure, fault-tolerant online file system if it’s not (yet) particularly secure or fault tolerant? Yup: you offer them t-shirts.
Even better, you don’t offer them a t-shirt for just using Platypus – it’s much too easy to sign up, throw a couple of unnecessary files in and forget about it. You want to encourage users to put your system through the wringer, to poke at its dark corners and expose its weaknesses, don’t you? So you only give out t-shirts to users who have used Platypus, encountered a bug and filed a bug report. And in doing so, you make the hottest t-shirt in Engineering even more of a badge of honor, bragging rights to the cognoscenti that yes, you waded in, slapped the system around and brought back a reportable bug for the leader boards to show for it.
I’m sure it’s still there in the eternal memory of the Bugalizer, but I’ve got no recollection of how I earned my shirt. All I recall is that moment of pride when I copied over the screenshot, typed my name into the “reported by” field and hit submit. And I also don’t remember the aftermath, but I’m not too proud to speculate that, once the coveted shirt was in my greedy little hands, I probably didn’t touch Platypus again until it launched two years later.
It’s hard to overstate the adrenaline of the launch. We’ve been on this project for a bit over a year – nothing major by any standards. And what we’re launching isn’t a major product. Technically speaking, it isn’t even a product. Prosaically, we’ve just changed the presentation style of the Labs homepage, a site Google’s been serving for years. We are adding to it links to a couple of other “Labs” products that are getting unveiled today, but they’ve got their own teams and product managers. We’re not getting ourselves stressed on their behalf; we’ve got enough on our plate.
The actual sequence of steps is pretty straightforward – I’ve scrawled them up on the whiteboard in the unoccupied cubes our team is camped out at this morning:
10:30: flip switch to make new apps externally visible
10:30+e: verify new apps externally visible
10:30+2e: flip switch to make new site externally visible at googlelabs.com
10:30+3e: panic and debug
12:30: blog post goes out
12:30+e: flip redirect to make old site (labs.google.com) redirect to new site (googlelabs.com)
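The whiteboard plan reads naturally as an ordered runbook. Each switch flip is followed by a verification, and a failed check stops everything after it. The sketch below encodes that structure. The step names follow the whiteboard, but the action and check functions are hypothetical placeholders. The 10:30+3e “panic and debug” step becomes the failure branch, and the 12:30 blog post is left out because it isn’t a switch.

```python
def run_launch(steps):
    """Run (name, action, verify) steps in order; stop at the first failed verification.

    Returns (completed_step_names, failed_step_name_or_None).
    """
    completed = []
    for name, action, verify in steps:
        action()
        if not verify():
            return completed, name   # "panic and debug": nothing after this runs
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    live = set()  # hypothetical stand-in for what the outside world can see
    steps = [
        ("new apps visible", lambda: live.add("apps"), lambda: "apps" in live),
        ("new site visible at googlelabs.com", lambda: live.add("site"), lambda: "site" in live),
        ("old site redirects", lambda: live.add("redirect"), lambda: "redirect" in live),
    ]
    print(run_launch(steps))
```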
Monday morning, 9:00. Arthur and I are halfway up to the city, carpooling through the tail end of rush hour. We’re calculating transit times – 15 more minutes to the San Francisco office, with 10 minutes to park, puts us into action at 9:25. We’ll have 90 minutes to spare – plenty of time to put together an impromptu war room, sync with the infrastructure guys who’ll be flipping switches for us, and run the pre-launch tests one last time. Hell, there’s even time to get breakfast and a cup of coffee.
That, of course, is when my phone rings. I fumble for the right buttons to pick it up on the car’s hands-free audio.
“Hi Pablo? It’s Michael. Do you have a moment? It’s Important.”
I hate when It’s Important. Because “It” is never Important Good, like Larry and Sergey wanting to lend us the jet for the weekend, and needing to know where we want the limo pickup. No, “It’s Important” always means It’s Important Bad, like the datacenter we’re hosted out of has just gone down in flames.
Sure enough, it’s the datacenter. Down for the count, and something’s not working right on the automatic failover to a backup. All of a sudden, ninety minutes doesn’t seem like such a lot of time.
Michael is our product manager, and he fills us in. It may not be that bad. He’s already up in SF, coordinating a workaround (have I mentioned how much I love this guy?). The infrastructure guys are manually porting our code to another site and configuring a dedicated pool of machines to host us. They just need to know what special runtime flags, if any, we need for the pool.
Arthur and I parley briefly, trying to remember what special favors we’d asked when we got our machines set up the first time. We come up blank, which is either a good thing (we didn’t ask for any) or a bad thing (we’ve forgotten the arcane animal sacrifices our code requires). In five minutes we’ve crossed from nervous confidence to outright panic, then over into the eerie, liminal world of “Hope for the Best.” And we’re still 20 minutes from the office.
It’s good to have Arthur in the right seat here. He’s a couple of years younger than me and too modest to wear the hard-earned brass rat, but quietly carries a decade more experience than most of the engineers at this company. In previous fire drills, I’ve often imagined his calm get-it-done perspective as having been channelled from the days of Apollo’s mission control. Alright boys, they’re on fire out there halfway to the moon bleeding hydrazine and short on oxygen – whaddaya got for me?
Plan B is on deck, so we review the options for Plan C. By the time we make it to a desk, Michael will have confirmed that the manual port is working. Or not. If not, one of us can join the infrastructure guys in figuring out why the hell it isn’t, while the other can try copying our data over to the internal, test-only version of the console. It can’t go live, but we’ll at least be able to do the demo, and we’ll just have to get RJ to massage the messaging: instead of “We’ve just launched…” he’ll get to announce that “we are about to launch…” Of course, there’s the rest of the PR freight train we’ll have to deal with. The blog, the press release, the “googlegrams”. Everything’s choreographed to trip at 12:30, alerting the press – and the world at large – to go have a look at http://www.googlelabs.com.
A haunting story looms in my mind. Somewhere in the Soviet Union during the peak of the Cold War space race. The launch was aborted at the last minute. The countdown clock was stopped and the ignition sequence shut down as the crew went in to diagnose a problem with the first stage motor. The scientists, engineers, generals and VIPs gathered at the base of the monstrous rocket. But the rocket’s fourth stage motor relied on a timed ignition. It was to fire once the third stage had burnt out, at some precise number of minutes after T-0. Somehow, when the countdown clock was stopped, no one ever told the fourth stage, and when the designated time came it dutifully burst to life, straight into the thousand ton stack of explosive fuel in the stages below it. Some hold that the Soviet space program never recovered from the disaster. The carnage and loss of life were horrific, and though carefully shrouded from the public, the Russian scientists knew, and the effect on morale was devastating.
I decide to spare Arthur my gruesome vision, but to me the moral was that any go/no-go decision had to be an all-in proposition.
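In software, one way to make a go/no-go decision all-in is to register every timed action with a single controller. An abort then cancels all of them, and no stray timer can fire on its own schedule. This is a minimal sketch using `threading.Timer`. The `LaunchController` class and its methods are hypothetical.

```python
import threading


class LaunchController:
    """Registers every timed action so one abort cancels all of them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._timers = []
        self._aborted = False

    def schedule(self, delay_s, action):
        """Start a timer for action unless the launch has already been aborted."""
        with self._lock:
            if self._aborted:
                return False              # no new timers after a no-go
            timer = threading.Timer(delay_s, action)
            self._timers.append(timer)
            timer.start()
            return True

    def abort(self):
        """Cancel every registered timer that has not fired yet."""
        with self._lock:
            self._aborted = True
            for timer in self._timers:
                timer.cancel()
```

The design choice is that no component keeps a private clock. Anything timed goes through the one object that the abort decision reaches.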
Parking turns out to be a non-trivial affair – a couple of times around the block before we find an underground garage of twisty little passages, all alike. But by the time we’re riding the elevator up to the fourth floor (9:25, on the dot), we still haven’t heard from Michael. We take this as a good sign, mostly because there’s not a lot we can do right now if it isn’t.
The thing about Michael – the thing about all the best product managers at Google – is that he really cares. I don’t just mean that he cares about the product; that’s a given. But he cares about everything, and everyone. As a software engineer turned Harvard MBA turned product manager, he’s put in sweat on both sides of the divide. His boyish charm and easy smile are effective tools, but they’re earnest. He really wants to know what the UI folks think is best. He really wants the PR team to have their say. You get the feeling that he really truly likes everyone, and because of that, everyone seems willing to go just one step further to keep him happy.
Arthur and I find the corner where the infrastructure team sits, and corner Pete, their PM, for an update. He offers that the backup servers “seem to be holding”, which Arthur confirms with a few test queries. Our adrenaline settles down a notch as we scout a pair of empty desks around the corner and start to play Marco Polo on the phone with Michael. Arthur, the covert Zen monk, reminds me: time to start breathing again, eh?
The only member of the team still missing is Artem, our Russian. He’s young, he’s fast and he’s fearless. Sometimes he scares me. Started coding professionally at age 13, and as far as I can tell, hasn’t stopped for a breath since. I sometimes imagine him as James Bond’s technological, Russian counterpart. Or maybe MacGyver’s. “We’ve got latency problems? I think it’s in the distributed memcache; lemme build an intermediate hot-caching layer that fronts it on a machine-by-machine basis. Yes, I know it’ll require recoding the storage layer representation. Yes, I know we’re launching on Monday. Hey, we’ve got an entire weekend ahead – we could write half of Vista in that time. Yes, it would probably be the bad half, but – look, just let me do it, okay? I’ll send you the code reviews.”
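Artem’s imagined fix, a per-machine hot cache in front of the shared distributed memcache, is a standard two-tier caching pattern. Here is a minimal sketch. It assumes a small LRU cache kept inside each server process and models the slower shared tier as a plain dict. The class name and capacity are hypothetical, and the sketch ignores the invalidation and consistency problems a real local cache would have to handle.

```python
from collections import OrderedDict


class LocalHotCache:
    """Per-machine LRU cache that fronts a slower shared cache (modeled as a dict)."""

    def __init__(self, backend, capacity=1024):
        self.backend = backend          # hypothetical stand-in for the distributed memcache
        self.capacity = capacity
        self.entries = OrderedDict()
        self.backend_reads = 0

    def get(self, key):
        """Serve from the local cache if possible, otherwise fetch and remember."""
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            return self.entries[key]
        self.backend_reads += 1
        value = self.backend.get(key)
        if value is not None:
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return value
```

Repeated reads of a hot key then stay on the local machine, and only misses travel to the shared tier.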
And somehow, it works. Time and again, I’ve seen his code go into security review and come out with no comment other than the proverbial gold star: LGTM (“looks good to me.”). If anything, the “additional comments” section will say something like “Very nicely done!”
So, Artem’s won everyone’s trust. The downside of this trust is that, when a grenade comes tumbling into the office, we’ve gotten into the habit of looking to him to throw himself on it (and defuse it, and turn it into an undocumented feature, with plenty of time to spare). So without Artem on hand, we’re missing our safety net – we’re test pilots without a parachute.
Thirty minutes to go, and Artem finds us. He cruises in like it’s Friday morning, and his calm is almost unnerving – “Anyone know where the espresso machine is in this office? All I can find is the pre-made stuff.”
Michael’s briefed him by phone already. He picks a corner near the whiteboard, settles in, and flips open his laptop as if to catch up on email. I look over to Arthur, who reminds me (again) to keep breathing. Artem looks up and reads my mind: “Look, the code is up on the backup datacenter, we’ve run the tests, and everything looks good at this point. There’s nothing else we can do until it’s time to flip the switch. You guys eaten breakfast yet?”
I find myself wondering if I actually like to panic, and stifle the thought as soon as it surfaces. Artem is right. I breathe – a couple of times for good measure – then return to my temporary seat and try to focus on my backlog of email. None of it really needs to be answered today, but slogging through some of the busywork and meeting requests is as good as anything for making the time pass, and is certainly more productive than hyperventilating.
Inexplicably, the next time I look at my watch, it’s 10:27. Holy crap – show time. Give or take an hour. Honestly, I need to keep reminding myself that the only real deadline is two hours away, at 12:30, when RJ starts showing things off live to the assembled press. But Radhika and Andy, tech leads for the new apps we’re launching, give us the thumbs up, so there’s no reason not to launch. We poke our heads around the corner to where Pete and the App Engine team are still dealing with the smoking aftermath of the datacenter crash. Pete rolls over to his monitor, flips a few virtual switches in DNS-land from his keyboard, and we run back to our laptops to check. It’s all good so far.
10:30 – or something like it. One last round of checks with Michael, Arthur and Artem before flagging Pete again, who – almost anticlimactically – fires off another keyboard command for us. “There – you guys should be live now.”
Internal access? Check. External? Nothing. 404 – what the hell? Clear the DNS cache and refresh. 404 again. For real. The site’s just not there. Somewhere in my brainstem, that reptilian ball of Rambo neurons controlling our basic fight-or-flight reflexes kicks the door down and says “Oh yeah – that’s what I’m talkin’ about!”
I’m back at Pete’s office so fast I swear I can still see my shadow back at the cubes. “You’re sure you flipped the switch to take us live?” He looks back at me as though I’d asked him whether he was sure he was Pete. “Yeah – absolutely.” And I believe him, because he does this stuff for a living, and we’re just one more app out of thousands, absolutely routine stuff.
I relay confirmation back to the cubes, where Arthur and Michael are conferring. Even Artem looks worried now, which scares me more than anything else.
So what next? Packet traces? We’ve got no idea how the hell to do something like that without help from the App Engine team, and they’ve got their hands full with more important stuff at the moment. I stifle the urge to ask Pete whether he’s sure that he’s sure, and remind myself, yet again, to do some of that “breathing” stuff. Arthur’s right (as he – maddeningly – always is): it helps me focus again.
We’re all at our screens now. Artem’s digging into the packets, Arthur’s looking at the logs, and I’ve got my head in the code. I don’t expect to be of much use, but scrolling through the sequence of lines fired when a request arrives gives me something to do. It’s sort of a rosary, I guess, for coders: receive request, parse headers, dispatch database query… Artem calls out from across the cube: “Requests are making it to the server – why the hell are we getting 404s?” He’s not asking anyone in particular – or maybe he’s asking himself. Arthur’s unflappable, as always, his voice as flat as Spock’s, acknowledging the impossible: “Confirmed – we’re logging the requests.”
I circle back through the request processing code to where the logging statements are. “What’s the log message?”
“Just that the GET was received.”
“Anything else after that?”
I loop back through the code, but it’s now scrolling past me, Matrix-like, in a cascade of indecipherable symbols. I have a vague sense that I’m panicking, and breathing doesn’t help.
Arthur looks up, joins me on the code and takes less than a minute to find it: we – no, I – had left a failsafe in place. Just in case the server were accidentally exposed to the outside, it checked where each request was coming from. If it looked like an external IP address, the server pretended it wasn’t there, faking an “address not found” response.
I flip the failsafe flag, reboot, and hit the server again. The shiny new “Google Labs” home page fills my screen. “We have liftoff!” – I kind of shriek it.
“It’s live?” – Arthur’s voice is measured.
Artem’s not waiting for confirmation: “I’ve got it here, too. Woo hoooo!” I think he starts doing some kind of dance, but I’m not paying attention anymore. I click through a couple of the links, confirming that everything’s live.
Something else kicks into our collective bloodstreams – endorphins? – blending with the adrenaline, transforming the panic of 15 seconds ago into a Hell-yeah-bring-it-on machismo. We’re pumped now, ready to face the world: we have launched.
Mind you, at this point, there are exactly three people in the world who know that Google Labs has launched. Five, if you count Pete and Radhika, but they’ve got other things to worry about. Until someone stumbles across the improbable URL http://www.googlelabs.com, we’re invisible.
The press conference is still ahead, with a dozen-plus reporters from TechCrunch, the New York Times, Wired and the like breathlessly typing away, “I’m here live-blogging the much anticipated Google Labs launch…” while RJ eloquently explains how Google understands that “innovation is a conversation, not a one-way street.” And when it does come, Arthur, Artem, Michael and I are at the back of the conference room, as invisible and as much a part of the furniture as the stackable flexichairs we’re sitting in.
Artem’s got a window up on Google News, hitting “refresh” as the updates come fast and furious. RJ is in the groove – he’s got the press eating from his hand, and we know – we just know – that tomorrow’s headlines are going to be glorious (“Google News Timeline: A Glorious, Intriguing Time Sink”). I do find myself wishing that RJ would, at some point, direct his audience’s attention back to where we’re sitting, wishing that he’d offer a passing word of thanks to the engineers who worked, mostly in their spare time, for over a year to conceive of and bring to fruition this new way to let engineers launch products and bring users into the process.
I find myself wishing there were a way for him to tell the story of how improbably it all came together, starting with a Friday night email appeal blitzed out to eng-misc almost two years ago, asking for help and promising a “Google Labs” t-shirt for anyone who helped. Which reminds me: I’ve got to get some t-shirts printed up…