May 18, 2008

Lessons from running the Live Mesh services, 4 weeks in

It's been almost 4 weeks since we went live, and so far things have gone pretty well -- we haven't had any system meltdowns, we haven't lost anybody's data, and the general reaction to our feature set has been quite positive.

It's also been really interesting watching the system "breathe", so to speak -- looking at various parameters of system load and their evolution over time, identifying recurring patterns, figuring out the events that led to irregular patterns in the performance data etc. In particular, I now have personal experience with a number of things that I only knew from reading about them. None of this is profound in any way, but here are some examples:

1. There's a clear day-vs-night and weekday-vs-weekend usage pattern: the graph below shows a week's worth of data for a statistic that basically measures how many clients are connected to our cloud services. The troughs in the data occur at night, the peak is around noon PDT, and the two low peaks in a row at the beginning represent weekend days.

[Image: Socpattern]

2. Beware of synchronized clients: one of the problems that cloud services have to deal with is the "flash crowd" effect. For example, if an IM service crashes and disconnects all the millions of clients currently connected to it, you don't want all the clients to try reconnecting at the same time, or the incoming load can bring your servers to their knees [if you haven't built in mechanisms for dealing with overload gracefully]. Instead, you want clients to spread out their reconnects over a period of time, to smooth out the load.

We have a similar problem to an IM service in that if a certain subset of our in-memory state management services crashes, or is brought down, we'll cause client disconnects and reconnects. Well, of course we had to bounce exactly that set of services in the first couple of weeks, and so we ended up disconnecting our clients. They all tried to reconnect at the same time, and then they stayed in sync -- the row of spikes below shows a system load metric that illustrates the regular cadence that comes from the synchronization.

[Image: Tracemsgs_2]

In the meantime, we've added the necessary code to avoid that problem in the future :-)
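For illustration only, here is a minimal sketch of one common way to spread out reconnects: exponential backoff with random jitter. This is not the actual Live Mesh code; the function name reconnect_delay and the base and cap values are assumptions made up for the example.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Return how many seconds to wait before reconnect attempt `attempt` (0-based).

    `base` and `cap` are illustrative values, not tuned for any real service.
    """
    # Exponential ceiling: 1s, 2s, 4s, ... capped so waits never grow without bound.
    # The exponent is clamped to keep the arithmetic small for very large attempts.
    ceiling = min(cap, base * (2 ** min(attempt, 32)))
    # Jitter: pick a uniformly random delay up to the ceiling, so that clients
    # that were disconnected at the same moment do not all come back together.
    return rng.uniform(0.0, ceiling)
```

The randomness is the important part: backoff alone still leaves every client retrying on the same schedule, which is exactly the synchronized cadence described above.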

3. In a large-enough system, something is always broken: we have a monitoring tool that periodically looks at all our machines and tries to figure out whether they're healthy, by looking for service crashes, strange load characteristics, the wrong version of software running etc. Looking at this data over the last few weeks, I've realized that even in a system with "only" a couple of hundred servers, there's pretty much always something that's not quite right -- there are machines with failing hard disks, some machines appear to be handling a disproportionate share of the load, some are running the wrong bits, on others our service health checks are failing etc. The second realization was that the system state is very dynamic -- service health checks will start passing again, the load gets rebalanced etc, and so what you really want to do is watch for persistent errors, and not just jump on everything that seems wrong at a particular point in time. And, perhaps more importantly, our system is robust enough to deal with things not being quite right.
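A rough sketch of the "watch for persistent errors" idea, assuming a monitor that reports one pass/fail result per machine per check: only flag a machine once it has failed several checks in a row. The class name, the threshold of 3, and the machine names are all hypothetical; this is not our actual monitoring tool.

```python
from collections import defaultdict


class PersistentFailureTracker:
    """Flag a machine only after `threshold` consecutive failed health checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold          # illustrative default, not a tuned value
        self._streaks = defaultdict(int)    # machine name -> current failure streak

    def record(self, machine, healthy):
        """Record one check result; return True if the machine should be flagged."""
        if healthy:
            # A passing check resets the streak: the problem was transient.
            self._streaks[machine] = 0
            return False
        self._streaks[machine] += 1
        return self._streaks[machine] >= self.threshold


tracker = PersistentFailureTracker(threshold=3)
for result in [False, False, True, False, False, False]:
    flagged = tracker.record("machine-17", result)
# Only the final check flags the machine; the earlier two-failure blip is ignored.
```

A consecutive-failure count is the simplest possible filter; a real system would probably also want time windows and per-check thresholds.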

4. Invest early in "boring management infrastructure": we generate tons of logs; currently, we're producing on the order of 100GB a day, and that'll only increase as we increase the number of users of our system. These logs are really our only source of debugging data when users complain about cloud interactions going wrong, so it's clearly very important to have the necessary infrastructure to collect and process all this data. Thankfully, we actually built some of the necessary tools before releasing our bits, and so now have a tolerable way of filtering through this flood of data. That said, some of the tools we have are already straining to keep up, so this is something we'll have to keep working on.

Technorati tag: LiveMesh.

May 11, 2008

This just in: "omg, work is totally hard and stuff"

If you need "time off" at 23, you have a long road ahead of you.

April 30, 2008

Official pontificating on Live Mesh

I have a post up on the datacenter architecture of the services behind Live Mesh, over at our official blog.

Technorati tag: LiveMesh

April 28, 2008

What is the sound of one Mesh clapping ?

While reading some of the coverage about Live Mesh, I came across this:

"Can Mesh support Twitter streams orchestrated by identity mapping via affinities and abstracted to devices across OS, mobile, and corporate divides via Silverlight?"

Try as I might, I really have no idea what that question means. I think it might be a koan. Now I just have to wait for the moment of enlightenment generated by pondering it.

April 22, 2008

We're live !

Live Mesh, the product I've been working on for the last 2 years, is live ! I'm sitting in a conference room at work switching between watching the performance of our systems, reading the news and blog posts that have sprung into existence in the last 20 minutes, i.e. since it was officially unveiled at 9pm PDT, and watching Mesh-related tweets [via Tweetscan]. It's pretty damn cool.

Some links for your convenience:

- The official blog post
- TechCrunch coverage
- CNet coverage
- Some more official coverage, including overview videos etc

... and so on.

Update: my favorite phrase in the coverage so far comes from the NYT article: "Live Mesh’s logo is a Tolkienesque graphical ring...". Yeeesss, Live Mesh is our preciousss ...

April 14, 2008

"I will shatter your feeble sports records with my invincible Iron Leg Technique !"

Almost exactly 3 years ago, I wrote about a family setting records in competitive taekwondo, namely the Lopez siblings from Texas. Well, they continue to set records: three of them are on the taekwondo team for the 2008 Olympics, and they'll be coached by their brother. So all 4 siblings are going to the Olympics.

Oh, and Steven Lopez *still* hasn't lost a match.

I mean, damn.

April 10, 2008

Drug dealers abhor a vacuum

We had a couple of brief, shining, glorious days of hope when we heard last weekend that the local version of Nino Brown had finally been arrested and evicted. The sun came out, birds were singing and no shady characters were parking next to our house for 10 minutes, dumping a year's worth of fast-food and malt liquor containers on the street, and then driving away with their drug of choice.

Alas, our joy was short-lived. Somebody else has taken up the niche in the ecosystem that was occupied by the old dealer, except that the new one apparently operates in a more mobile fashion: instead of dealing out of a house, he walks around the neighborhood all day, handing out "candy". In retrospective marketing-speak, I suppose it was naive to consider that an area with such high brand-name recognition and concentration of consumers would not immediately draw a willing supplier. 

I suppose the upside is that we should still see a net reduction in undesirables on our street because now they have to find a moving target.

[And the nerd in me wonders whether, if you assume that people on average drive 20-30mph, the average block in our area is about 50 yards etc, there's some optimum speed and route that he can walk in order to minimize the amount of time his customers need to find him.]

March 28, 2008

My plan for becoming an Internet billionaire

I have come up with a Web 2.0 company that I believe will catapult me into the ranks of fabled technology entrepreneurs like Bill Gates, Steve Jobs, Brin and Page etc.

Now, the first thing you have to figure out when starting one of these companies is what you're going to call it. Ideally, it'll be a hip, edgy, short name, like a word that's been written by somebody used to automatic spellchecking who has suddenly been deprived of that crutch. Flickr, Tumblr, Crp, that sort of thing. Alternatively, you can combine two [or more] words that contain fragments like "ix", "ytics" etc, and suggest, in some vague but oddly convincing way, that your company does Really Complicated Stuff That Requires Advanced Technology (TM).

In that spirit, I present to you: Bananalytics !

Bananalytics will cater to an audience that wants, nay, needs, deep insight and access to the bananasphere: the global, constantly-evolving and growing network of banana-related information. That's a huge, heretofore-underserved audience, and includes key demographics like ... monkeys. And baby-food makers.

Bananalytics will provide multiple lenses on the bananasphere, via a specialized search engine, a real-time high-density data feed, and a user-generated list of the most relevant banana-related content.

We will generate a rich, searchable index of all banana-related content on the Internet by sending out web monkeys [a proprietary version of the more conventional web crawler], that will swing from hyperlink to hyperlink, returning with bunches of delicious banana-focused pages to index.

Bananalytics will also work with major suppliers to embed sensors in each banana that will detect when a banana has been eaten. This will allow us to generate a real-time data feed that will be piped to "lifestreaming" services like Twitter, Jaiku and the Facebook newsfeed: "Alex has just eaten a banana."

No Web 2.0 company is complete without user-generated content. Our website will contain a constantly-updated stream of user-generated banana-related news. To make it easy for users to submit stories, we will supply a web gadget that content providers can embed on their banana-related news stories, blog posts etc. Readers will be able to click on this "Bananalyze this !" widget and thereby cause the tagged content to be submitted to our proprietary Bananalgorithm ranking engine, which will choose the stories displayed on our homepage.

Since web users are now accustomed to getting everything for free, Bananalytics' main source of revenue is expected to be ads. We expect that advertisers like Dole, Chiquita, and the makers of non-slip flooring ["Never slip on a banana peel again !"], will realize the unique access to banana-minded people provided by our service, and be willing to shell out oodles of cash.

The domain bananalytics.com is unfortunately already taken, but I'm sure I can buy it for a few Bananalytics stock options. After all, who could resist the chance to get in on the ground floor of a rocket ship that's about to take off ?

To summarize it in the words of Gwen Stefani:

Let me hear you say, this sh!t is bananas
B-A-N-A-N-A-S
this sh!t is bananas
B-A-N-A-N-A-S
Again, this sh!t is bananas
B-A-N-A-N-A-S

... and so on.

Kleiner Perkins, Sequoia: you know you want in. Term sheet offers may be submitted to Bananas.About.Bananas@bananalytics.com. Special consideration will be given to offers submitted on, with, or by, a banana.

An inconvenient form of payment

There's a news story making the rounds in the local media about a local congressman's trip to Iraq supposedly being financed by Saddam Hussein. Ignoring all the non-essential fluff and diving straight to the heart of the story, here's the bit that I find the most intriguing:

"The indictment says Al-Hanooti received 2 million barrels of Iraq's oil as payment for his services"

Now, how exactly would one turn that sort of payment into cash ? It's not like you can just call up Exxon and say "Hey, I have 2 million barrels of oil, you interested ?" and not get asked a lot of potentially awkward questions [at least I assume you can't ...]. And I suspect that an offer to sell 2 million barrels of oil on eBay, that fine disintermediating marketplace, would also lead to some scrutiny and headscratching.

Maybe this is the equivalent of being paid in Ningis, from the Hitchhiker's Guide to the Galaxy:

"The Ningi [is] a triangular rubber coin six thousand eight hundred miles along each side. It is valued at the rate of eight Ningis to one Triganic Pu, but thanks to the Ningi's immense size (almost twice as wide as the Earth's equatorial radius), it is more-or-less impossible to collect enough to own one pu."

I hope somebody does a follow-up story on how/whether Al-Hanooti spent his payment.

February 29, 2008

Sequencing gone wild

Apparently Google is investing money in an effort to sequence [sort of] 100,000 genomes. Some excerpts from the article:

Church has already partially sequenced genomes from 10 people, and the jump to 100,000 is under review by a Harvard ethics panel

Right, it's only a scale-up of 4 orders of magnitude, that should be pretty easy ;-) To put this in perspective: to the best of my knowledge, there are currently only 4 complete human genome sequences in existence [Venter's, Watson's, the original sequence from the Human Genome Project, and the original sequence from Celera].

The Harvard scientist is controlling costs by sequencing only protein-making genes, which make up about 1 percent of the genome

This, to me, seems a bit weird. It's becoming pretty clear that there's a huge amount of information embedded in parts of the genome that don't code for proteins [i.e. the other 99% of the genome], and that individual variations in protein-making genes aren't even close to being the whole story when it comes to determining the differences between people. In other words, it's not clear that sequencing only protein-coding genes will really tell you all that much. Then again, it's probably a reasonable place to start, given the current limitations of sequencing technology.

Ross Muken, a Deutsche Bank Securities Inc. analyst in San Francisco, said Google is ideally suited to help consumers keep track of genetic data, as new sequencing technology becomes available.

"They want to have an ability to display to the individual their genetic information in a user-friendly interface," he said in a telephone interview. "Who better to do that than Google?"

Uhm, right, because Google is so good at user interfaces. Displaying genetic data, with the multiple possible levels of detail [individual base pairs, short functional elements like promoters, long functional elements like protein-coding sequences, linked functional elements like the exons that make up a gene, chromosome arms etc] and multiple ways of annotating it, is a much tougher problem than zooming in and out of a street map. [For an example of a genome map, go look at Jim Watson's genome].

All that said, George Church is a pretty smart guy, so I'm sure this is a much better thought-out project than the Bloomberg article makes it seem.