2010-02-26

The death of Memcached is greatly exaggerated



There are many reactions going around to "MySQL and Memcached: End of an Era?", especially to its statement that "it's clear the MySQL+memcached era is passing".

I think that really depends on what you mean by "era" and "passing".

The era of memcached being THE cutting-edge technique for getting speed at scale may be "ending", but not because memcached is failing; it's because additional (not replacement, additional) techniques are now emerging.

When I was doing MySQL Professional Services, back when I started, most of my gigs had just barely heard of "sharding MySQL plus memcached".  By the time I left that world, everyone had heard of it, many were doing it, and some very few were doing it well. Since then, we've all reached the point where nearly everyone who NEEDS it is doing it. And still the load and pressure rise on running systems, and so we are looking for more tools.  And more tools are being developed.  That need, that development, that excitement, is what "NoSQL" is all about.

But that won't be the end of memcached.  The technique of the high-performance key-value store is just too useful a building block, both on its own and as a sub-component of other technology components, to just throw out.

I'm sure that memcached will continue to evolve.  There will be more implementations, there will be limitations removed, there will be more management tools, there will be other systems that add the memcached network protocol, there will be ORMs and other frameworks that build in the assumption that memcached is available, and there will be features added to the protocol and implementations for shared hosting and cloud environments.

And even ignoring all that, there are still a myriad of internet and intranet systems running that really need what memcached can give them, and bringing it to them all is going to be a long and interesting task.

Memcached is going to stick around.

(This was originally posted at the Gear6 corporate blog.  Please comment there.)

2010-02-23

"How do I add more memcached capacity without an outage?"


Once someone starts using memcached, they tend to quickly find themselves in the state of: "my database servers overload and my site goes down if memcached stops working". This isn't really surprising; quite often memcached was thrown into the stack because the database servers were melting under the load of the growing site.
But then they face an issue that is, as mathematicians and programmers like to call it, "interesting".
"How do I add more capacity without an outage?"
At first most people just live with having that outage. Most systems have regularly scheduled downtimes, and during that window the memcached clusters can be shut down, more storage nodes added, and then it is all restarted, with the distributed hash recomputed for the new number of nodes.
Ironically, the more successful the site is, the more it grows, the more costly that outage becomes. And not linearly with that growth, either. The increasing cost is more on the order of the square of the growth, until they literally cannot afford it at all. As the cache gets bigger, it takes longer to rewarm, from minutes to hours to days. And as your userbase grows, there are more people to suffer the poor experience of the cache warming up. The product of those two values is the "cost" of the outage. This is bad for user satisfaction, and thus bad for retention and conversion, and thus revenue.
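
To make the cost concrete, here is a back-of-the-envelope sketch (key names and cluster sizes are made up) of why the cache goes cold: with naive "hash(key) % N" placement, adding one node to an eight-node cluster moves almost every key to a different node, so almost every request misses until the cache rewarms.

    # Rough sketch: fraction of keys that land on a different node when a
    # cluster grows from 8 to 9 nodes under naive "hash(key) % N" placement.
    import hashlib

    def node_for(key, n_nodes):
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return digest % n_nodes

    keys = ["user:%d" % i for i in range(100000)]
    moved = sum(1 for k in keys if node_for(k, 8) != node_for(k, 9))
    print("%.1f%% of keys moved" % (100.0 * moved / len(keys)))  # roughly 8 out of every 9
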
This can be especially frustrating in a cloud environment. In a physical datacenter, because you have to actually buy, configure, and install the hardware for a node, it somehow feels easier to justify needing an outage to add it to the cluster. But in a cloud, you can start a new node with the click of a button, without getting a purchase approval and filing a change plan. And in cloud environments, we all have been evangelizing "dynamic growth", and "loosely coupled components", and "design for 100% uptime in the face of change". And yet here is this very basic component, the DHT KVS cluster, that doesn't want to easily work that way.
There are ways to resize a DHT cluster while it is live, but doing so is an intricate and brittle operation that requires the cooperation of all the clients, and there are no really good open source tools that make it easier. You have to develop your own bespoke operational processes and tools to do it, and they are likely to miss various surprising edge cases that get learned only by painful experience and very careful analysis. Which means that the first couple of times you try to resize your memcached cluster without a scheduled outage, you will probably have an unscheduled outage instead. Ouch.
One commonly proposed variety of solution is to make the memcached cluster nodes themselves more aware of each other and of the distributed hash table. Then you can add a new node (or remove a failed one), and the other nodes will tell each other about the change, and they all work together to recompute the new DHT, flow items back and forth to each other, put items into the new node, and try to keep this all more or less transparent to the memcached clients with some handwaving magic of proxying for each other.
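
The usual building block for this kind of rebalancing is consistent hashing, which smart client libraries already use. Here is a minimal sketch (no virtual nodes, node names made up) showing that adding a ninth node only remaps the keys that fall on the new node's arc of the ring, roughly a ninth of them, rather than nearly all of them.

    # Minimal consistent-hash ring, for illustration only (real implementations
    # add virtual nodes to even out the load).
    import bisect
    import hashlib

    def h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            self.points = sorted((h(n), n) for n in nodes)
            self.hashes = [p for p, _ in self.points]

        def node_for(self, key):
            i = bisect.bisect(self.hashes, h(key)) % len(self.points)
            return self.points[i][1]

    before = Ring(["cache%d" % i for i in range(8)])
    after = Ring(["cache%d" % i for i in range(9)])   # one node added
    keys = ["user:%d" % i for i in range(100000)]
    moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
    print("%.1f%% of keys moved" % (100.0 * moved / len(keys)))  # only the new node's arc
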
And that, more or less, is what Gear6 has just done, under the name "Dynamic Services". We have released it first in our Cloud Cache distribution, initially on Amazon AWS EC2, and then on other cloud infrastructure systems. Next it will be in our software and appliance distributions.
This is especially useful and neat in a cloud environment, because the very act of requisitioning and starting a new node is something that the underlying infrastructure can provide. So you can go to the Gear6 Cloud Cache Web UI and ask it to expand the memcached cluster. The management system will interface to the EC2 API, spin up more Gear6 memcached AMIs, and once they are running, add them to the cluster and then rehash the DHT. All while the cluster is serving live data.


(This entry was originally posted at my Gear6 Corporate Blog. Please comment there.)

2010-02-19

Things I wish Buzz and other social networking systems did when someone starts following me

Google Buzz links up with Google Reader, more or less.  I use Google Reader twice a day. The main effect that Buzz has on my experience is that each morning there are a handful of people in the "people have started sharing with you" item.  I open it, do not recognize any of them, and close the tab without following them back.

Just displaying a name or handle and maybe a cute picon is not sufficient.  Some people have this wonderful memory for names and faces, but some of us are not so gifted.

When J Random Person starts following me, Buzz & Reader should display to me some more context about that person.  For example, that person could have 140 characters of self description, plus 140 characters of "why I want to follow you".  With no URLs or any "rich content" allowed.  The reason to keep it short and simple is to reduce the exposure surface for spamming.

Most instant message systems have the provision of sending a "Why you should add me to your roster" when making a connection request.  Facebook does this as well.

And something that would be amazingly useful: if I am following anyone who is following that person, show that to me.  If someone I am interested enough in to follow already thinks that person is interesting enough to follow, I am more likely to decide the same.
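
In data terms this is nothing more than a set intersection over the follow graph, something the service already has. A toy sketch, with made-up handles:

    # Toy sketch: the context I want when a stranger follows me is just which of
    # the people I already follow also follow them.  Handles are made up.
    i_follow = {"alice", "bob", "carol"}
    followers_of = {"stranger42": {"bob", "carol", "dave"}}

    def shared_context(me_follows, new_follower):
        return me_follows & followers_of.get(new_follower, set())

    print(shared_context(i_follow, "stranger42"))  # {'bob', 'carol'}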

I am more likely to approve and/or follow someone back after a connection request if I am shown some additional context, especially how they are already linked into my extended social graph.  I don't want to have to take the time to run various graph search tools to figure it out.

2010-02-18

Idea: parallax, 3D streaming video compression, and webcams

If your computer has two or more webcams pointing at you from opposite corners of the display, then it can do parallax calculations, just like your eyes and brain do, and by doing so construct a decently good depth-mapped 3D model of what it sees.  Most of the time what it would see would be your face in the near area, and whatever wall was behind you in the far area.

For sending streaming video for video conferencing, the simple and dumb thing to do would be to send a complete video stream from each camera.  This is dumb and wasteful.  The way that video compression works is that it looks for similarities between frames over time, and then sends a stream of transformations and differences.  An obvious technique for sending parallax video streams is for the video compressor to look at the multiple simultaneous frames, as well as over time, and send transformations and differences over space as well as over time.  I do not know if any of the MPEG or other video stream compression standards can do this, but it will soon be necessary, especially now that the video entertainment industry is seriously talking again about 3D.

I'm sure there are a whole slew of obvious patents all claiming assorted variations of this simple idea.

A particularly simple and stupid way to do it would be to just feed the alternating frames, left right left right, into the video compressor, and pretend that flicking back and forth in space is the same as motion through time.  While I'm sure that some useful level of compression would result, it would be extremely, shall we say, non-optimal.

Anyway, the math for doing the parallax calculation and the math for doing the video compression transformations and differences would be very similar.  A smart implementation could spit out a rough Z depth map as well as the compressed data stream.  In fact, it would make sense for the Z depth map to be sent as part of the compressed video data stream.
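
As a rough illustration of how little code the parallax half takes today, here is a sketch using OpenCV's stock block matcher on two (assumed already rectified) webcam frames; the file names and parameters are placeholders.

    # Sketch: a rough disparity (inverse depth) map from two rectified frames,
    # using OpenCV's block matcher.  File names and parameters are placeholders.
    import cv2

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right)   # larger disparity = closer to the cameras
    depth_view = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
    cv2.imwrite("depth.png", depth_view.astype("uint8"))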

Doing all this with more than two cameras makes calculating the depth map even easier, plus each camera can have a cheaper, lower resolution sensor, since the multiple views can be mathematically combined into a much higher resolution image.

Another useful trick for the webcam case would be to use the depth map to distinguish between the stuff in the near field (which probably is the face of a talking head) and the random background clutter, and either send the background video data at a much lower resolution, or maybe not send it at all.  This would save bandwidth, increase privacy, and improve the user experience.  After all, when you are on a video chat call with someone, you usually don't care at all what is behind them.  It carries no useful signal for the conversation.
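
Continuing the disparity sketch from above, dropping the far field is a single threshold (the cutoff value here is made up); a real encoder would blur or downsample the masked region rather than blanking it.

    # Sketch: keep only the near field (the talking head), blank the background.
    import cv2
    import numpy as np

    frame = cv2.imread("left.png")                        # the color frame to send
    near = (disparity > 16 * 10).astype(np.uint8)         # StereoBM disparities are scaled by 16
    near = cv2.dilate(near, np.ones((15, 15), np.uint8))  # keep a margin around the head
    frame[near == 0] = 0                                  # or blur, or send at low resolution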

This same trick could be doable even without two cameras and parallax.  Facial recognition software is good enough that it could draw a good enough bounding curve around your face on a frame by frame basis, before sending the raw frames into the compressor.  This might even be a CPU time win, because the compressor doesn't have to spend any time looking at and compressing the background image.  (After talking to a friend, and doing some Google searches, it turns out this sort of thing is already an available feature in off-the-shelf consumer grade webcams.)

Mixing together the idea of the depth map and facial recognition software has many other interesting implications.

Faces all have the same basic depth map.  The webcam video stream could basically say "this is a face.  here are the transformations that turn a generic face into THIS face.  here are the stream of transformations over time that are THIS face changing over time".  Then interleaved with that, is a color image map of your "skinned" face, that the receiving side can then "wrap" over the 3D depth map.  And it can do such tricks as prioritize and do in higher resolution (in both space and time) the face's eyes, lips, and jaw, and maybe even do some work to sync the mouth motions with the audio data.
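
One common way to encode exactly that "generic face plus deltas" idea is a blendshape model: the stream carries only a small weight vector per frame, and the receiver reconstructs the mesh. A toy sketch, with random stand-in data:

    # Toy sketch of "generic face plus transformations": a mean face mesh plus a
    # small per-frame weight vector.  All numbers here are random stand-ins.
    import numpy as np

    mean_face = np.random.rand(1000, 3)          # generic 3D face mesh (1000 vertices)
    blendshapes = np.random.rand(40, 1000, 3)    # 40 learned deformation directions
    weights_per_frame = np.random.rand(300, 40)  # what actually gets streamed

    frame_120 = mean_face + np.tensordot(weights_per_frame[120], blendshapes, axes=1)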

This makes other neat tricks possible as well.  By distorting the face model and the unwrapped color image in various ways, you could make yourself look thinner, fatter, a different gender, a different race, a fictional face (elf, dwarf, Na'vi, etc), or even look like some other specific person.  This would be useful for amusement, entertainment, deception, privacy, and corporate branding.

Imagine calling up a corporate customer service rep on a video call, and no matter who you actually talk to, they all look like the same carefully branded inoffensive person.  Or even, possibly based on your customer record and the call's geolocation data, the CSR's face looks like the race/ethnicity of your local area.

Another useful shortcut would be that once someone's specific face specification is generated, it can be given a unique ID.  In a future video call, just the ID could be sent, and if the receiver has seen it before, the full face definition does not have to be resent.  This would work for more things than faces.  Whenever ANY definable object is seen and characterized, it gets an ID, and is opportunistically cached by the receiver.

All this also can work with the other implications of linking facial recognition with augmented reality and lifestreaming that I've mused about in a previous post.

2010-02-10

Thoughts on XMPP, Facebook, and AIM

Facebook has announced that they have set up an XMPP/Jabber gateway.  This is great news.  I hope it is soon followed up with that server implementing TLS (for security and privacy) and S2S (so that it federates with Google Talk).

In a probably related announcement, Facebook and AOL/AIM have announced some sort of "partnership" "that will integrate a user's Facebook friends into their AOL Instant Messenger".  What I suspect that means is that AIM is simply running an XMPP client in the AIM server software, and is using "Facebook Connect authentication (X-FACEBOOK-PLATFORM)" as described here.

This is not nearly as exciting or useful as the Facebook/XMPP thing.  It works in the wrong direction to be useful to me.  I would still have to run a local AIM client.  I'm already running an XMPP client, and now I can use that to chat with Facebook users.
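
For what it's worth, here is a minimal sketch of that, using the SleekXMPP library; I'm assuming the gateway accepts ordinary password auth and that JIDs look like username@chat.facebook.com, so check Facebook's docs before relying on either.

    # Sketch: chatting with a Facebook contact from a generic XMPP library.
    # Assumes ordinary password auth and username@chat.facebook.com JIDs, which
    # may not match Facebook's actual requirements.
    import sleekxmpp

    class Hello(sleekxmpp.ClientXMPP):
        def __init__(self, jid, password, friend):
            super(Hello, self).__init__(jid, password)
            self.friend = friend
            self.add_event_handler("session_start", self.start)

        def start(self, event):
            self.send_presence()
            self.get_roster()
            self.send_message(mto=self.friend, mbody="Hello from plain XMPP", mtype="chat")
            self.disconnect(wait=True)

    xmpp = Hello("me@chat.facebook.com", "secret", "friend@chat.facebook.com")
    if xmpp.connect():
        xmpp.process(block=True)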

What AOL/AIM should do is what Facebook just did: set up their own XMPP server and gateway.  Then they don't need their "partnerships" with Facebook or with Google Talk.  Just turn on S2S peering, and tom@gmail.com, dick@facebook.com, and harry@aol.com can Jabber away to their hearts' content.  Harry wouldn't even need to stop using his old AIM binary, while Tom is using his Android phone, and Dick is puttering around on the Facebook web page.

2010-02-08

The right way to do location+social


I'm tired of geolocation "apps".  And of geolocation "sites", and especially of geolocation-based startups and 99% of location-based business plans.  There is little to no need for any of it.

Not to say that location isn't useful and transformative, especially when mixed with "social".  But stuff like dodgeball / foursquare / brightkite / latitude are braindead ways of doing it.

Here is how it should work:

Location data goes into your XMPP status, right next to your "away" status, using XMPP XEP-0080.  It can be kept up to date with your preferred IM program and/or with a specialized one running on your smartphone/MID (iPhone, iPad, Android, netbook, laptop, etc).  Location data can also go into Twitter / Status.net / etc updates.
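
The XEP-0080 payload itself is tiny. Here is a sketch of building it with nothing but the standard library (the coordinates are made up); a real client would then publish it over PEP.

    # Sketch: the XEP-0080 <geoloc/> payload.  A real client publishes this via
    # PEP; the coordinates here are made up.
    import xml.etree.ElementTree as ET

    geoloc = ET.Element("geoloc", xmlns="http://jabber.org/protocol/geoloc")
    for tag, value in (("lat", "47.6097"), ("lon", "-122.3331"), ("accuracy", "20")):
        ET.SubElement(geoloc, tag).text = value
    print(ET.tostring(geoloc).decode())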

You would then add a "location social" service, such as Foursquare or FireEagle or Latitude, to your IM roster.  That service can then IM you back when a friend of yours is someplace interesting, or a friend of a friend is near you, or tell an advertiser or business that someone with your marketing profile is at some venue, or whatever.  If you want to do a specific "checkin", that would also be encoded in your XMPP status, or you could send a "checkin" IM message to the location social service.

It scales, it federates, it monetizes, it mashes up, it allows consumer choice, it is easy for both programmers and users, it invites trivial uptake, and it enables many more things to be more easily built on top of it.  It requires little to no software to be installed or used by the end users.


It drives me nuts that the Android IM app doesn't implement XEP-0080, and that it appears that Google Talk mangles it.  Talk about missing an opportunity!

It drives me nuts that all the various Twitter smartphone clients do not implement the Twitter Geo-Location API. Hint: putting a shortened URL pointing into Google Maps is NOT "geolocation".

2010-02-05

I wish AWS would: Use two-legged OAuth

The Amazon Web Services cloud HTTP API does not use HTTP Basic Auth or HTTP Digest Auth.  Instead it uses its own proprietary but documented authentication protocol, which not only securely identifies the account credentials of the requesting user, but also protects and authenticates the HTTP request, various important headers, and the message body from corruption and tampering.

It was good and wise for Amazon to do this, because when they first deployed AWS, there was no simple straightforward open protocol that did this.

But now there is.

It's based on the OAuth protocol, and is called "Two Legged OAuth" or sometimes "Signed Fetch", and there are many open source libraries in many languages in many web client frameworks that implement it.
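
The heart of it is just an HMAC-SHA1 over a normalized description of the request. Here is a rough sketch of the two-legged signing step (the key, secret, and URL are placeholders; a real library handles the encoding corner cases):

    # Rough sketch of OAuth 1.0 "signed fetch" request signing.  Two-legged, so
    # there is no token and the token secret is empty.  Key/secret/URL are
    # placeholders.
    import base64
    import hashlib
    import hmac
    import time
    import uuid
    from urllib.parse import quote

    def sign(method, url, params, consumer_key, consumer_secret):
        oauth = {
            "oauth_consumer_key": consumer_key,
            "oauth_nonce": uuid.uuid4().hex,
            "oauth_signature_method": "HMAC-SHA1",
            "oauth_timestamp": str(int(time.time())),
            "oauth_version": "1.0",
        }
        all_params = dict(params, **oauth)
        enc = lambda s: quote(str(s), safe="~")
        normalized = "&".join("%s=%s" % (enc(k), enc(v))
                              for k, v in sorted(all_params.items()))
        base_string = "&".join([method.upper(), enc(url), enc(normalized)])
        key = (enc(consumer_secret) + "&").encode()   # empty token secret: two-legged
        digest = hmac.new(key, base_string.encode(), hashlib.sha1).digest()
        oauth["oauth_signature"] = base64.b64encode(digest).decode()
        return oauth    # send these as the Authorization header or query params

    print(sign("GET", "http://example.com/api", {"item": "42"}, "my-key", "my-secret"))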

I wish that AWS would deprecate their existing identification/authentication protocol, and allow HTTP clients to use Two Legged OAuth to access the AWS APIs.

Idea: Computers are going to be able to read emotions

Despite what old SF movies claim, computers are going to be able to see and understand human emotional affects just fine.

By doing gait analysis, watching for facial blood flow using IR cameras, using high frame rate video with very advanced facial recognition software that is looking for micro expressions, eye motion, and pupil changes,  and by analyzing vocalizations for stress, it's probable that eventually the machines are going to be better at recognizing a human's emotional state than most humans can.

And stuff like spoken sarcasm and irony will be transparently obvious.

And, of course, that the machine can see this means that it can be serialized, recorded, and transmitted.  It would be just as much a part of a transcription of a conversation as the spoken words are.

By no means will this be a magic "lie detector", but it will probably be a bit better than a trained human's skill at it.

Idea: IRC proxy to microblog protocol

It shouldn't be too hard to write an IRC server that is just a proxy and protocol translator to a real time microblog system, such as Twitter or Status.net or XMPP XEP-0277.

Using an IRC client, you would connect to this special IRC server, log in with your Twitter credentials, and be immediately joined to a special chat room.  Every person who you are following or who is following you would appear to be in that room.  The people you are following would have a +v flag.  If someone you follow makes a public tweet, it appears as a message to that room.  If you IRC private message someone, it is translated into a Twitter direct message.  If someone Twitter direct messages you, it gets translated into an IRC private message to you.

If you ask for IRC info on someone, it will be taken from the Twitter user info.

If you IRC join a room named #foobar, you will join that room, but will not be voiced.  That room will receive the Twitter realtime search results for the word "foobar" and for the hashtag "#foobar".

I'm not quite sure how to translate @-mentions of your username into the IRC-verse.  Probably also some sort of direct message.
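
The IRC-facing half of such a proxy is small. Here is a sketch handling just enough of the IRC protocol to show the mapping, with the microblog side stubbed out (post_status and send_direct_message are placeholders, not a real Twitter or Status.net client):

    # Sketch of the IRC-facing half of the proxy.  post_status() and
    # send_direct_message() are placeholders for real microblog API calls.
    import asyncio

    def post_status(user, text):
        print("%s tweets: %s" % (user, text))

    def send_direct_message(user, target, text):
        print("%s DMs %s: %s" % (user, target, text))

    async def handle(reader, writer):
        nick = None
        while True:
            line = (await reader.readline()).decode(errors="ignore").strip()
            if not line:
                break
            cmd, _, rest = line.partition(" ")
            if cmd == "NICK":
                nick = rest
                writer.write((":proxy 001 %s :Welcome\r\n" % nick).encode())
            elif cmd == "PING":
                writer.write(("PONG %s\r\n" % rest).encode())
            elif cmd == "PRIVMSG":
                target, _, text = rest.partition(" :")
                if target.startswith("#"):
                    post_status(nick, text)                  # public update
                else:
                    send_direct_message(nick, target, text)  # direct message
            await writer.drain()

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 6667)
        async with server:
            await server.serve_forever()

    asyncio.run(main())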


Idea: Processing a Lifestream Video to Recognize and Remember Faces

The previous idea, Processing a Lifestream Video to Extract Signs & Art, gets even more interesting when you consider facial recognition in machine vision.

If the processing software finds anything that looks like a face in any frame, it can correlate it forward and backwards through time, in the succeeding and preceding frames, for as long as that given face is visible.  Now it has a lot of views of the same face closely associated in time, at multiple angles.  From that, a pretty good 3D model of the face can be built, along with a pretty high resolution texture map of it.
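
The detect-and-track half of that is nearly off-the-shelf today. A sketch with OpenCV's stock face detector, chaining overlapping detections across consecutive frames into per-face tracks (the file name and the overlap test are placeholders):

    # Sketch: detect faces frame by frame and chain overlapping detections
    # across consecutive frames into "tracks" of the same face.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture("lifestream.mp4")

    tracks, prev = [], []       # each track is a list of (frame_no, x, y, w, h)
    frame_no = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        cur = []
        for (x, y, w, h) in faces:
            # attach to a track whose last box overlaps this one, else start a new one
            for t in prev:
                _, px, py, _, _ = t[-1]
                if abs(px - x) < w and abs(py - y) < h:
                    t.append((frame_no, x, y, w, h)); cur.append(t); break
            else:
                t = [(frame_no, x, y, w, h)]; tracks.append(t); cur.append(t)
        prev = cur
        frame_no += 1

    print(len(tracks), "face tracks found")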

From that, it would be much easier to recognize a specific face from a database.  Especially if it could cheat, and start with a database of the lifestream owner's address book.  Or a public database of faces of people who live in the area.  And even more interestingly, if it used a database of all the faces picked out of the past processed chunks of video lifestream recorded by that user!

So if you ever meet or encounter or get to know anyone, you could query your lifestream agent, and find out all the times you have ever been in eyeshot of that person before.

It will get especially interesting when this can be done in near realtime, instead of batch processing it each night.

Of course, this kind of processing can be done on the recordings made by public security cameras.  In fact, it will be easier done with that data, because all the data from all the cameras in a city or region can be correlated, and the cameras are at known and fixed locations.  It will probably be done first with those cameras by governments (this is probably already either in trials, or is being pitched as a concept by various security technology contractors to the TSA and its ilk), and then later done by individuals on their own personal lifestreams.

It won't be too many more years before you just live with the fact that every time you go out in public, everyone everywhere will be able to "recognize" you and "remember" you.

This is inevitable.

Idea : Processing a Lifestream Video to extract Signs & Art

I was wandering around an art showing this evening, the monthly Seattle First Thursday, when an idea hit me.

If I were wearing a little lifestreaming camera, and I post-processed the video data, it would be pretty easy to find all the rectangle-ish things it saw.  For each one, it could correlate it between frames, undo the perspective skew that turns rectangles into parallelograms, and composite all the low resolution views together into a single much higher resolution image.  I would end up with a digital gallery of every rectangular sign, poster, painting, photograph, door, wall, and window that was in that video stream.
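
The per-frame half of that, finding the quadrilaterals and unskewing them back into rectangles, is close to an OpenCV tutorial exercise. A sketch (thresholds, output size, and corner ordering are all handwaved):

    # Sketch: find quadrilateral-ish contours in one frame and warp each back
    # into a flat rectangle.  Thresholds and output size are arbitrary, and a
    # real version would sort the corners into a consistent order first.
    import cv2
    import numpy as np

    frame = cv2.imread("frame.png")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    for i, c in enumerate(contours):
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4 and cv2.contourArea(approx) > 5000:
            src = approx.reshape(4, 2).astype(np.float32)
            dst = np.float32([[0, 0], [800, 0], [800, 600], [0, 600]])
            warp = cv2.warpPerspective(frame, cv2.getPerspectiveTransform(src, dst), (800, 600))
            cv2.imwrite("sign_%d.png" % i, warp)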

It would be best if the video was just a sequence of DNGs, or an MJPEG stream, but this would also work for MPEG type video as well.

It gets even better if you get access to lots of people's video lifestreams when they are at the same venue.  It wouldn't be too hard at all to correlate them all together.

Making it work for non-rectangles, and building a gallery of every distinct object and image in the lifestream is just a matter of more CPU and slightly smarter software.

This is going to happen.  It's going to be ubiquitous.  The hardware already exists, and the software I describe is, at best, a thesis for a master's degree in CS.

2010-02-04

Idea: Digital Cameras need a 4th color, UV

If digital cameras also recorded a 4th color, ultraviolet, a couple of useful things would be possible.

If the video stream was going into machine vision, the processing algorithms would be more able to see useful differences and edges and kinds of surfaces.

For images that are meant for showing to humans, taking the UV channel and laying it down over the human visible colors as a sort of greymask would remove sun glare.
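
In pixel terms that grey mask idea is nearly a one-liner. A toy sketch, assuming a hypothetical four-channel RGB+UV capture (the frame and the 0.8 weight are made up):

    # Toy sketch: attenuate visible pixels in proportion to the UV channel, on
    # the assumption that specular sun glare is UV-bright.  The frame is a
    # random stand-in for a hypothetical RGB+UV capture.
    import numpy as np

    rgbu = np.random.rand(480, 640, 4)            # 4th plane is the UV channel
    rgb, uv = rgbu[..., :3], rgbu[..., 3]
    deglared = rgb * (1.0 - 0.8 * uv[..., None])  # darken the UV-bright (glare) areas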

Removing sun glare would also be useful for machine vision that is used as an input for self driving cars.

For artistic images, having access to the UV channel and pushing it into visible would make artistic photographs of plants, especially flowers, much more interesting.  Many flowers have features and patterns that are visible only in UV.

Adding the UV channel into the visible channels in some sort of false color in an augmented reality HUD system would give high tech soldiers the ability to see, and see through, lower tech camo.

Idea: Digital Video with multiple offset flashes

A digital video camera, with 3 or more flashes in a ring around the lens, could do some interesting and useful things.

If you shot a burst of images, one for each flash, each flash going off in turn, and then post-processed and composited the images together, edge detection would be trivial, even for items that are the same color.

If the sensors were greyscale, and you just created a single image that was darker in places where the N images were different, and lighter where they were the same, you would end up with a very detailed and correct line drawing of whatever the picture was of.
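
That composite is just per-pixel disagreement across the N exposures. A toy sketch in numpy (the frames here are random stand-ins for the real flash-lit exposures):

    # Toy sketch: dark where the N flash-lit exposures disagree (shadow edges
    # move with each flash), light where they agree.
    import numpy as np

    frames = np.random.rand(3, 480, 640)         # N greyscale exposures, one per flash
    spread = frames.max(axis=0) - frames.min(axis=0)
    line_drawing = 1.0 - spread / spread.max()   # 1.0 = agreement, 0.0 = strongest edge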

If you shot continuously, as a video stream, while moving the camera around some object, and then post processed the video stream, either offline or realtime, you could generate a very good 3D model of whatever it was looking at, with good texture maps of the surfaces of the object.

For machine vision doing facial recognition, it will be easier both to recognize faces in general and to determine whose specific face is in frame, especially if it's a video stream, which would allow the software to put together a higher resolution composite image out of the lower resolution video frames, and to build a 3D model of the face as the person moves relative to the lens.

This can be built with existing digital camera components today.  It doesn't even need a really high megapixel sensor.

"Innovators" are not the same as "Patent Holders"

In this blog post at TalkStandards, Mr Ganslandt says:


China recently circulated a draft regulation regarding the use of patents in Chinese national standards. The regulation demands that for patents to be eligible for incorporation in standards, they must be made irrevocably available royalty free or for a nominal fee.  ...  The negative impact on innovators could be severe.


And I responded thus:


You confuse “innovators” with “patent holders”. I do not know if that is just a linguistic tic from your field, or an intentional confusion.
Patents are almost never created or used to protect innovation. Companies that actually innovate and make things do so because they have a culture of innovation and execution, and they hold patents only to ante into the game of Mexican Standoff against other holders of patents.
The only people harmed by this proposal are patent trolls, and companies who think that lobbying their patents into various national and international standards (especially doing so surreptitiously) and then riding the gravy train is a better plan than actually spending money on actual research, development, innovation, and execution.
Somehow, I don’t feel that patent trolls and patent surprisers deserve any protection or sympathy, and I hope that proposals like this one actually smash their business “model” as the parasite it is.