2013-02-26

On big technical meetings, or why the end of the UDS is a bad idea

Canonical has just announced that the Ubuntu Developer Summit will no longer be held face to face every six months. Instead it will be entirely online and virtual, using Google Hangouts. (Here is the announcement.)

On the surface, this seems like a good idea: It's cheaper monetarily, it appears to open things up to people who are unable to travel, and it makes it easier to make complete records.

However, I think it's a bad idea, for several interrelated reasons.

Some decision making needs face time to happen. For whatever reason, internet-only communication does not produce a good enough "meeting of the minds" for sticky or subtle engineering and design decisions.

The IETF, which probably has the longest history of any organization of internet-enabled online collaboration, worked out long ago that while day-to-day collaboration can be done over email and text chat, some technical decision making HAS to be done face to face. Thus, the IETF still meets in person, three times a year.

Likewise, at the old MySQL AB, even though the entire company was famously completely distributed, we also figured out that despite being on email and IRC with each other every day, we had to meet every 6 months for face-time decision making. Thus, the whole company met every year, and each team or group met together at least one other time during the year.

And then, as most anyone who actually does a working attendance at a technical conference knows (as opposed to just helicoptering in to give an executive keynote, or being whisked off to a secured conference room for a private upper executive meeting), most of the ACTUAL work at a conference or a technical design summit happens in the hallways, over dinner, in serendipitous meetings, in people introducing people to each other, and in impromptu engineering meetings.

These are the reasons that the OpenStack community meets every 6 months for our own design summit. The keynotes, the vendor booths with their signboards and handouts, and the standard podium-and-rows-of-seats talks are, at best, a sideshow to where the real work gets done, which is the reason for the summit: the circles (not rows) of seats for the design summit meetings, and the hallways, informal dinners, and social mixers, where all the individual meetings and necessary social processing happen.

I started to make a list of all the times I personally was part of such unstructured un-"planned" events at conferences that had significant impact, and the list grew too long, so I cut it from this post.

Email and IRC and etherpad are awesome tools, and I commend Ubuntu, as well as most other large  collaborative open source projects, such as OpenStack, for using them.  Likewise, Google Hangouts seem to be pretty awesome, and I'm glad that Canonical is trying them.

However, they do not replace face to face large group meetings, and cannot solve the problems that such gatherings can.

I wish Canonical and Ubuntu well, but this is a mistake that I hope does not damage them too much.

2013-02-13

Maven's role...

"Maven is a great tool to make sure that your expensive and critical application servers are running an independent copy of every single version of every single 2nd and 3rd party Java library ever written."

2013-01-31

Why I love the Hallway Track, or instigating a junk OpenStack cloud

I just had an experience that reminds me why I find physically going to open source conferences valuable and rewarding.

I am here at the last day of Linux.conf.au 2013 in Canberra.  Earlier today, Tim Berners-Lee delivered his keynote.  Afterwards, we all moved over to the main public hall for afternoon tea.

I happened to overhear a trio of young university students talking about the huge presence of the OpenStack project at this LCA, and expressing some misconceptions about the project.  Two of them had never even heard of OpenStack before seeing it presented here at the conference.

As one may do in the "hallway track" of conferences like this, I jumped in, introduced myself, and gave them a better overview of what OpenStack is and what it tries to do, while handing out business cards.

"You mean with this OpenStack, I can run my own cloud?"

"Yes.  You do have to supply the hardware."

"Well, our department is throwing out heaps of old PCs.  We could gather them up, haul them down to our student computer club, and install it on them..."

I encouraged this line of thought, and pointed out that having ops experience and dev experience with OpenStack is right now really good for getting a job.

THAT got their attention.

"I could get a job with HP if I do this?"

"You could get a job at lots of places.  Lots of companies are getting into OpenStack, and they are hiring."

When I left them for the next talk, they were talking about getting in touch with all the other Australian university student computer clubs, with each club installing OpenStack on recovered junked PCs, and joining them all together as availability zones.

I like to hope I've instigated something fun here.  Or at least made some people's lives more interesting.

2013-01-18

Thoughts on Google, YubiCo, and "The War on Passwords"


There are a lot of articles going around the blogosphere today about Google "Declaring War on the Password", and showing pictures of a YubiKey.

While I am a fan and proponent of improved trustworthiness of authentication, especially using two-factor protocols like HOTP and TOTP and devices like the YubiKey, I am curious as to what all the hubbub is about today.

What keeps Google Authenticator and YubiKey from easily working together right now is the fact that Google uses TOTP and YubiKey implements HOTP.  They are almost the same protocol, with one important difference.  TOTP is time based.  That's what the T stands for.  Every fixed interval (usually 30 seconds) a TOTP token generates a new password, which means that the token needs to know what time it is, which means it needs a clock.  A HOTP device like a YubiKey, by contrast, just needs to keep a counter, and generates a new password every time its button is pressed.
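To make the "almost the same protocol" point concrete, here is a minimal sketch of both algorithms as published in RFC 4226 (HOTP) and RFC 6238 (TOTP), using only the Python standard library. It is an illustration, not the code that runs inside Google Authenticator or a YubiKey; the function names `hotp` and `totp` and their parameters are my own choices, and the secret is the test key from the RFCs.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP per RFC 4226: HMAC-SHA1 over an 8-byte big-endian counter."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """TOTP per RFC 6238: HOTP where the counter is the current time step."""
    return hotp(secret, int(unix_time) // step, digits)


if __name__ == "__main__":
    secret = b"12345678901234567890"  # test secret from the RFCs
    print(hotp(secret, 0))   # counter-based: the device only stores a counter
    print(totp(secret, 59))  # time-based: the device must know what time it is
```

The only difference between the two functions is where the counter comes from, which is exactly why a clockless token cannot produce TOTP codes without outside help.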

So the Google and Yubi partnership means one of three things.

  1. Google is going to support HOTP on the Google Two Factor Login service, or
  2. Google and YubiCo have figured out how to put an extremely low power clock and extremely small battery into a new version of the YubiKey, or
  3. Google and YubiCo have written a USB device driver that speaks to the YubiKey when it's plugged in and tells it what time it is to generate the correct password (which means that driver needs to be installed on every Windows/Linux/MacOS/ChromeOS device you want to use the token on).

I look forward to seeing which one it is.  My money is on option #3, with the added guess that it will probably be supported, at least initially, only on machines running Chrome or ChromeOS.

2012-08-04

Mark's Stories: "The carpets are so clean, we don't need janitors!"

(edit: Hello to the folks who are coming here from YCombinator Hacker News.  Feel free to comment here as well as on YCHN.  I've posted a few more details about this story at YCHN.)

At one company I worked at, one of the problems it didn't have was IT.

When someone was hired, by the time they got to their new desk, there was a computer on it with the correct image on it, their desk phone worked, their email worked, the calendaring and scheduling worked, and all necessary passwords and ACLs were configured.  The internal ethernet networks all worked, were fast, and were properly isolated from each other.  The wall ports were all correctly labeled, and there were the right kinds of wall ports in each cubicle and conference room. The presentation projectors and conference room speaker phones all worked. The printers all worked, printed cleanly, were kept stocked, and were consistently named. The internet connections were fast and well managed.  Internal and external security incidents were quickly recognized and dealt with. Broken machines were immediately replaced with working and newly imaged replacements. If someone accidentally deleted a file, getting it back from backup typically took less than an hour. Software updates were announced ahead of time, and usually happened without issue.

The IT staff did not seem noticeably bitter, angry, harried, or otherwise suffering from the emotional costs traditionally endemic to that job role.  In fact, they were almost invisible in their skill and competence.

So, of course, came the day when the senior executives said "the carpets are just naturally clean all the time, we don't need all these janitors!". IT was "reorganized" into a smaller staff of younger and much less experienced (and probably cheaper) people.

Of course, it all went to shit.  New employees would go a week before they had machines, phones, passwords, and ACLs.  Printers ran out of paper, projectors ran out of lightbulbs, servers ran out of storage, networks got misconfigured, and so forth.  The total time lost and wasted across the whole company was most certainly greater than the savings of laying off the expensive and skilled IT staff.

This is not to say that the reorganized IT staff were stupid or lazy.  They worked very hard and ran themselves ragged trying to keep up with the cycle of operations, while trying to skill themselves up in their "spare time" and with a slashed training budget.

The lessons I learned from this experience speak for themselves.

What lessons may have been learned by any of the other people involved, especially the executives who made these decisions, I cannot say.

2012-08-03

Mark's Stories: The Closed Source Drivers


I was working on a highly constrained consumer electronics device, a little "satellite device" that spoke to the main device over a CATV RF coax cable and also received commands from an IR remote control.  My code was failing in bizarre ways.  I adopted an extremely paranoid defensive programming stance, filling my code with asserts and doing paranoid cross checking of all inputs.  This didn't make the device work.  Instead it now consistently did not work, instead of inconsistently, because the cross checks and asserts would usually (but not always) trip before it would crash. It also started to run out of memory because of all the paranoia code I had added.

I asked for the source code for the driver for the IR receiver, for the driver for the CATV RF digital transceiver, and for the peer code running on the main device that drove its end of the digital cable link.

The driver for the CATV RF digital transceiver was handed to me the first time I asked.  And by "handed to me" I mean that I was pointed to where it was sitting in the source repo.


The business partner / hardware supplier who was supplying the IR glue and drivers, after giving me a runaround, finally just flat out refused, citing trade secrets, confidentiality, secret sauce, and similar bullshit.

So, I finally "stole" the source code with a disassembler.  And found the sources of many of my problems.  It was complete shit.  "Unexpected" input from the silicon would cause wild random pointer writes.  And random sunlight on the receiver optics would cause it.  "Expected" input of undefined remote commands wasn't much better, generating and handing back blocks of garbage with incorrect block length headers.

I ended up writing, nearly from scratch, a replacement IR receiver driver.


The peer device driver code was written by a developer in a different group in my same company.  I finally got the P4 ACLs to read it after loudly escalating, over the objections of its developer and his group manager.  It was also complete shit.  I cannot even begin to remember everything that was wrong with it, but I not only figured out many of the sources of my own pain, I also found a significant source of crash and lockup bugs that afflicted the main device.

I was not allowed to rewrite the peer code, as it was not in my remit.  However, I was able to sneak in and check in a large number of asserts using the excuse that they were "inline documentation".


Oh, and the device driver for the CATV RF digital transceiver?  The source code I got for the asking, without a fight?  When I reviewed it, it was easy to understand, efficient, elegant, and as far as I could tell, bug free.


In the end, I made my part work.  It just took over two months instead of the original guesstimate of less than two weeks.  This caused a schedule slip in the release of the satellite box.  Which would have been a more serious problem, except…


Except there was also a major schedule slip for the main box.  (A significant reason for that slip was that the peer code that I had filled with asserts was now itself crashing with assertion failures, instead of emitting garbage to crash my code.)  I was lucky that I was not more officially "blamed" for that.  The reason I wasn't was mainly that the people who understood what I did understood the problem, and the executives who didn't understand the problem were also too clueless to blame anyone, let alone me.


My lesson learned from this experience is: if someone is refusing to show the source to suspect driver code, citing trade secrets, confidentiality, secret sauce, partnership agreements, or similar excuses, it's not because they are protecting their magic.  It's because they have screwed up, and they are trying to hide it.

A second rule of thumb I have is: source control systems that have complex read ACLs, i.e. ones that don't allow any arbitrary developer to check out and review any arbitrary source code file, are expressions of moral failure.

2012-06-13

On LinkedIn's claim of salting passwords

In the wake of the leak of their password database, LinkedIn issued a blog post: An Update On Taking Steps To Protect Our Members, wherein they claim they had an existing project to salt the passwords, and that that "transition was completed prior to news of the password theft".

That cannot be true. The transition could not have been "completed". One cannot "transition" a hashed password to a salted, properly hashed password. The original plaintext password is required to generate the salted and properly hashed data.

Best case, LinkedIn was in the process of slowly migrating accounts over, like so: When a user logs in, look in the new salted hash database. If there is no entry, then use the old unsalted hash to verify the user, then compute the new salted hash, store that, then delete the old unsalted one.
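The migrate-on-login flow just described can be sketched in a few lines of Python. Everything here is hypothetical: the two dictionaries stand in for the old and new password tables, the function names are mine, and I am assuming the old scheme is a plain unsalted SHA-1 digest and the new one is salted PBKDF2. This is an illustration of the technique, not a description of LinkedIn's actual code.

```python
import hashlib
import hmac
import os

# Hypothetical stand-ins for the two password tables.
legacy_hashes = {}  # username -> unsalted SHA-1 hex digest (assumed old scheme)
salted_hashes = {}  # username -> (salt, derived key) (assumed new scheme)

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def legacy_hash(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def login(username: str, password: str) -> bool:
    # 1. Look in the new salted table first.
    entry = salted_hashes.get(username)
    if entry is not None:
        salt, stored = entry
        return hmac.compare_digest(stored, salted_hash(password, salt))

    # 2. No entry there: verify against the old unsalted hash.
    old = legacy_hashes.get(username)
    if old is None or not hmac.compare_digest(old, legacy_hash(password)):
        return False

    # 3. We hold the plaintext only at this moment, so migrate now:
    #    store the salted hash, then delete the unsalted one.
    salt = os.urandom(16)
    salted_hashes[username] = (salt, salted_hash(password, salt))
    del legacy_hashes[username]
    return True
```

Note that step 3 only ever runs on a successful login, which is why accounts that never log in stay in the old table indefinitely.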

Doing it this way has the disadvantage that it doesn't protect the people who haven't yet logged in, and there are a LOT of LinkedIn users who log in to update their profile only occasionally, such as when they change jobs.

It does have the "advantage" that it's invisible, allowing a slow migration "on the down low".

However, I'm not hugely impressed.