2012-08-04

Mark's Stories: "The carpets are so clean, we don't need janitors!"

(edit: Hello to the folks who are coming here from YCombinator Hacker News.  Feel free to comment here as well as on YCHN.  I've posted a few more details about this story at YCHN.)

At one company I worked at, one of the problems it didn't have was IT.

When someone was hired, by the time they got to their new desk, there was a computer on it with the correct image on it, their desk phone worked, their email worked, the calendaring and scheduling worked, and all necessary passwords and ACLs were configured.  The internal ethernet networks all worked, were fast, and were properly isolated from each other.  The wall ports were all correctly labeled, and there were the right kinds of wall ports in each cubicle and conference room. The presentation projectors and conference room speaker phones all worked. The printers all worked, printed cleanly, were kept stocked, and were consistently named. The internet connections were fast and well managed.  Internal and external security incidents were quickly recognized and dealt with. Broken machines were immediately replaced with working and newly imaged replacements. If someone accidentally deleted a file, getting it back from backup typically took less than an hour. Software updates were announced ahead of time, and usually happened without issue.

The IT staff did not seem noticeably bitter, angry, harried, or otherwise suffering from the emotional costs traditionally endemic to that job role.  In fact, they were almost invisible in their skill and competence.

So, of course, came the day when the senior executives said "the carpets are just naturally clean all the time, we don't need all these janitors!". IT was "reorganized" into a smaller staff of younger and much less experienced (and probably cheaper) people.

Of course, it all went to shit.  New employees would go a week before they had machines, phones, passwords, and ACLs.  Printers ran out of paper, projectors ran out of lightbulbs, servers ran out of storage, networks got misconfigured, and so forth.  The total time lost and wasted across the whole company was most certainly greater than the savings of laying off the expensive and skilled IT staff.

This is not to say that the reorganized IT staff were stupid or lazy.  They worked very hard and ran themselves ragged trying to keep up with the cycle of operations, while trying to skill themselves up in their "spare time" and with a slashed training budget.

The lessons I learned from this experience speak for themselves.

What lessons may have been learned by any of the other people involved, especially the executives who made these decisions, I cannot say.

2012-08-03

Mark's Stories: The Closed Source Drivers


I was working on a highly constrained consumer electronics device, a little "satellite device" that spoke to the main device over a CATV RF coax cable and also received commands from an IR remote control.  My code was failing in bizarre ways.  I adopted an extremely paranoid defensive programming stance, filling my code with asserts and doing paranoid cross checking of all inputs.  This didn't make the device work.  Instead, it now failed consistently rather than inconsistently, because the cross checks and asserts would usually (but not always) trip before it would crash. It also started to run out of memory because of all the paranoia code I had added.

I asked for the source code for the driver for the IR receiver, for the driver for the CATV RF digital transceiver, and for the peer code, running on the main device, that drove the digital cable link.

The driver for the CATV RF digital transceiver was handed to me the first time I asked.  And by "handed to me" I mean that I was pointed to where it was sitting in the source repo.


The business partner / hardware supplier who was supplying the IR glue and drivers, after giving me a runaround, finally just flat out refused, citing trade secrets, confidentiality, secret sauce, and similar bullshit.

So, I finally "stole" the source code with a disassembler.  And found the sources of many of my problems.  It was complete shit.  "Unexpected" input from the silicon would cause wild random pointer writes.  And random sunlight on the receiver optics was enough to produce such input.  "Expected" input of undefined remote commands wasn't much better, generating and handing back blocks of garbage with incorrect block length headers.

I ended up writing, nearly from scratch, a replacement IR receiver driver.


The peer device driver code was written by a developer in a different group in my same company.  I finally got the P4 ACLs to read it after loudly escalating, over the objections of its developer and his group manager.  It was also complete shit.  I cannot even begin to remember everything that was wrong with it, but I not only figured out many of the sources of my own pain, I also found a significant source of crash and lockup bugs that afflicted the main device.

I was not allowed to rewrite the peer code, as it was not in my remit.  However, I was able to sneak in and check in a large number of asserts using the excuse that they were "inline documentation".


Oh, and the device driver for the CATV RF digital transceiver?  The source code I got for the asking, without a fight?  When I reviewed it, it was easy to understand, efficient, elegant, and as far as I could tell, bug free.


In the end, I made my part work.  It just took over two months instead of the original guesstimate of less than two weeks.  This caused a schedule slip in the release of the satellite box.  Which would have been a more serious problem, except…


Except there was also a major schedule slip for the main box.  A significant reason for that slip was that the peer code that I had filled with asserts was now itself crashing with assertion failures instead of emitting garbage to crash my code.  I was lucky that I was not more officially "blamed" for that.  The reason I wasn't was mainly that the people who understood what I did understood the problem, and the executives who didn't understand what the problem was were also too clueless to blame anyone, let alone me.


My lesson learned from this experience is: if someone is refusing to show the source to suspect driver code, citing trade secrets, confidentiality, secret sauce, partnership agreements, or similar excuses, it's not because they are protecting their magic.  It's because they have screwed up, and they are trying to hide it.

A second rule of thumb I have is: source control systems that have complex read ACLs, i.e. ones that don't allow any arbitrary developer to check out and review any arbitrary source code file, are expressions of moral failure.

2012-06-13

On LinkedIn's claim of salting passwords

In the wake of the leak of their password database, LinkedIn issued a blog post: An Update On Taking Steps To Protect Our Members, wherein they claim they had an existing project to salt the passwords, and that that "transition was completed prior to news of the password theft".

That cannot be true. The transition could not have been "completed". One cannot "transition" a hashed password to a salted, properly hashed password. The original plaintext password is required to generate the salted and properly hashed data.

Best case, LinkedIn was in the process of slowly migrating accounts over, like so: When a user logs in, look in the new salted hash database. If there is no entry, then use the old unsalted hash to verify the user, then compute the new salted hash, store that, then delete the old unsalted one.
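The login-time migration described above can be sketched roughly as follows. This is only an illustration, not LinkedIn's actual code: the two dictionaries, the function names, the use of SHA-1 as the old unsalted hash, and PBKDF2 as the "salted and properly hashed" replacement are all my assumptions.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory stand-ins for the two credential tables.
legacy_hashes = {}   # username -> unsalted SHA-1 hex digest (assumed old scheme)
salted_hashes = {}   # username -> (salt, PBKDF2 digest) (assumed new scheme)

PBKDF2_ROUNDS = 100_000  # illustrative work factor, not a recommendation


def legacy_hash(password):
    """Old scheme: a bare, unsalted hash of the password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_hash(password, salt):
    """New scheme: needs the plaintext password plus a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS)


def login(username, password):
    """Verify a login, migrating the account to the salted table if needed."""
    # 1. Look in the new salted hash database first.
    if username in salted_hashes:
        salt, expected = salted_hashes[username]
        return hmac.compare_digest(salted_hash(password, salt), expected)

    # 2. No entry there: fall back to the old unsalted hash.
    old = legacy_hashes.get(username)
    if old is None or not hmac.compare_digest(legacy_hash(password), old):
        return False

    # 3. The plaintext is available only at this moment, so compute and
    #    store the salted hash now, then delete the old unsalted one.
    salt = os.urandom(16)
    salted_hashes[username] = (salt, salted_hash(password, salt))
    del legacy_hashes[username]
    return True
```

Note that step 3 can only run inside a successful login, because that is the only time the server holds the plaintext password.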

Doing it this way has the disadvantage that it doesn't protect the people who haven't yet logged in, and there are a LOT of LinkedIn users who log in to update their profile only occasionally, such as when they change jobs.

It does have the "advantage" that it's invisible, allowing a slow migration "on the down low".

However, I'm not hugely impressed.

2012-05-16

Go get your whooping cough shot

I just went on a half-hour binge reading about pertussis, as a result of reading that WA state has an epidemic outbreak that the public health agencies are struggling to stop.

Reading people's first hand accounts of having it will cross your eyes in empathetic pain. One person described that she has given birth, has suffered a compound fracture of a leg, has passed a kidney stone, and has had whooping cough. Only one of them made her wish for death. It's been described as feeling like an asthma attack while someone is punching you in the ribs. Oh, and almost all cough medicines do pretty much nothing for it.

You, reading this. You. Right now, pick up your phone, and call your doctor's clinic. Ask them if you're up to date, and if you're not, go get your freaking Tdap shot (the adult version of the childhood DTaP). You can get one at Walgreens for less than the cost of a high-end Starbucks drink. You owe it to yourself, you owe it to public health, you owe it to your friends and coworkers, and you owe it to every pregnant woman, every newborn, and every immunocompromised person you share this biosphere with.

2012-03-07

Is there a "C for JVM"?

The accusation that C is "structured assembly language" has merit. It is not hard for a competent developer to map C to CISC assembly, running the compiler in his head.  This is not a weakness, it is a strength, and is the reason why it remains a tool of choice when performance and size count.

As near as I can easily see, nobody has designed an equivalent language for the JVM.  Something that maps the language constructs closely and clearly to the underlying semantics of the actual procedural execution model.  With no "object code blowups", where doing some small change to the source causes a massive inflation of object code size.

Does such a language exist?

(BTW, the Java language itself is not it.)

Any random CS masters student can design a language that abstracts away an underlying reality.   A few rare and gifted language designers can do it well. But it takes a rare genius to embrace and clarify an underlying reality, and make it approachable and useful.

2012-01-12

On my way to LCA2012


I'm starting the first leg of a literal 'round-the-world business trip.

I'm flying from my home in Seattle, via LAX, to Melbourne, Australia.  There I will meet up with my friend Stewart Smith, the Director of Engineering at Percona. He is a fellow survivor of MySQL & Sun, and a fellow contributor to Drizzle.

Other good friends of mine who are converging on Australia for LCA 2012 are Sarah Novotny, Monty Taylor, and Jacob Appelbaum.

Why am I going to Australia?  Geeks into open source who are "in the know" know the Linux.conf.au conference to be one of the best open source conferences in the world.  This year it will be held in Ballarat, which is not far from Melbourne.  This will be my 4th LCA, having attended past ones in Brisbane, Wellington, and Tasmania.

At that conference, I will be speaking in the SysAdmin Miniconf, to demo OpenShift, Red Hat's cloud PaaS.  (Sign up with the promo code LCA2012.)


This is only the first leg of this trip.  After LCA, I will be heading to Bangalore, India to present at JUDCon:India.

And after that, I will keep heading west to Europe, to present at FOSDEM in Brussels.

And then who knows where I will go next?


2012-01-03

Thoughts on travelling and OkCupid matching


I've had an OkCupid profile for about 10 years, and have answered over 1200 questions in their database.

When I travel, I sometimes reset my location in OkC to be the city I'm in.  It makes for an interesting view into the place that I'm at.

When I'm in Seattle, there is a surprisingly large number of people who match me at 95% or above.   When I am traveling, there are relatively few people who match that closely.  Often, there are none.  Even in very densely populated areas with an order of magnitude more people to match against.

I find that very interesting.

Did I end up living in Seattle because it's friendly to "people like me"?  Or is it the other way around, and being around Seattle has made me more like the people here I match so closely with?

And then there are even more interesting questions one could ask and answer with the OkC database.  Does this kind of "lots of close matches" effect happen in every city?  How much dispersion is there in a given city?  After all, even if there are many people I match closely with, there are a LOT more in Seattle that I don't match at all.  How do other cities compare that way?

Hopefully the data geeks inside OkC will someday answer these questions.