2012-08-03

Mark's Stories: The Closed Source Drivers


I was working on a highly constrained consumer electronics device, a little "satellite device" that spoke to the main device over a CATV RF coax cable and also received commands from an IR remote control.  My code was failing in bizarre ways.  I adopted an extremely paranoid defensive programming stance, filling my code with asserts and doing paranoid cross checking of all inputs.  This didn't make the device work.  Instead, it now failed consistently rather than inconsistently, because the cross checks and asserts would usually (but not always) trip before it crashed. It also started to run out of memory because of all the paranoia code I had added.

I asked for the source code for the driver for the IR receiver, for the driver for the CATV RF digital transceiver, and for the peer code running on the main device that drove the other end of the cable digital link.

The driver for the CATV RF digital transceiver was handed to me the first time I asked.  And by "handed to me" I mean that I was pointed to where it was sitting in the source repo.


The business partner / hardware supplier who was supplying the IR glue and drivers, after giving me a runaround, finally just flat out refused, citing trade secrets, confidentiality, secret sauce, and similar bullshit.

So, I finally "stole" the source code with a disassembler.  And I found the sources of many of my problems.  It was complete shit.  "Unexpected" input from the silicon would cause wild random pointer writes, and stray sunlight hitting the receiver optics was enough to trigger that.  "Expected" input of undefined remote commands wasn't much better: the driver generated and handed back blocks of garbage with incorrect block length headers.

I ended up writing, nearly from scratch, a replacement IR receiver driver.


The peer device driver code was written by a developer in a different group in my same company.  I finally got read access to it through the P4 ACLs after loudly escalating, over the objections of its developer and his group manager.  It was also complete shit.  I cannot even begin to remember everything that was wrong with it, but I not only figured out many of the sources of my own pain, I also found a significant source of crash and lockup bugs that afflicted the main device.

I was not allowed to rewrite the peer code, as it was not in my remit.  However, I was able to sneak in and check in a large number of asserts using the excuse that they were "inline documentation".


Oh, and the device driver for the CATV RF digital transceiver?  The source code I got for the asking, without a fight?  When I reviewed it, it was easy to understand, efficient, elegant, and as far as I could tell, bug free.


In the end, I made my part work.  It just took over two months instead of the original guesstimate of less than two weeks.  This caused a schedule slip in the release of the satellite box.  Which would have been a more serious problem, except…


Except there was also a major schedule slip for the main box.  A significant reason for that slip was that the peer code I had filled with asserts was now itself crashing with assertion failures (instead of emitting garbage to crash my code).  I was lucky that I was not more officially "blamed" for that.  The main reason I wasn't was that the people who understood what I did understood the problem, and the executives who didn't understand the problem were also too clueless to blame anyone, let alone me.


My lesson learned from this experience is: if someone is refusing to show the source to suspect driver code, citing trade secrets, confidentiality, secret sauce, partnership agreements, or similar excuses, it's not because they are protecting their magic.  It's because they have screwed up, and they are trying to hide it.

A second rule of thumb I have is: source control systems that have complex read ACLs, i.e. ones that don't allow any arbitrary developer to check out and review any arbitrary source code file, are expressions of moral failure.

2 comments:

  1. Totally agreed with your first lesson. Sunlight really is the best disinfectant.

    Mostly agreed on your second rule of thumb... in fact, I'm hard pressed to explain why it would ever be a good idea to restrict read access, except when working on some kind of government-sponsored or military project where compartmentalization is considered important.

  2. So THAT'S why NVIDIA will not open up their drivers...
